Add support for extracting ranges from percolator queries #19191

martijnvg · 2016-06-30T15:40:09Z

Also stop support percolator queries containing range queries with ranges based on now.

jpountz · 2016-06-30T16:52:58Z

docs/reference/mapping/types/percolator.asciidoc


 Because the `percolate` query is processing one document at a time, it doesn't support queries and filters that run
 against child documents such as `has_child` and `has_parent`.

+The percolator doesn't accepts percolator queries containing `range` queries with ranges that are based on current
+time. (using `now`).


maybe remove the dot after time?

…g percolator queries. Disallow percolator queries containing `range` queries that have ranges based on current time (`now`).

jpountz · 2016-07-05T16:26:32Z

I don't understand how the percolator can correlate the ranges of values with the name of the field. For instance if I index a query on a foo field and try to percolate a document that contains a bar field, how does it know it should not try ot use the value of bar in the approximation query since it is a different field name? Similarly how does it work if a query has ranges on multiple field names?

martijnvg · 2016-07-07T06:12:06Z

I don't understand how the percolator can correlate the ranges of values with the name of the field. For instance if I index a query on a foo field and try to percolate a document that contains a bar field, how does it know it should not try ot use the value of bar in the approximation query since it is a different field name?

So it doesn't know and in the case of your example it the approximation query will match and then we the MemoryIndex verifies that match it will not verify it and so it will never match.

The goal here is to do better for range queries than we do right now (in many cases marking the entire percolator query as failed). The should clause with range queries that we add to the approximation query will yield more false matches, then the should clause we add for evaluating the percolator queries that failed query extraction. So I think in general it will be better?

Similarly how does it work if a query has ranges on multiple field names?

If the document has at least one range that matches (not necessarily matching with the field) with what we recorded for the percolator query at index time the approximation query will match it. Depending if the range matches with the field and if it needs to match with the other ranges too (depending if ranges are part of a conjunction or disjunction) it will be become a match after it has been verified by the MemoryIndex.

So in the previous approach we did record with what field a range belongs to, so it was less likely to be a false positive. The big down side is that we didn't have control over the amount of fields the percolator field type created, because for each number type + range field name combination we would introduce a two new field (from and to fields.

With this approach we will always have 12 fields for range queries. In retrospect I'm still not happy with the facts that we add 12 fields (it is still a lot) (field per number type) and that the approximation query is less accurate. I don't feel there is rush to get this in, there must be a better approach the handle range queries.

jpountz · 2016-07-07T16:31:41Z

Thanks for the explanation, it makes sense.

With this approach we will always have 12 fields for range queries. In retrospect I'm still not happy with the facts that we add 12 fields (it is still a lot) (field per number type) and that the approximation query is less accurate.

I have mixed feelings about it too. On the one hand I don't want to have one field per query field in order to index range queries, but on the other hand, using a field per field type introduces fuzziness that makes me a bit worried, both because it could be a nest for bugs, and also because there could certainly be patterns of range queries and/or documents that defeat this optimization by making queries always returned by the approximation.

I think we should put it on hold for now.

martijnvg · 2016-07-07T19:11:03Z

@jpountz I do want to extract the now range validation from PercolatorFieldMapper and some of the refactoring here (mainly the move of ExtractQueryTermsService#extractQueryTerms to PercolatorFieldMapper) and put it into a new pr. Does that make sense?

jpountz · 2016-07-07T21:49:15Z

+1

martijnvg · 2016-07-08T11:01:41Z

Closing this PR, we should look into this optimization another time. Opened #19327 for the now range validation.

This is a follow up of #19191 for 7.x. This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this commit adds a way to opt-in to this new behavior in a minor version of 7.x. Relates #45735

This is a follow up of elastic#19191 for 7.x. This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this commit adds a way to opt-in to this new behavior in a minor version of 7.x. Relates elastic#45735 # Conflicts: # docs/reference/modules/cluster/allocation_awareness.asciidoc

martijnvg added >enhancement >breaking review :Search Relevance/Percolator Reverse search: find queries that match a document labels Jun 30, 2016

jpountz reviewed Jun 30, 2016
View reviewed changes

Add support for extracting ranges from range queries during indexin…

934f581

…g percolator queries. Disallow percolator queries containing `range` queries that have ranges based on current time (`now`).

martijnvg force-pushed the percolator_range_queries branch from 7e556fd to 934f581 Compare July 5, 2016 09:19

martijnvg added the v5.0.0-alpha5 label Jul 5, 2016

martijnvg changed the title ~~Add support for extracting ranges from range queries during indexing percolator queries~~ Add support for extracting ranges from percolator queries Jul 5, 2016

clintongormley removed the >enhancement label Jul 6, 2016

martijnvg removed review v5.0.0-alpha5 labels Jul 7, 2016

martijnvg closed this Jul 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for extracting ranges from percolator queries #19191

Add support for extracting ranges from percolator queries #19191

martijnvg commented Jun 30, 2016

jpountz Jun 30, 2016

jpountz commented Jul 5, 2016

martijnvg commented Jul 7, 2016

jpountz commented Jul 7, 2016

martijnvg commented Jul 7, 2016

jpountz commented Jul 7, 2016

martijnvg commented Jul 8, 2016

Add support for extracting ranges from percolator queries #19191

Add support for extracting ranges from percolator queries #19191

Conversation

martijnvg commented Jun 30, 2016

jpountz Jun 30, 2016

Choose a reason for hiding this comment

jpountz commented Jul 5, 2016

martijnvg commented Jul 7, 2016

jpountz commented Jul 7, 2016

martijnvg commented Jul 7, 2016

jpountz commented Jul 7, 2016

martijnvg commented Jul 8, 2016