Skip to content

Add support for extracting ranges from percolator queries #19191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

martijnvg
Copy link
Member

Also stop support percolator queries containing range queries with ranges based on now.

@martijnvg martijnvg added >enhancement >breaking review :Search Relevance/Percolator Reverse search: find queries that match a document labels Jun 30, 2016

Because the `percolate` query is processing one document at a time, it doesn't support queries and filters that run
against child documents such as `has_child` and `has_parent`.

The percolator doesn't accepts percolator queries containing `range` queries with ranges that are based on current
time. (using `now`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove the dot after time?

…g percolator queries.

Disallow percolator queries containing `range` queries that have ranges based on current time (`now`).
@martijnvg martijnvg force-pushed the percolator_range_queries branch from 7e556fd to 934f581 Compare July 5, 2016 09:19
@martijnvg martijnvg changed the title Add support for extracting ranges from range queries during indexing percolator queries Add support for extracting ranges from percolator queries Jul 5, 2016
@jpountz
Copy link
Contributor

jpountz commented Jul 5, 2016

I don't understand how the percolator can correlate the ranges of values with the name of the field. For instance if I index a query on a foo field and try to percolate a document that contains a bar field, how does it know it should not try ot use the value of bar in the approximation query since it is a different field name? Similarly how does it work if a query has ranges on multiple field names?

@martijnvg
Copy link
Member Author

I don't understand how the percolator can correlate the ranges of values with the name of the field. For instance if I index a query on a foo field and try to percolate a document that contains a bar field, how does it know it should not try ot use the value of bar in the approximation query since it is a different field name?

So it doesn't know and in the case of your example it the approximation query will match and then we the MemoryIndex verifies that match it will not verify it and so it will never match.

The goal here is to do better for range queries than we do right now (in many cases marking the entire percolator query as failed). The should clause with range queries that we add to the approximation query will yield more false matches, then the should clause we add for evaluating the percolator queries that failed query extraction. So I think in general it will be better?

Similarly how does it work if a query has ranges on multiple field names?

If the document has at least one range that matches (not necessarily matching with the field) with what we recorded for the percolator query at index time the approximation query will match it. Depending if the range matches with the field and if it needs to match with the other ranges too (depending if ranges are part of a conjunction or disjunction) it will be become a match after it has been verified by the MemoryIndex.

So in the previous approach we did record with what field a range belongs to, so it was less likely to be a false positive. The big down side is that we didn't have control over the amount of fields the percolator field type created, because for each number type + range field name combination we would introduce a two new field (from and to fields.

With this approach we will always have 12 fields for range queries. In retrospect I'm still not happy with the facts that we add 12 fields (it is still a lot) (field per number type) and that the approximation query is less accurate. I don't feel there is rush to get this in, there must be a better approach the handle range queries.

@jpountz
Copy link
Contributor

jpountz commented Jul 7, 2016

Thanks for the explanation, it makes sense.

With this approach we will always have 12 fields for range queries. In retrospect I'm still not happy with the facts that we add 12 fields (it is still a lot) (field per number type) and that the approximation query is less accurate.

I have mixed feelings about it too. On the one hand I don't want to have one field per query field in order to index range queries, but on the other hand, using a field per field type introduces fuzziness that makes me a bit worried, both because it could be a nest for bugs, and also because there could certainly be patterns of range queries and/or documents that defeat this optimization by making queries always returned by the approximation.

I think we should put it on hold for now.

@martijnvg
Copy link
Member Author

@jpountz I do want to extract the now range validation from PercolatorFieldMapper and some of the refactoring here (mainly the move of ExtractQueryTermsService#extractQueryTerms to PercolatorFieldMapper) and put it into a new pr. Does that make sense?

@jpountz
Copy link
Contributor

jpountz commented Jul 7, 2016

+1

@martijnvg
Copy link
Member Author

Closing this PR, we should look into this optimization another time. Opened #19327 for the now range validation.

@martijnvg martijnvg closed this Jul 8, 2016
jimczi added a commit that referenced this pull request Sep 6, 2019
This is a follow up of #19191 for 7.x.
This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will
effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this
commit adds a way to opt-in to this new behavior in a minor version of 7.x.

Relates #45735
snutu pushed a commit to snutu/elasticsearch that referenced this pull request Jul 1, 2021
This is a follow up of elastic#19191 for 7.x.
This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will
effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this
commit adds a way to opt-in to this new behavior in a minor version of 7.x.

Relates elastic#45735
# Conflicts:
#	docs/reference/modules/cluster/allocation_awareness.asciidoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search Relevance/Percolator Reverse search: find queries that match a document
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants