Add the limit on the number of expanded terms #27796

Closed
mayya-sharipova opened this issue Dec 13, 2017 · 3 comments
Labels
:Search/Search Search-related issues that do not fall into other categories

Comments

@mayya-sharipova
Contributor

For example, for wildcard and fuzzy queries.

Relates to #11511

@mayya-sharipova added the ":Search/Search" (Search-related issues that do not fall into other categories) and "discuss" labels on Dec 13, 2017
@mayya-sharipova self-assigned this on Dec 13, 2017
@mayya-sharipova
Contributor Author

This was one of the items in the Elasticsearch roadmap.

But it looks like we may not need an extra limit, as fuzzy queries and wildcard queries use one of the multi-term query rewrite methods, which already safeguard against too much expansion. For example:

  • constant_score - first tries a disjunction of boolean clauses; if there are too many of them, it builds a bitset of matching docs instead
  • scoring_boolean - fails with a too_many_clauses error if the number of expanded terms exceeds the boolean query limit (1024), producing the following error:
"error":{
   "root_cause":[
      {
         "type":"too_many_clauses",
         "reason":"maxClauseCount is set to 1024"
      }
   ]
}
  • constant_score_boolean - like scoring_boolean, fails with a too_many_clauses error if the number of expanded terms exceeds the boolean query limit (1024).
  • top_terms_N - N controls the number of top-scoring terms to use.
  • top_terms_boost_N - N controls the number of top-scoring terms to use.
  • top_terms_blended_freqs_N - N controls the number of top-scoring terms to use.
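
For reference, the rewrite method is selected per query through the `rewrite` parameter of the query DSL. Below is a minimal Python sketch that builds such a request body. The helper `wildcard_query`, the field name `user`, and the pattern `ki*y` are hypothetical and only serve as an illustration; the helper is not part of any client library.

```python
import json


def wildcard_query(field, pattern, rewrite):
    """Build a search request body for a wildcard query.

    `field` and `pattern` are placeholders; `rewrite` is one of the
    method names listed above, e.g. "constant_score" or "top_terms_10".
    """
    return {
        "query": {
            "wildcard": {
                field: {
                    "value": pattern,
                    "rewrite": rewrite,
                }
            }
        }
    }


# Hypothetical field and pattern, keeping only the 10 top-scoring terms.
body = wildcard_query("user", "ki*y", "top_terms_10")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a search request; only the `rewrite` value needs to change to try a different method.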

@mayya-sharipova
Contributor Author

As discussed in FixitFriday, we will still have a limit on the number of expanded terms, equal to 1024, as all the other Lucene checks are done later.

@mayya-sharipova
Contributor Author

After further investigation, it looks like this feature is NOT necessary, as there are already safeguards at the Lucene level.

For example, in the case of a wildcard query, the algorithm is as follows:

  1. Collect all the terms for the field
  2. Compute the intersection of the WildcardQuery automaton with the field terms
  3. Rewrite the query:
    • constant_score -> MultiTermQueryConstantScoreWrapper attempts to rewrite as a boolean query, but switches to a bitset of docs if there are too many terms
    • scoring_boolean -> ScoringRewrite.rewrite collects the terms in a TermCollector before adding boolean clauses; it fails as soon as the number of terms exceeds 1024
    • constant_score_boolean -> ScoringRewrite, similar to scoring_boolean
    • top_terms_N and the other top terms methods -> TopTermsRewrite uses the minimum of N and 1024 when rewriting the query
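
The steps above can be sketched as a toy model in Python. This is not Lucene code: `fnmatchcase` stands in for the automaton intersection, `bitset_threshold=16` is an illustrative value rather than something stated in this thread, the top terms methods simply keep the first terms in sorted order instead of the top-scoring ones, and the limit is checked after collection instead of during it. All function and class names are hypothetical.

```python
from fnmatch import fnmatchcase

MAX_CLAUSE_COUNT = 1024  # boolean query limit mentioned in this thread


class TooManyClauses(Exception):
    """Toy stand-in for the too_many_clauses failure."""


def expand_terms(field_terms, pattern):
    # Steps 1-2: keep the field terms accepted by the wildcard pattern.
    # fnmatchcase is a simplification of the automaton intersection.
    return sorted(t for t in field_terms if fnmatchcase(t, pattern))


def rewrite(terms, method, bitset_threshold=16):
    # Step 3: pick a rewrite strategy from the method name.
    if method == "constant_score":
        # Few terms: boolean query; many terms: bitset of matching docs.
        if len(terms) <= bitset_threshold:
            return ("boolean", terms)
        return ("bitset", len(terms))
    if method in ("scoring_boolean", "constant_score_boolean"):
        if len(terms) > MAX_CLAUSE_COUNT:
            raise TooManyClauses(
                "maxClauseCount is set to %d" % MAX_CLAUSE_COUNT
            )
        return ("boolean", terms)
    if method.startswith("top_terms_"):
        # N is the trailing number, capped at the boolean query limit.
        n = int(method.rsplit("_", 1)[1])
        return ("boolean", terms[: min(n, MAX_CLAUSE_COUNT)])
    raise ValueError("unknown rewrite method: " + method)


# 2000 hypothetical terms, all matching the pattern "term*".
all_terms = ["term%04d" % i for i in range(2000)]
matched = expand_terms(all_terms, "term*")

kind, payload = rewrite(matched, "constant_score")
capped = rewrite(matched, "top_terms_5000")[1]

try:
    rewrite(matched, "scoring_boolean")
    failed = False
except TooManyClauses:
    failed = True
```

In this model every method either bounds the number of clauses or avoids building clauses at all, which is the reason given above for not adding a separate limit.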
