Skip to content

Add limits for ngram and shingle settings #27411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented Nov 16, 2017

Create index-level settings:
max_ngram_diff - maximum allowed difference between max_gram and min_gram in
NGramTokenFilter/NGramTokenizer. Default is 1.
max_shingle_diff - maximum allowed difference between max_shingle_size and
min_shingle_size in ShingleTokenFilter. Default is 3.

Log a warning when
trying to create NGramTokenFilter, NGramTokenizer, ShingleTokenFilter
where difference between max_size and min_size exceeds the settings's value.

Relates to #27211
Closes #25887

Create index-level settings:
max_ngram_diff - maximum allowed difference between max_gram and min_gram in
NGramTokenFilter/NGramTokenizer. Default is 1.
max_shingle_diff - maximum allowed difference between max_shingle_size and
 min_shingle_size in ShingleTokenFilter.  Default is 3.

Log a warning when
trying to create NGramTokenFilter, NGramTokenizer, ShingleTokenFilter
where difference between max_size and min_size exceeds the settings's value.

Closes elastic#25887
@@ -27,6 +27,23 @@
- match: { detail.tokenizer.tokens.2.token: od }

---
"nGram_exception":
- skip:
version: " - 6.0.0"
Copy link
Contributor Author

@mayya-sharipova mayya-sharipova Nov 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi Jim, I am not sure about this 6.0.0 number. I assumed since we are on 6.0.1 now, I would skip up to 6.0.0.

Or should this rather be 6.0.99 considering that you label this PR as v6.1.0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pr targets 6.x, that's 6.1.0. We don't want to break the build so even if you plan to backport to 6.0.1 you need to do it in multiple steps. The first commit sets the expectation to the current version (so 6.0.99 here), otherwise the bwc tests would fail and you can change the expectations only after the backport to previous versions.
For this deprecation, I think it's ok to not backport to 6.0.1 and start only in 6.1.0.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mayya-sharipova
If the backport is simple you can directly cherry-pick and push. We usually just tag the pr on master with all the targeted versions.
Though this pr is against 6.x, that's 6.1.0 currently so the deprecation will start in this version, you need to change the rest tests.

"nGram_exception":
- skip:
version: " - 6.0.0"
reason: starting from version 6.0.1 this produces a warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation will start in 6.1.0 (that's the current 6.x).

Create index-level settings:
max_ngram_diff - maximum allowed difference between max_gram and min_gram in
NGramTokenFilter/NGramTokenizer. Default is 1.
max_shingle_diff - maximum allowed difference between max_shingle_size and
min_shingle_size in ShingleTokenFilter.  Default is 3.

Log a warning when
trying to create NGramTokenFilter, NGramTokenizer, ShingleTokenFilter
where difference between max_size and min_size exceeds the settings's value.

Closes elastic#25887
@mayya-sharipova mayya-sharipova merged commit 511baca into elastic:6.x Nov 17, 2017
@mayya-sharipova mayya-sharipova deleted the add-limits-ngram-shingle-settings6 branch November 17, 2017 18:22
jasontedor added a commit that referenced this pull request Nov 20, 2017
* 6.x: (41 commits)
  [TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
  Transition transport apis to use void listeners (#27440)
  AwaitsFix GeoShapeQueryTests#testPointsOnly #27454
  Ensure nested documents have consistent version and seq_ids (#27455)
  Tests: Add Fedora-27 to packaging tests
  #26800: Fix docs rendering
  Move the CLI into its own subproject (#27114)
  Correct usage of "an" to "a" in getting started docs
  Avoid NPE when getting build information
  Remove manual tracking of registered channels (#27445)
  Standardize underscore requirements in parameters (#27414)
  Remove parameters on HandshakeResponseHandler (#27444)
  [GEO] fix pointsOnly bug for MULTIPOINT
  peanut butter hamburgers
  Uses TransportMasterNodeAction to update shard snapshot status (#27165)
  Log primary-replica resync failures
  Add limits for ngram and shingle settings (#27411)
  Enforce a minimum task execution and service time of 1 nanosecond
  Fix place-holder in allocation decider messages (#27436)
  Remove newline from log message (#27425)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants