Skip to content

[ML] fix random sampling background query consistency #83676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 8, 2022

Conversation

benwtrent
Copy link
Member

There was a consistency bug where the documents returned by the created scorer could change while looking at the same shard. This can occur if multiple weights are created from the same query.

For scenarios like Significant Terms/Text, we need a consistent view of each shard when using the same probability and seed.

This commit ensures this by creating a new random value supplier seeded by the shard hash & seed.

@benwtrent benwtrent added >non-issue :ml Machine learning v8.2.0 labels Feb 8, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Feb 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent benwtrent added the :Analytics/Aggregations Aggregations label Feb 8, 2022
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (the main thing is we still ensure independent samples per shard which seeding with hash arranges for us).

@benwtrent benwtrent merged commit 7d1eb52 into elastic:master Feb 8, 2022
@benwtrent benwtrent deleted the feature/random-sampler-fix branch February 8, 2022 20:26
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Feb 9, 2022
* upstream/master: (166 commits)
  Bind host all instead of just _site_ when needed (elastic#83145)
  [DOCS] Fix min/max agg snippets for histograms (elastic#83695)
  [DOCS] Add deprecation notice for system indices (elastic#83688)
  Cache ILM policy name on IndexMetadata (elastic#83603)
  [DOCS] Fix 8.0 breaking changes sort order (elastic#83685)
  [ML] fix random sampling background query consistency (elastic#83676)
  Move internal APIs into their own namespace '_internal'
  Runtime fields core-with-mapped tests support tsdb (elastic#83577)
  Optimize calculating the presence of a quorum (elastic#83638)
  Use switch expressions in EnableAllocationDecider and NodeShutdownAllocationDecider (elastic#83641)
  Note libffi error message in tmpdir docs (elastic#83662)
  Fix TransportDesiredNodesActionsIT batch tests (elastic#83659)
  [DOCS] Remove unused upgrade doc files (elastic#83617)
  [ML] Wait for model process to stop in stop deployment (elastic#83644)
  [ML] Fix submit after shutdown in process worker service (elastic#83645)
  Remove req/resp classes associated with HLRC (elastic#83599)
  Introduce index.version.compatibility setting (elastic#83264)
  Rename InternalTestCluster#getMasterNodeInstance (elastic#83407)
  Mute TimeSeriesIndexSearcherTests testCollectInOrderAcrossSegments (elastic#83648)
  Add rollover add max_primary_shard_docs condition (elastic#80981)
  ...

# Conflicts:
#	x-pack/plugin/rollup/build.gradle
#	x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/v2/RollupActionSingleNodeTests.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations :ml Machine learning >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:ML Meta label for the ML team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants