[ML] fix random sampling background query consistency #83676

benwtrent · 2022-02-08T19:02:40Z

There was a consistency bug: the documents returned by the created scorer could change between passes over the same shard. This could occur when multiple weights were created from the same query.

For scenarios like Significant Terms/Text, we need a consistent view of each shard when using the same probability and seed.

This commit ensures that consistency by creating a new random value supplier seeded by the shard hash and the seed.
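To illustrate the idea, here is a minimal standalone sketch; it is not the code from this PR. The class `SeededSamplerSketch`, its methods, and the way the seed and shard hash are mixed are all hypothetical. The supplier hands out a fresh generator with the same combined seed on every call, so each pass over a shard replays the same random sequence and accepts the same documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.function.Supplier;

// Hypothetical sketch; names and seed mixing are assumptions, not the PR's code.
final class SeededSamplerSketch {

    // Returns a supplier that creates a NEW generator on every call, always
    // seeded from the same (seed, shardHash) pair. Assumption: packing the two
    // ints into one long is an adequate way to combine them for illustration.
    static Supplier<SplittableRandom> randomSupplier(int seed, int shardHash) {
        long combined = ((long) seed << 32) | (shardHash & 0xFFFFFFFFL);
        return () -> new SplittableRandom(combined);
    }

    // Simulates one pass over a shard: each doc ID in [0, maxDoc) is kept
    // with the given probability, using a generator obtained from the supplier.
    static List<Integer> sample(Supplier<SplittableRandom> supplier, int maxDoc, double probability) {
        SplittableRandom random = supplier.get();
        List<Integer> accepted = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (random.nextDouble() < probability) {
                accepted.add(doc);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Supplier<SplittableRandom> shardA = randomSupplier(42, 1);
        // Two passes stand in for two weights created from the same query.
        List<Integer> firstPass = sample(shardA, 1000, 0.1);
        List<Integer> secondPass = sample(shardA, 1000, 0.1);
        System.out.println("same shard, same docs: " + firstPass.equals(secondPass));
    }
}
```

Because the shard hash is part of the seed, two shards queried with the same seed and probability still draw from different sequences, while repeated passes over one shard agree with each other.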

elasticmachine · 2022-02-08T19:02:43Z

Pinging @elastic/ml-core (Team:ML)

elasticmachine · 2022-02-08T19:03:15Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

tveasey

LGTM. The main thing is that we still ensure independent samples per shard, which seeding with the shard hash arranges for us.

* upstream/master: (166 commits)
  Bind host all instead of just _site_ when needed (elastic#83145)
  [DOCS] Fix min/max agg snippets for histograms (elastic#83695)
  [DOCS] Add deprecation notice for system indices (elastic#83688)
  Cache ILM policy name on IndexMetadata (elastic#83603)
  [DOCS] Fix 8.0 breaking changes sort order (elastic#83685)
  [ML] fix random sampling background query consistency (elastic#83676)
  Move internal APIs into their own namespace '_internal'
  Runtime fields core-with-mapped tests support tsdb (elastic#83577)
  Optimize calculating the presence of a quorum (elastic#83638)
  Use switch expressions in EnableAllocationDecider and NodeShutdownAllocationDecider (elastic#83641)
  Note libffi error message in tmpdir docs (elastic#83662)
  Fix TransportDesiredNodesActionsIT batch tests (elastic#83659)
  [DOCS] Remove unused upgrade doc files (elastic#83617)
  [ML] Wait for model process to stop in stop deployment (elastic#83644)
  [ML] Fix submit after shutdown in process worker service (elastic#83645)
  Remove req/resp classes associated with HLRC (elastic#83599)
  Introduce index.version.compatibility setting (elastic#83264)
  Rename InternalTestCluster#getMasterNodeInstance (elastic#83407)
  Mute TimeSeriesIndexSearcherTests testCollectInOrderAcrossSegments (elastic#83648)
  Add rollover add max_primary_shard_docs condition (elastic#80981)
  ...

# Conflicts:
#   x-pack/plugin/rollup/build.gradle
#   x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/v2/RollupActionSingleNodeTests.java

[ML] fix random sampling background query consistency

153c9ad

benwtrent added >non-issue :ml Machine learning v8.2.0 labels Feb 8, 2022

elasticmachine added the Team:ML Meta label for the ML team label Feb 8, 2022

benwtrent added the :Analytics/Aggregations Aggregations label Feb 8, 2022

elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 8, 2022

tveasey approved these changes Feb 8, 2022

benwtrent merged commit 7d1eb52 into elastic:master Feb 8, 2022

benwtrent deleted the feature/random-sampler-fix branch February 8, 2022 20:26
