Skip to content

Disable concurrency when top_hits sorts on anything but _score #123610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 27, 2025

Conversation

javanna
Copy link
Member

@javanna javanna commented Feb 27, 2025

We already disable inter-segment concurrency in SearchSourceBuilder whenever the top-level sort provided is not _score. We shoudl apply the same rules in top_hits. We recenly stumbled upon non deterministic behaviour caused by script sorting defined within top hits. That is to be expected given that script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the mapping or in the search request, which does support concurrency and guarantees predictable behaviour.

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
@javanna javanna added >bug :Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged :Search Foundations/Search Catch all for Search Foundations v9.0.0 v8.18.1 v8.19.0 v9.1.0 v8.17.4 labels Feb 27, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch labels Feb 27, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

{
SearchSourceBuilder searchSourceBuilder = newSearchSourceBuilder.get();
searchSourceBuilder.aggregation(new TermsAggregationBuilder("terms").subAggregation(new TopHitsAggregationBuilder("tophits")));
assertFalse(searchSourceBuilder.supportsParallelCollection(fieldCardinality));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test unrelated to the "sort" change made here? Do we generally disallow parallel collection for nested aggs or is there something specific here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I wanted to test the different combinations of top_hits within another agg as well as other agg with top_hits underneath.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

@javanna javanna merged commit 9052226 into elastic:main Feb 27, 2025
16 of 17 checks passed
@javanna javanna deleted the fix/top_hits_sort_concurrency branch February 27, 2025 20:22
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 27, 2025
…ic#123610)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.0
8.18
8.x
8.17

javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 27, 2025
…ic#123610)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 27, 2025
…ic#123610)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2025
…) (#123642)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2025
…) (#123640)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2025
…) (#123643)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2025
…) (#123641)

We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We shoudl apply the same rules
in top_hits. We recenly stumbled upon non deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 31, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit that referenced this pull request Apr 1, 2025
With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26012)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26013)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26011)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26014)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged >bug :Search Foundations/Search Catch all for Search Foundations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.17.4 v8.18.1 v8.19.0 v9.0.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants