Skip to content

Re-enable parallel collection for field sorted top hits #125916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

javanna
Copy link
Member

@javanna javanna commented Mar 31, 2025

With #123610 we disabled parallel collection for field and script sorted top hits, aligning its behaviour with that of top level search. This was mainly to work around a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it and disabling concurrency for sort by field in top hits has caused performance regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This introduces back a discrepancy between top level search and top hits, in that concurrency is applied for top hits despite sort by field normally disables it. The key difference is the context where sorting is applied, and the fact that concurrency is disabled only for performance reasons on top level searches and not for functional reasons.

With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
@javanna javanna added >bug auto-backport Automatically create backport pull requests when merged :Search Foundations/Search Catch all for Search Foundations v8.18.1 v8.19.0 v9.0.1 v9.1.0 v8.17.5 labels Mar 31, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Mar 31, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@javanna
Copy link
Member Author

javanna commented Mar 31, 2025

Here is a picture of the regression that this PR addresses:

Screenshot From 2025-03-31 09-32-43

Copy link
Contributor

@drempapis drempapis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The code path clearly reverses the functionality. Is it worth having an external configuration, e.g., a YAML file, to control concurrency, or is it a single case and a workaround to investigate a bug?

@javanna
Copy link
Member Author

javanna commented Apr 1, 2025

The code path clearly reverses the functionality. Is it worth having an external configuration, e.g., a YAML file, to control concurrency, or is it a single case and a workaround to investigate a bug?

@drempapis I don't understand your questions, sorry, could you clarify what functionality is reversed and how external config would help controlling the concurrency behavior? I think the key point here is that sort by field does support search concurrency, but it is in the average case not faster than sequential search across the segments, with the exception of top_hits, as highlighted by benchmark results.

@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
8.x
9.0
8.17

javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26012)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26013)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26011)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26014)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
@drempapis
Copy link
Contributor

@javanna, sorry not to be clear, I had in mind the pr #101535, which set the supportsParallelCollection to false for all subclasses, and the pr #123610, which introduced an override for supportsParallelCollection for class TopHitsAggregationBuilder. By the word reverse, I had in mind that in the current pr, the supportsParallelCollection in the SortBuilder now defaults to true.

The other question was whether it is worth having an external config to change the boolean value of supportsParallelCollection for the SortBuilder subclasses.

@javanna
Copy link
Member Author

javanna commented Apr 3, 2025

Regression is fixed:

Screenshot From 2025-04-03 11-33-32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.17.5 v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants