-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[RollupV2] Implement search resolution #67783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this commit introduces initial search-time indices resolution POC for the rollup indices and how it interacts with DataStream. needs to be put behind feature-flag for rollups (TODO) relates elastic#42720.
Pinging @elastic/es-analytics-geo (Team:Analytics) |
x-pack/plugin/rollup/qa/rest/src/javaRestTest/java/org/elasticsearch/rollup/RollupSearchIT.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good! just left a few comments, but I think there is a lot of coverage here already
server/src/main/java/org/elasticsearch/cluster/metadata/RollupGroup.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/rollup/RollupShardDecider.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhaseTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The high level design to build the final list of shards in the can match phase looks good to me. However I think we need to discuss how the search request is checked to determine if a rollup shard can match or not. In my opinion the current design cannot hold, we should propagate the logic to the requests and aggregation builder so that they can decide independently. It could be QueryBuilder#validate
and AggregationBuilder#validate
or something similar but I think it's important that we decide on a path forward early. Otherwise we'll end up like rollup v1 and a monolithic logic in a separate full of ifs ;).
server/src/main/java/org/elasticsearch/cluster/metadata/RollupIndexMetadata.java
Outdated
Show resolved
Hide resolved
return false; | ||
} | ||
|
||
for (AggregationBuilder builder : aggregations.getAggregatorFactories()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be recursive: builder.getSubAggregations()
.
} | ||
|
||
for (AggregationBuilder builder : aggregations.getAggregatorFactories()) { | ||
if (SUPPORTED_AGGS.contains(builder.getWriteableName()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to check date histogram that appears as children. It would be nice if the logic could be done in the aggregation directly though. It should be the responsibility of the aggregation to validate the field usage rather than this high level check. That's fragile and error-prone, we need something that is sustainable for the long term.
server/src/main/java/org/elasticsearch/rollup/RollupShardDecider.java
Outdated
Show resolved
Hide resolved
are only accessed at the shard level. Co-ordinator will decide on index priority based on info stored in the CanMaptchPhaseResponse
to include an array of CanMatchResponse objects.
(amended last commit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating @csoulios , I left more comments. I think the rollup decider in the mapping and queries can be implemented in follow ups. What we're after here is a resolution at the coordinating level and it's getting close.
server/src/main/java/org/elasticsearch/search/SearchService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/SearchService.java
Outdated
Show resolved
Hide resolved
IndexMetadata requestIndexMetadata = clusterService.state().getMetadata() | ||
.index(request.shardId().getIndexName()); | ||
|
||
CanMatchResponse rollupCanMatchResponse = RollupShardDecider.canMatch(request, context, requestIndexMetadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if that would be easier to not copy the RollupShardDecider
from v1. That's not how we want to implement the check so no need to copy tests and stuffs since we'll need to rewrite them anyway ?
We can start simple and only check if the original index pattern (ShardSearchRequest#originalIndices
) contains the datastream of the rollup index. Then we can incrementally add the verification inside the mapping, queries and agg, we don't need to get it right on the first PR.
server/src/main/java/org/elasticsearch/cluster/metadata/RollupIndexMetadata.java
Outdated
Show resolved
Hide resolved
@@ -710,9 +714,27 @@ static boolean shouldPreFilterSearchShards(ClusterState clusterState, | |||
} else if (preFilterShardSize == null) { | |||
preFilterShardSize = SearchRequest.DEFAULT_PRE_FILTER_SHARD_SIZE; | |||
} | |||
if (RollupV2.isEnabled() && hasRollupDatastream(indices, searchRequest.indices(), clusterState)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer that we always run the can match phase if a datastream is present. No need to check the existence of rollups, the can match phase was made to handle datastreams before they existed ;).
&& preFilterShardSize < numShards; | ||
} | ||
|
||
private static boolean hasRollupDatastream(String[] indices, String[] requestIndices, ClusterState clusterState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the logic that we need imo.
We want to know if a datastream is explicitly requested so it should useSearchRequest#indices
to match datastreams ?
x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java
Outdated
Show resolved
Hide resolved
@@ -133,6 +133,67 @@ public void run() throws IOException { | |||
} | |||
} | |||
|
|||
public void testFilterWithRollup() throws InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is tested here ? I don't see where the logic to replace an index is tested ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was accidentally left out. All tests have been implemented in RollupSearchIT
Fields information is included in the field metadata
This PR implements searching in data streams that contain rollup indices.
The initial release of rollups (aka rollups v1) could only be queried through a separate REST endpoint (
_rollup_search
). (https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-search.html)For the new rollups implementation (aka rollups v2) querying rollup data is performed through the default
_search
endpoint.This change greatly simplifies rollup queries, as rollup indices are treated in the same way as live indices.
Another major difference between rollups v1 and v2 is how results are merged, when querying a live index and its rollup indices in the same request.
In rollup v1, when querying live and rollup indices, if there is any overlap in buckets between the two responses,
only the buckets from the non-rollup index are used. (https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-search.html#_searching_both_historical_rollup_and_non_rollup_data)
In rollups v2, the same functionality is only supported when querying a data stream that contains rollup indices.
Nevertheless, when querying concrete indices (and not a data stream) all results from all results will be returned.
PR is based on the work done in #64970
Closes #48005
Relates to #42720