Skip to content

[RollupV2] Implement search resolution #67783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 68 commits into from

Conversation

csoulios
Copy link
Contributor

@csoulios csoulios commented Jan 20, 2021

This PR implements searching in data streams that contain rollup indices.

The initial release of rollups (aka rollups v1) could only be queried through a separate REST endpoint (_rollup_search). (https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-search.html)
For the new rollups implementation (aka rollups v2) querying rollup data is performed through the default _search endpoint.
This change greatly simplifies rollup queries, as rollup indices are treated in the same way as live indices.

Another major difference between rollups v1 and v2 is how results are merged, when querying a live index and its rollup indices in the same request.

In rollup v1, when querying live and rollup indices, if there is any overlap in buckets between the two responses,
only the buckets from the non-rollup index are used. (https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-search.html#_searching_both_historical_rollup_and_non_rollup_data)

In rollups v2, the same functionality is only supported when querying a data stream that contains rollup indices.
Nevertheless, when querying concrete indices (and not a data stream) all results from all results will be returned.

PR is based on the work done in #64970

Closes #48005
Relates to #42720

@csoulios csoulios added WIP :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v8.0.0 labels Jan 20, 2021
@csoulios csoulios requested a review from talevy January 20, 2021 17:32
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@csoulios csoulios marked this pull request as draft January 20, 2021 17:32
Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good! just left a few comments, but I think there is a lot of coverage here already

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high level design to build the final list of shards in the can match phase looks good to me. However I think we need to discuss how the search request is checked to determine if a rollup shard can match or not. In my opinion the current design cannot hold, we should propagate the logic to the requests and aggregation builder so that they can decide independently. It could be QueryBuilder#validate and AggregationBuilder#validate or something similar but I think it's important that we decide on a path forward early. Otherwise we'll end up like rollup v1 and a monolithic logic in a separate full of ifs ;).

return false;
}

for (AggregationBuilder builder : aggregations.getAggregatorFactories()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be recursive: builder.getSubAggregations().

}

for (AggregationBuilder builder : aggregations.getAggregatorFactories()) {
if (SUPPORTED_AGGS.contains(builder.getWriteableName()) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to check date histogram that appears as children. It would be nice if the logic could be done in the aggregation directly though. It should be the responsibility of the aggregation to validate the field usage rather than this high level check. That's fragile and error-prone, we need something that is sustainable for the long term.

csoulios added 5 commits March 1, 2021 14:55
are only accessed at the shard level. Co-ordinator will decide on
index priority based on info stored in the CanMaptchPhaseResponse
to include an array of CanMatchResponse objects.
@csoulios csoulios changed the title Rollup v2 search resolution [RollupV2] Implement search resolution Mar 3, 2021
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating @csoulios , I left more comments. I think the rollup decider in the mapping and queries can be implemented in follow ups. What we're after here is a resolution at the coordinating level and it's getting close.

IndexMetadata requestIndexMetadata = clusterService.state().getMetadata()
.index(request.shardId().getIndexName());

CanMatchResponse rollupCanMatchResponse = RollupShardDecider.canMatch(request, context, requestIndexMetadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if that would be easier to not copy the RollupShardDecider from v1. That's not how we want to implement the check so no need to copy tests and stuffs since we'll need to rewrite them anyway ?
We can start simple and only check if the original index pattern (ShardSearchRequest#originalIndices) contains the datastream of the rollup index. Then we can incrementally add the verification inside the mapping, queries and agg, we don't need to get it right on the first PR.

@@ -710,9 +714,27 @@ static boolean shouldPreFilterSearchShards(ClusterState clusterState,
} else if (preFilterShardSize == null) {
preFilterShardSize = SearchRequest.DEFAULT_PRE_FILTER_SHARD_SIZE;
}
if (RollupV2.isEnabled() && hasRollupDatastream(indices, searchRequest.indices(), clusterState)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that we always run the can match phase if a datastream is present. No need to check the existence of rollups, the can match phase was made to handle datastreams before they existed ;).

&& preFilterShardSize < numShards;
}

private static boolean hasRollupDatastream(String[] indices, String[] requestIndices, ClusterState clusterState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the logic that we need imo.
We want to know if a datastream is explicitly requested so it should useSearchRequest#indices to match datastreams ?

@@ -133,6 +133,67 @@ public void run() throws IOException {
}
}

public void testFilterWithRollup() throws InterruptedException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what is tested here ? I don't see where the logic to replace an index is tested ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was accidentally left out. All tests have been implemented in RollupSearchIT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.0.0-rc1 WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate rollup search with _search endpoint
5 participants