[RollupV2] Implement search resolution #67783

csoulios · 2021-01-20T17:32:08Z

This PR implements searching in data streams that contain rollup indices.

The initial release of rollups (aka rollups v1) could only be queried through a separate REST endpoint (_rollup_search). (https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-search.html)
For the new rollups implementation (aka rollups v2) querying rollup data is performed through the default _search endpoint.
This change greatly simplifies rollup queries, as rollup indices are treated in the same way as live indices.

Another major difference between rollups v1 and v2 is how results are merged when a live index and its rollup indices are queried in the same request.

In rollups v1, when querying live and rollup indices together, if there is any overlap in buckets between the two responses,
only the buckets from the non-rollup index are used. (https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-search.html#_searching_both_historical_rollup_and_non_rollup_data)

In rollups v2, the same functionality is only supported when querying a data stream that contains rollup indices.
However, when querying concrete indices (rather than a data stream), all results from all indices are returned.
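The overlap rule can be pictured as a bucket-level merge. Below is a minimal Java sketch, assuming each bucket is keyed by its start timestamp (epoch millis) and carries only a doc count. `BucketMergeSketch` and `mergePreferLive` are hypothetical names for illustration, not Elasticsearch APIs or the actual reduce code. Per the description above, in v2 this preference would apply only when the request targets a data stream.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "live data wins on overlap" rule.
// Buckets are keyed by start timestamp (epoch millis) and hold a doc count.
class BucketMergeSketch {

    // Merge rollup and live buckets; on a key collision the live bucket wins.
    static TreeMap<Long, Long> mergePreferLive(Map<Long, Long> liveBuckets, Map<Long, Long> rollupBuckets) {
        TreeMap<Long, Long> merged = new TreeMap<>(rollupBuckets); // start from the rolled-up history
        merged.putAll(liveBuckets);                                // overwrite overlapping keys with live data
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, Long> rollup = Map.of(0L, 100L, 3_600_000L, 80L);      // two hourly buckets from the rollup index
        Map<Long, Long> live = Map.of(3_600_000L, 85L, 7_200_000L, 12L); // the live index overlaps the second hour
        System.out.println(mergePreferLive(live, rollup)); // prints {0=100, 3600000=85, 7200000=12}
    }
}
```

A sorted map keeps the merged buckets in timestamp order, which is what a histogram response would expose.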

PR is based on the work done in #64970

Closes #48005
Relates to #42720

elasticmachine · 2021-01-20T17:32:11Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

x-pack/plugin/rollup/qa/rest/src/javaRestTest/java/org/elasticsearch/rollup/RollupSearchIT.java

talevy

Overall looking good! Just left a few comments, but I think there is a lot of coverage here already.

server/src/main/java/org/elasticsearch/cluster/metadata/RollupGroup.java

server/src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java

server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java

server/src/main/java/org/elasticsearch/rollup/RollupShardDecider.java

server/src/test/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhaseTests.java

jimczi

The high-level design for building the final list of shards in the can match phase looks good to me. However, I think we need to discuss how the search request is checked to determine whether a rollup shard can match. In my opinion the current design won't hold; we should push the logic down to the queries and aggregation builders so that they can decide independently. It could be QueryBuilder#validate and AggregationBuilder#validate or something similar, but I think it's important that we decide on a path forward early. Otherwise we'll end up like rollup v1, with monolithic logic in a separate class full of ifs ;).
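The proposal is that each query decides for itself whether it can run on a rollup index, instead of one central decider enumerating every case. A minimal Java sketch of that shape follows; `RollupAwareQuery`, `canRunOnRollup`, `RollupFields`, `TermQuery` and `BoolQuery` are hypothetical stand-ins (not the real `QueryBuilder` hierarchy), and the rule "a field is queryable only if it is the timestamp or a grouping field" is an assumption for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each query type decides for itself whether it can run against a rollup index.
class RollupQueryValidationSketch {

    // Assumed summary of what survives in a rollup index: the timestamp field plus the grouping fields.
    static final class RollupFields {
        final String timestampField;
        final Set<String> groupFields;

        RollupFields(String timestampField, Set<String> groupFields) {
            this.timestampField = timestampField;
            this.groupFields = groupFields;
        }

        boolean isQueryable(String field) {
            return field.equals(timestampField) || groupFields.contains(field);
        }
    }

    // Hypothetical hook in the spirit of "QueryBuilder#validate".
    interface RollupAwareQuery {
        boolean canRunOnRollup(RollupFields fields);
    }

    // Leaf query: valid only if its field was kept by the rollup.
    static final class TermQuery implements RollupAwareQuery {
        final String field;

        TermQuery(String field) {
            this.field = field;
        }

        public boolean canRunOnRollup(RollupFields fields) {
            return fields.isQueryable(field);
        }
    }

    // Compound query: it knows nothing about its clauses' rules, it simply delegates to them.
    static final class BoolQuery implements RollupAwareQuery {
        final List<RollupAwareQuery> clauses;

        BoolQuery(List<RollupAwareQuery> clauses) {
            this.clauses = clauses;
        }

        public boolean canRunOnRollup(RollupFields fields) {
            return clauses.stream().allMatch(c -> c.canRunOnRollup(fields));
        }
    }

    public static void main(String[] args) {
        RollupFields fields = new RollupFields("@timestamp", Set.of("host"));
        RollupAwareQuery ok = new BoolQuery(List.of(new TermQuery("host")));
        RollupAwareQuery notOk = new BoolQuery(List.of(new TermQuery("host"), new TermQuery("message")));
        System.out.println(ok.canRunOnRollup(fields));    // true
        System.out.println(notOk.canRunOnRollup(fields)); // false: "message" is not kept in the rollup
    }
}
```

Adding a new query type then means implementing one method on that type, rather than adding another branch to a shared decider.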

server/src/main/java/org/elasticsearch/cluster/metadata/RollupIndexMetadata.java

jimczi · 2021-02-18T13:22:40Z

server/src/main/java/org/elasticsearch/rollup/RollupShardDecider.java

+            return false;
+        }
+
+        for (AggregationBuilder builder : aggregations.getAggregatorFactories()) {


This needs to be recursive: builder.getSubAggregations().
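Checking only the top-level aggregations misses anything nested below them. A minimal Java sketch of the recursive walk follows; `AggNode` is a hypothetical stand-in for `AggregationBuilder` (modelling only the type name and the result of `getSubAggregations()`), and the contents of `SUPPORTED_AGGS` are assumed for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: walk the whole aggregation tree instead of only the top level.
class RecursiveAggCheckSketch {

    // Stand-in for AggregationBuilder: only the type name and the sub-aggregations are modelled.
    static final class AggNode {
        final String type;
        final List<AggNode> subAggregations;

        AggNode(String type, List<AggNode> subAggregations) {
            this.type = type;
            this.subAggregations = subAggregations;
        }

        AggNode(String type) {
            this(type, List.of());
        }
    }

    // Assumed allow-list, for illustration only.
    static final Set<String> SUPPORTED_AGGS = Set.of("date_histogram", "terms", "min", "max", "sum", "avg", "value_count");

    static boolean allSupported(List<AggNode> aggs) {
        for (AggNode agg : aggs) {
            if (SUPPORTED_AGGS.contains(agg.type) == false) {
                return false;
            }
            // Recurse: an unsupported aggregation nested anywhere below must also reject the rollup shard.
            if (allSupported(agg.subAggregations) == false) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AggNode nested = new AggNode("terms", List.of(new AggNode("percentiles")));
        System.out.println(allSupported(List.of(nested))); // false: the child is not supported
    }
}
```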

jimczi · 2021-02-18T13:27:27Z

server/src/main/java/org/elasticsearch/rollup/RollupShardDecider.java

+        }
+
+        for (AggregationBuilder builder : aggregations.getAggregatorFactories()) {
+            if (SUPPORTED_AGGS.contains(builder.getWriteableName()) == false) {


We also have to check date histograms that appear as children. It would be nicer if the logic could live in the aggregation itself, though. It should be the responsibility of the aggregation to validate its field usage rather than relying on this high-level check. The current approach is fragile and error-prone; we need something that is sustainable in the long term.
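To illustrate the aggregation validating its own field usage, here is a minimal Java sketch in which a `date_histogram` nested under `terms` checks itself, so the parent needs no special case. `RollupConfig`, `RollupValidatable`, `validateForRollup`, `DateHistogram` and `Terms` are hypothetical names. The rules (the histogram must target the rollup's timestamp field with a fixed interval that is a whole multiple of the rollup interval; `terms` must target a grouping field) are assumptions chosen for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each aggregation validates its own field usage against the rollup config.
class SelfValidatingAggSketch {

    // Assumed rollup description: timestamp field, fixed interval in millis (must be > 0), grouping fields.
    static final class RollupConfig {
        final String timestampField;
        final long intervalMillis;
        final Set<String> groupFields;

        RollupConfig(String timestampField, long intervalMillis, Set<String> groupFields) {
            this.timestampField = timestampField;
            this.intervalMillis = intervalMillis;
            this.groupFields = groupFields;
        }
    }

    interface RollupValidatable {
        boolean validateForRollup(RollupConfig config);
    }

    // Every child must validate as well.
    static boolean childrenValid(List<RollupValidatable> children, RollupConfig config) {
        return children.stream().allMatch(c -> c.validateForRollup(config));
    }

    static final class DateHistogram implements RollupValidatable {
        final String field;
        final long fixedIntervalMillis;
        final List<RollupValidatable> subAggs;

        DateHistogram(String field, long fixedIntervalMillis, List<RollupValidatable> subAggs) {
            this.field = field;
            this.fixedIntervalMillis = fixedIntervalMillis;
            this.subAggs = subAggs;
        }

        public boolean validateForRollup(RollupConfig config) {
            // Buckets can only be rebuilt from rollup buckets if the requested interval is a whole multiple.
            boolean sameField = field.equals(config.timestampField);
            boolean multiple = fixedIntervalMillis >= config.intervalMillis && fixedIntervalMillis % config.intervalMillis == 0;
            return sameField && multiple && childrenValid(subAggs, config);
        }
    }

    static final class Terms implements RollupValidatable {
        final String field;
        final List<RollupValidatable> subAggs;

        Terms(String field, List<RollupValidatable> subAggs) {
            this.field = field;
            this.subAggs = subAggs;
        }

        public boolean validateForRollup(RollupConfig config) {
            // The parent checks only its own field; a nested date_histogram checks itself.
            return config.groupFields.contains(field) && childrenValid(subAggs, config);
        }
    }

    public static void main(String[] args) {
        RollupConfig hourly = new RollupConfig("@timestamp", 3_600_000L, Set.of("host"));
        RollupValidatable halfHour = new Terms("host", List.of(new DateHistogram("@timestamp", 1_800_000L, List.of())));
        RollupValidatable twoHours = new Terms("host", List.of(new DateHistogram("@timestamp", 7_200_000L, List.of())));
        System.out.println(halfHour.validateForRollup(hourly)); // false: 30m buckets cannot be rebuilt from 1h buckets
        System.out.println(twoHours.validateForRollup(hourly)); // true
    }
}
```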

server/src/main/java/org/elasticsearch/rollup/RollupShardDecider.java

jimczi

Thanks for iterating, @csoulios. I left more comments. I think the rollup decider logic in the mapping and queries can be implemented in follow-ups. What we're after here is resolution at the coordinating level, and it's getting close.

server/src/main/java/org/elasticsearch/search/SearchService.java

jimczi · 2021-03-04T08:29:37Z

server/src/main/java/org/elasticsearch/search/SearchService.java

+                        IndexMetadata requestIndexMetadata = clusterService.state().getMetadata()
+                                .index(request.shardId().getIndexName());
+
+                        CanMatchResponse rollupCanMatchResponse = RollupShardDecider.canMatch(request, context, requestIndexMetadata,


I wonder if it would be easier not to copy the RollupShardDecider from v1. That's not how we want to implement the check, so there's no need to copy the tests and the rest, since we'll need to rewrite them anyway.
We can start simple and only check whether the original index pattern (ShardSearchRequest#originalIndices) contains the datastream of the rollup index. Then we can incrementally add the verification inside the mapping, queries and aggs; we don't need to get it all right in the first PR.
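The simple first check suggested above can be sketched in a few lines of Java. This is a hedged illustration, not the PR's code: it takes the original index expressions as a plain `String[]` (standing in for `ShardSearchRequest#originalIndices`) and the name of the data stream that owns the rollup index. `OriginalIndicesCheckSketch`, `dataStreamRequested` and `simpleMatch` are hypothetical names, and only `*` wildcards are handled (exclusions and date math are ignored).

```java
import java.util.regex.Pattern;

// Hypothetical sketch: does any original index expression name the rollup index's data stream?
class OriginalIndicesCheckSketch {

    // Minimal '*' wildcard matching; exclusions ("-name"), date math and other expression syntax are ignored.
    static boolean simpleMatch(String pattern, String value) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i])); // literal text between wildcards
        }
        return value.matches(regex.toString());
    }

    // parentDataStream is null when the rollup index does not belong to a data stream.
    static boolean dataStreamRequested(String[] originalIndices, String parentDataStream) {
        if (parentDataStream == null) {
            return false;
        }
        for (String expression : originalIndices) {
            if (simpleMatch(expression, parentDataStream)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(dataStreamRequested(new String[] { "logs-*" }, "logs-app"));              // true
        System.out.println(dataStreamRequested(new String[] { ".ds-logs-app-000001" }, "logs-app")); // false
    }
}
```

Naming a backing index directly does not match the data stream here, which lines up with the description's point that querying concrete indices returns all results.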

server/src/main/java/org/elasticsearch/cluster/metadata/RollupIndexMetadata.java

jimczi · 2021-03-04T08:37:50Z

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

@@ -710,9 +714,27 @@ static boolean shouldPreFilterSearchShards(ClusterState clusterState,
        } else if (preFilterShardSize == null) {
            preFilterShardSize = SearchRequest.DEFAULT_PRE_FILTER_SHARD_SIZE;
        }
+        if (RollupV2.isEnabled() && hasRollupDatastream(indices, searchRequest.indices(), clusterState)) {


I'd prefer that we always run the can match phase if a datastream is present. No need to check for the existence of rollups; the can match phase was made to handle datastreams before they existed ;).

jimczi · 2021-03-04T08:42:03Z

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

+            && preFilterShardSize < numShards;
+    }
+
+    private static boolean hasRollupDatastream(String[] indices, String[] requestIndices, ClusterState clusterState) {


That's not the logic we need, imo.
We want to know whether a datastream is explicitly requested, so shouldn't it use SearchRequest#indices to match datastreams?
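Taken together, the two comments on `TransportSearchAction` suggest a decision like the one below: match the request's own expressions (as from `SearchRequest#indices`) against data stream names, and always pre-filter when one matches, regardless of rollups. This Java sketch is simplified and hypothetical: `PreFilterDecisionSketch`, `isDataStreamRequested` and `shouldPreFilter` are illustrative names, the real `shouldPreFilterSearchShards` has further conditions that are omitted here, and wildcard support is reduced to a single trailing `*`.

```java
import java.util.Set;

// Hypothetical, simplified coordinator decision; other pre-filter conditions are omitted.
class PreFilterDecisionSketch {

    // Simplification: exact names plus a single trailing '*' wildcard only.
    static boolean matches(String expression, String name) {
        if (expression.endsWith("*")) {
            return name.startsWith(expression.substring(0, expression.length() - 1));
        }
        return expression.equals(name);
    }

    // Uses the expressions from the request, not the resolved concrete indices,
    // so naming a backing index directly does not count as requesting its data stream.
    static boolean isDataStreamRequested(String[] requestIndices, Set<String> dataStreamNames) {
        for (String expression : requestIndices) {
            for (String dataStream : dataStreamNames) {
                if (matches(expression, dataStream)) {
                    return true;
                }
            }
        }
        return false;
    }

    static boolean shouldPreFilter(String[] requestIndices, Set<String> dataStreamNames, int numShards, int preFilterShardSize) {
        if (isDataStreamRequested(requestIndices, dataStreamNames)) {
            return true; // always run can_match when a data stream is explicitly requested, rollups or not
        }
        return preFilterShardSize < numShards; // otherwise keep the shard-count threshold
    }

    public static void main(String[] args) {
        Set<String> dataStreams = Set.of("logs-app");
        System.out.println(shouldPreFilter(new String[] { "logs-*" }, dataStreams, 1, 128));              // true
        System.out.println(shouldPreFilter(new String[] { ".ds-logs-app-000001" }, dataStreams, 1, 128)); // false
        System.out.println(shouldPreFilter(new String[] { "metrics" }, dataStreams, 200, 128));           // true
    }
}
```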

x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java

jimczi · 2021-03-04T08:53:43Z

server/src/test/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhaseTests.java

@@ -133,6 +133,67 @@ public void run() throws IOException {
        }
    }

+    public void testFilterWithRollup() throws InterruptedException {


I am not sure what is tested here. I don't see where the logic that replaces an index is tested.

This test was accidentally left out. All tests have been implemented in RollupSearchIT

talevy and others added 16 commits November 11, 2020 16:21

Rollup V2 Search Resolution Setup

b768fbf

this commit introduces initial search-time indices resolution POC for the rollup indices and how it interacts with DataStream. needs to be put behind feature-flag for rollups (TODO) relates elastic#42720.

Merge remote-tracking branch 'elastic/master' into rollupv2search

793b807

fix compile errors

8ececef

Merge branch 'master' into rollupv2-search

c099033

Merge branch 'master' into rollupv2-search

0d88290

Merged with master

7824cef

Merge branch 'master' into rollupv2-search

49d321e

Merge branch 'master' into rollupv2-search

0ac4438

Merge branch 'master' into rollupv2-search

c74629d

Merge branch 'master' into rollupv2-search

8f73c19

Fix build errors after merge with master

c889bb7

WIP

7489dd7

Merge branch 'master' into rollupv2-search

d1ad022

Remove preFilterRollup param

bf82de1

Merge branch 'master' into rollupv2-search

7512370

WIP

54d78b2

csoulios added WIP, :StorageEngine/Rollup, and v8.0.0 labels Jan 20, 2021

csoulios requested a review from talevy January 20, 2021 17:32

elasticmachine added the Team:Analytics label Jan 20, 2021

csoulios marked this pull request as draft January 20, 2021 17:32

talevy and others added 4 commits January 20, 2021 15:52

resolve merge

4e2ba19

add simple javaRestTest for rollup search

a21d676

Merge remote-tracking branch 'elastic/master' into rollupv2-search

54c51c1

Changed license to test file

c8fda40

talevy reviewed Jan 21, 2021

x-pack/plugin/rollup/qa/rest/src/javaRestTest/java/org/elasticsearch/rollup/RollupSearchIT.java (outdated, resolved)

Merge branch 'master' into rollupv2-search

5de29cb

csoulios mentioned this pull request Jan 26, 2021

Refactor rollups meta (AKA Rollup V2) #42720

Closed

21 tasks

cleanup

f688baf

talevy reviewed Feb 16, 2021

csoulios added 7 commits February 17, 2021 11:53

Removed method that was not used in AggregatorFactories

28b7035

Deleted RollupMetadata and RollupGroup classes

e4f85d6

Addressed reviewer comments

712ca29

Added some unit tests for RollupShardDecider (more to follow)

cc2bb5c

checkstyle

53dd12a

Added more unit tests for RollupShardDecider

9d5cefd

Merge branch 'master' into rollupv2-search

69fe706

jimczi reviewed Feb 18, 2021

csoulios added 5 commits March 1, 2021 14:55

Merge branch 'master' into rollupv2-search

381fbed

Minor change

50b0b89

Modify can_match phase so that index metadata

ad1c045

are only accessed at the shard level. The coordinator will decide on index priority based on info stored in the CanMatchPhaseResponse

Modified CanMatchSearchPhaseResults class

14a537d

to include an array of CanMatchResponse objects.

Added some unit tests for RollupShardDecider (more to follow)

b2aff7e

(amended last commit)

csoulios changed the title from "Rollup v2 search resolution" to "[RollupV2] Implement search resolution" Mar 3, 2021

jimczi requested changes Mar 4, 2021

csoulios added 10 commits March 10, 2021 16:14

Merge branch 'master' into rollupv2-search

697de67

Merge branch 'master' into rollupv2-search

7e7b3b4

Merge branch 'master' into rollupv2-search

3d832dc

Removed RollupIndexMetadata class

ab5a1a9

Merge branch 'master' into rollupv2-search

2ebbffa

Use IndexMetadata settings for determining rollup index

a58df7a

Replaced index name with index uuid

1b29db2

Fields information is included in the field metadata

Merge branch 'master' into rollupv2-search

ea4b5b6

Fix typo

879a0fe

Fix broken test: CanMatchPreFilterSearchPhaseTests.testSortShards()

56c10be

csoulios closed this Nov 11, 2021

mark-vieira added v8.0.0-rc1 and removed v8.0.0 labels Jan 12, 2022

csoulios Mar 22, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

csoulios commented Jan 20, 2021 •

edited

Loading