
Commit 7ceed13

Speed up date_histogram without children (#63643)
This speeds up `date_histogram` aggregations without a parent or children. This is quite common - it's the aggregation that Kibana's Discover uses all over the place. Also, we hope to be able to use the same mechanism to speed up aggs with children one day, but that day isn't today.

The kind of speedup we're seeing is fairly substantial in many cases:
```
|                              |                                             |  before |   after |    |
| 90th percentile service time | date_histogram_calendar_interval            | 9266.07 | 1376.13 | ms |
| 90th percentile service time | date_histogram_calendar_interval_with_tz    | 9217.21 | 1372.67 | ms |
| 90th percentile service time | date_histogram_fixed_interval               | 8817.36 | 1312.67 | ms |
| 90th percentile service time | date_histogram_fixed_interval_with_tz       | 8801.71 | 1311.69 | ms | <-- discover's agg
| 90th percentile service time | date_histogram_fixed_interval_with_metrics  | 44660.2 | 43789.5 | ms |
```

This uses the work we did in #61467 to precompute the rounding points for a `date_histogram`. Now, when we know the rounding points, we execute the `date_histogram` as a `range` aggregation. This is nice for three reasons:
1. We can further rewrite the `range` aggregation (see below).
2. We don't need to allocate a hash to convert rounding points to ordinals.
3. We can send precise cardinality estimates to sub-aggs.

Points 2 and 3 above are nice, but most of the speed difference comes from point 1. Specifically, we now look into executing `range` aggregations as a `filters` aggregation. Normally the `filters` aggregation is quite slow, but when it doesn't have a parent or any children we can execute it "filter by filter", which is significantly faster. So fast, in fact, that it is faster than the original `date_histogram`. The `range` aggregation is *fairly* careful in how it rewrites, giving up on the `filters` aggregation if it won't collect "filter by filter" and falling back to its original execution mechanism.
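The shape of the `date_histogram` -> `range` rewrite is easy to sketch: each adjacent pair of precomputed rounding points becomes one half-open `[from, to)` bucket, and the last point becomes an unbounded "from"-only bucket. Here is a minimal sketch of that mapping; the helper class is hypothetical, not the actual Elasticsearch code:

```java
import java.util.ArrayList;
import java.util.List;

class FixedPointsToRanges {
    /**
     * Turns precomputed rounding points into half-open [from, to) bounds.
     * Each adjacent pair of points becomes one bucket; the final point
     * becomes an unbounded bucket, like the trailing "from"-only range in
     * the example below. Illustrative only.
     */
    static List<long[]> rangesFromPoints(long[] points) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            long from = points[i];
            // Long.MAX_VALUE stands in for "no upper bound" on the last range.
            long to = i + 1 < points.length ? points[i + 1] : Long.MAX_VALUE;
            ranges.add(new long[] { from, to });
        }
        return ranges;
    }
}
```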
So an aggregation like this:
```
POST _search
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2016-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "date_histogram": {
        "field": "dropoff_datetime",
        "fixed_interval": "60d",
        "time_zone": "America/New_York"
      }
    }
  }
}
```
is executed like:
```
POST _search
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2016-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "range": {
        "field": "dropoff_datetime",
        "ranges": [
          {"from": 1415250000000, "to": 1420434000000},
          {"from": 1420434000000, "to": 1425618000000},
          {"from": 1425618000000, "to": 1430798400000},
          {"from": 1430798400000, "to": 1435982400000},
          {"from": 1435982400000, "to": 1441166400000},
          {"from": 1441166400000, "to": 1446350400000},
          {"from": 1446350400000, "to": 1451538000000},
          {"from": 1451538000000}
        ]
      }
    }
  }
}
```
Which in turn is executed like this:
```
POST _search
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2016-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "filters": {
        "filters": {
          "1": {"range": {"dropoff_datetime": {"gte": "2014-12-30 00:00:00", "lt": "2015-01-05 05:00:00"}}},
          "2": {"range": {"dropoff_datetime": {"gte": "2015-01-05 05:00:00", "lt": "2015-03-06 05:00:00"}}},
          "3": {"range": {"dropoff_datetime": {"gte": "2015-03-06 00:00:00", "lt": "2015-05-05 00:00:00"}}},
          "4": {"range": {"dropoff_datetime": {"gte": "2015-05-05 00:00:00", "lt": "2015-07-04 00:00:00"}}},
          "5": {"range": {"dropoff_datetime": {"gte": "2015-07-04 00:00:00", "lt": "2015-09-02 00:00:00"}}},
          "6": {"range": {"dropoff_datetime": {"gte": "2015-09-02 00:00:00", "lt": "2015-11-01 00:00:00"}}},
          "7": {"range": {"dropoff_datetime": {"gte": "2015-11-01 00:00:00", "lt": "2015-12-31 00:00:00"}}},
          "8": {"range": {"dropoff_datetime": {"gte": "2015-12-31 00:00:00"}}}
        }
      }
    }
  }
}
```
And *that* is faster because we can execute it "filter by filter".

Finally, notice the `range` query filtering the data. That is required for the data set that I'm using for testing. The "filter by filter" collection mechanism for the `filters` agg needs special-case handling when the query is a `range` query and the filter is a `range` query and they are both on the same field. That special-case handling "merges" the range query. Without it, "filter by filter" collection is substantially slower. It's still quite a bit quicker than the standard `filter` collection, but not nearly as fast as it could be.
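That merging is, at its core, an intersection of bounds on the same field: keep the larger lower bound and the smaller upper bound. A minimal sketch under that assumption (plain epoch-millis bounds; the real implementation merges Lucene range queries, so this is illustrative only):

```java
class RangeMergeSketch {
    /**
     * Intersects the top-level query's range with one filter's range on the
     * same field. Returns null when the two ranges are disjoint, meaning the
     * merged filter matches nothing. Illustrative only.
     */
    static long[] merge(long queryFrom, long queryTo, long filterFrom, long filterTo) {
        long from = Math.max(queryFrom, filterFrom); // larger lower bound wins
        long to = Math.min(queryTo, filterTo);       // smaller upper bound wins
        return from < to ? new long[] { from, to } : null;
    }
}
```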
1 parent b31a8ff commit 7ceed13

File tree

36 files changed: +2271 −227 lines changed

rest-api-spec/src/main/resources/rest-api-spec/test/search.aggregation/10_histogram.yml (+58 −3)

```diff
@@ -495,6 +495,58 @@ setup:
                 date:
                   type: date
 
+  - do:
+      bulk:
+        index: test_2
+        refresh: true
+        body:
+          - '{"index": {}}'
+          - '{"date": "2000-01-01"}' # This date is intentionally very far in the past so we end up not being able to use the date_histo -> range -> filters optimization
+          - '{"index": {}}'
+          - '{"date": "2000-01-02"}'
+          - '{"index": {}}'
+          - '{"date": "2016-02-01"}'
+          - '{"index": {}}'
+          - '{"date": "2016-03-01"}'
+
+  - do:
+      search:
+        index: test_2
+        body:
+          size: 0
+          profile: true
+          aggs:
+            histo:
+              date_histogram:
+                field: date
+                calendar_interval: month
+  - match: { hits.total.value: 4 }
+  - length: { aggregations.histo.buckets: 195 }
+  - match: { aggregations.histo.buckets.0.key_as_string: "2000-01-01T00:00:00.000Z" }
+  - match: { aggregations.histo.buckets.0.doc_count: 2 }
+  - match: { profile.shards.0.aggregations.0.type: DateHistogramAggregator }
+  - match: { profile.shards.0.aggregations.0.description: histo }
+  - match: { profile.shards.0.aggregations.0.breakdown.collect_count: 4 }
+  - match: { profile.shards.0.aggregations.0.debug.total_buckets: 3 }
+
+---
+"date_histogram run as filters profiler":
+  - skip:
+      version: " - 7.99.99"
+      reason: optimization added in 7.11.0, backport pending
+
+  - do:
+      indices.create:
+          index: test_2
+          body:
+            settings:
+              number_of_replicas: 0
+              number_of_shards: 1
+            mappings:
+              properties:
+                date:
+                  type: date
+
   - do:
       bulk:
         index: test_2
@@ -524,10 +576,13 @@ setup:
   - length: { aggregations.histo.buckets: 3 }
   - match: { aggregations.histo.buckets.0.key_as_string: "2016-01-01T00:00:00.000Z" }
   - match: { aggregations.histo.buckets.0.doc_count: 2 }
-  - match: { profile.shards.0.aggregations.0.type: DateHistogramAggregator }
+  - match: { profile.shards.0.aggregations.0.type: DateHistogramAggregator.FromDateRange }
   - match: { profile.shards.0.aggregations.0.description: histo }
-  - match: { profile.shards.0.aggregations.0.breakdown.collect_count: 4 }
-  - match: { profile.shards.0.aggregations.0.debug.total_buckets: 3 }
+  # ultimately this ends up as a filters agg that uses filter by filter collection which is tracked in build_leaf_collector
+  - match: { profile.shards.0.aggregations.0.breakdown.collect_count: 0 }
+  - match: { profile.shards.0.aggregations.0.debug.delegate: RangeAggregator.FromFilters }
+  - match: { profile.shards.0.aggregations.0.debug.delegate_debug.delegate: FiltersAggregator.FilterByFilter }
+  - match: { profile.shards.0.aggregations.0.debug.delegate_debug.delegate_debug.segments_with_deleted_docs: 0 }
 
 ---
 "histogram with hard bounds":
```

server/src/internalClusterTest/java/org/elasticsearch/action/search/TransportSearchIT.java (+5)

```diff
@@ -590,5 +590,10 @@ public ScoreMode scoreMode() {
 
                 @Override
                 public void preCollection() throws IOException {}
+
+                @Override
+                public Aggregator[] subAggregators() {
+                    throw new UnsupportedOperationException();
+                }
             }
         }
```

server/src/internalClusterTest/java/org/elasticsearch/search/aggregations/bucket/DateHistogramIT.java (+1 −1)

```diff
@@ -38,10 +38,10 @@
 import org.elasticsearch.search.aggregations.BucketOrder;
 import org.elasticsearch.search.aggregations.InternalAggregation;
 import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
-import org.elasticsearch.search.aggregations.bucket.histogram.LongBounds;
 import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
 import org.elasticsearch.search.aggregations.bucket.histogram.Histogram.Bucket;
 import org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram;
+import org.elasticsearch.search.aggregations.bucket.histogram.LongBounds;
 import org.elasticsearch.search.aggregations.metrics.Avg;
 import org.elasticsearch.search.aggregations.metrics.Sum;
 import org.elasticsearch.test.ESIntegTestCase;
```

server/src/main/java/org/elasticsearch/common/Rounding.java (+23)

```diff
@@ -291,6 +291,13 @@ public interface Prepared {
          * next rounded value in specified units if possible.
          */
         double roundingSize(long utcMillis, DateTimeUnit timeUnit);
+
+        /**
+         * If this rounding mechanism precalculates rounding points then
+         * this array stores dates such that each date between two adjacent
+         * entries rounds down to the earlier entry. If the rounding
+         * mechanism doesn't precalculate points then this is {@code null}.
+         */
+        long[] fixedRoundingPoints();
     }
 
     /**
      * Prepare to round many times.
@@ -435,6 +442,11 @@ protected Prepared maybeUseArray(long minUtcMillis, long maxUtcMillis, int max)
         }
         return new ArrayRounding(values, i, this);
     }
+
+    @Override
+    public long[] fixedRoundingPoints() {
+        return null;
+    }
 }
 
 static class TimeUnitRounding extends Rounding {
@@ -1253,6 +1265,12 @@ public long nextRoundingValue(long utcMillis) {
             public double roundingSize(long utcMillis, DateTimeUnit timeUnit) {
                 return delegatePrepared.roundingSize(utcMillis, timeUnit);
             }
+
+            @Override
+            public long[] fixedRoundingPoints() {
+                // TODO we can likely translate here
+                return null;
+            }
         };
     }
@@ -1335,5 +1353,10 @@ public long nextRoundingValue(long utcMillis) {
     public double roundingSize(long utcMillis, DateTimeUnit timeUnit) {
         return delegate.roundingSize(utcMillis, timeUnit);
     }
+
+    @Override
+    public long[] fixedRoundingPoints() {
+        return Arrays.copyOf(values, max);
+    }
 }
 }
```
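A hedged sketch of how this hook is meant to be consumed: prepare the rounding for the shard's min/max date, and only take the range-based fast path when `fixedRoundingPoints()` returns a non-null array. The two `build*` helpers below are hypothetical stand-ins for the aggregator's construction paths, not real Elasticsearch methods:

```java
Aggregator build(Rounding rounding, long shardMinUtcMillis, long shardMaxUtcMillis) throws IOException {
    Rounding.Prepared prepared = rounding.prepare(shardMinUtcMillis, shardMaxUtcMillis);
    long[] points = prepared.fixedRoundingPoints();
    if (points == null) {
        // No precalculated points (e.g. a time-zone rounding that can't be
        // translated yet, per the TODO above): classic date_histogram execution.
        return buildClassic(prepared);       // hypothetical helper
    }
    // Fast path: each adjacent pair of points becomes one range bucket.
    return buildAsRanges(points);            // hypothetical helper
}
```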
server/src/main/java/org/elasticsearch/search/aggregations/AdaptingAggregator.java (+130, new file)

```java
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.search.aggregations;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ScoreMode;
import org.elasticsearch.common.CheckedFunction;
import org.elasticsearch.search.profile.aggregation.InternalAggregationProfileTree;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * An {@linkplain Aggregator} that delegates collection to another
 * {@linkplain Aggregator} and then translates its results into the results
 * you'd expect from another aggregation.
 */
public abstract class AdaptingAggregator extends Aggregator {
    private final Aggregator parent;
    private final Aggregator delegate;

    public AdaptingAggregator(
        Aggregator parent,
        AggregatorFactories subAggregators,
        CheckedFunction<AggregatorFactories, Aggregator, IOException> delegate
    ) throws IOException {
        // It's important we set parent first or else when we build the sub-aggregators
        // they can fail because they'll call this.parent.
        this.parent = parent;
        /*
         * Lock the parent of the sub-aggregators to *this* instead of to
         * the delegate. This keeps the parent link shaped like the requested
         * agg tree. This is how it has always been and some aggs rely on it.
         */
        this.delegate = delegate.apply(subAggregators.fixParent(this));
        assert this.delegate.parent() == parent : "invalid parent set on delegate";
    }

    /**
     * Adapt the result from the collecting {@linkplain Aggregator} into the
     * result expected by this {@linkplain Aggregator}.
     */
    protected abstract InternalAggregation adapt(InternalAggregation delegateResult);

    @Override
    public final void close() {
        delegate.close();
    }

    @Override
    public final ScoreMode scoreMode() {
        return delegate.scoreMode();
    }

    @Override
    public final String name() {
        return delegate.name();
    }

    @Override
    public final Aggregator parent() {
        return parent;
    }

    @Override
    public final Aggregator subAggregator(String name) {
        return delegate.subAggregator(name);
    }

    @Override
    public final LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
        return delegate.getLeafCollector(ctx);
    }

    @Override
    public final void preCollection() throws IOException {
        delegate.preCollection();
    }

    @Override
    public final InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws IOException {
        InternalAggregation[] delegateResults = delegate.buildAggregations(owningBucketOrds);
        InternalAggregation[] result = new InternalAggregation[owningBucketOrds.length];
        for (int ordIdx = 0; ordIdx < owningBucketOrds.length; ordIdx++) {
            result[ordIdx] = adapt(delegateResults[ordIdx]);
        }
        return result;
    }

    @Override
    public final InternalAggregation buildEmptyAggregation() {
        return adapt(delegate.buildEmptyAggregation());
    }

    @Override
    public final Aggregator[] subAggregators() {
        return delegate.subAggregators();
    }

    @Override
    public void collectDebugInfo(BiConsumer<String, Object> add) {
        super.collectDebugInfo(add);
        add.accept("delegate", InternalAggregationProfileTree.typeFromAggregator(delegate));
        Map<String, Object> delegateDebug = new HashMap<>();
        delegate.collectDebugInfo(delegateDebug::put);
        add.accept("delegate_debug", delegateDebug);
    }

    public Aggregator delegate() {
        return delegate;
    }
}
```
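For orientation, here is a sketch of what a concrete subclass looks like. The class and helper names below are hypothetical; the real adapters are the `DateHistogramAggregator.FromDateRange` and `RangeAggregator.FromFilters` classes asserted on in the profile test above:

```java
// Hypothetical subclass sketch. The delegate is built from the "fixed parent"
// factories so sub-aggregators still see this adapter as their parent.
class HistoFromRangeAdapter extends AdaptingAggregator {
    HistoFromRangeAdapter(Aggregator parent, AggregatorFactories subAggregators) throws IOException {
        super(parent, subAggregators, fixedFactories -> buildRangeDelegate(fixedFactories));
    }

    @Override
    protected InternalAggregation adapt(InternalAggregation delegateResult) {
        // Translate the delegate's range buckets back into the
        // date_histogram buckets the caller asked for.
        return rangesToHistogram(delegateResult);
    }

    // Hypothetical helpers, elided: build the underlying range aggregator and
    // convert its result. The real logic lives in the adapter classes above.
    static Aggregator buildRangeDelegate(AggregatorFactories factories) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    static InternalAggregation rangesToHistogram(InternalAggregation rangeResult) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```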

server/src/main/java/org/elasticsearch/search/aggregations/Aggregator.java (+5)

```diff
@@ -172,6 +172,11 @@ public final InternalAggregation buildTopLevel() throws IOException {
      */
     public void collectDebugInfo(BiConsumer<String, Object> add) {}
 
+    /**
+     * Get the aggregators running under this one.
+     */
+    public abstract Aggregator[] subAggregators();
+
     /** Aggregation mode for sub aggregations. */
     public enum SubAggCollectionMode implements Writeable {
```

server/src/main/java/org/elasticsearch/search/aggregations/AggregatorBase.java (+1)

```diff
@@ -224,6 +224,7 @@ public Aggregator parent() {
         return parent;
     }
 
+    @Override
     public Aggregator[] subAggregators() {
         return subAggregators;
     }
```

server/src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java (+21)

```diff
@@ -227,6 +227,27 @@ public int countAggregators() {
         return factories.length;
     }
 
+    /**
+     * Returns a copy of {@link AggregatorFactories} modified so that
+     * calls to {@link #createSubAggregators} will ignore the provided parent
+     * aggregator and always use the {@code fixedParent} provided to this
+     * method.
+     * <p>
+     * {@link AdaptingAggregator} uses this to make sure that sub-aggregators
+     * get the {@link AdaptingAggregator} aggregator itself as the parent.
+     */
+    public AggregatorFactories fixParent(Aggregator fixedParent) {
+        AggregatorFactories previous = this;
+        return new AggregatorFactories(factories) {
+            @Override
+            public Aggregator[] createSubAggregators(SearchContext searchContext, Aggregator parent, CardinalityUpperBound cardinality)
+                throws IOException {
+                // Note that we're throwing out the "parent" passed in to this
+                // method and using the parent passed to fixParent instead.
+                return previous.createSubAggregators(searchContext, fixedParent, cardinality);
+            }
+        };
+    }
+
     /**
      * A mutable collection of {@link AggregationBuilder}s and
      * {@link PipelineAggregationBuilder}s.
```
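The design choice worth noting here: the delegate does the collecting, but the *requested* agg tree is what sort paths and parent lookups reason about, so `fixParent` pins the sub-aggregators' parent to the adapter rather than the delegate. That is why `AdaptingAggregator`'s constructor passes `subAggregators.fixParent(this)` when building its delegate.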

server/src/main/java/org/elasticsearch/search/aggregations/bucket/DeferringBucketCollector.java (+5)

```diff
@@ -119,6 +119,11 @@ public Aggregator resolveSortPath(PathElement next, Iterator<PathElement> path)
         public BucketComparator bucketComparator(String key, SortOrder order) {
             throw new UnsupportedOperationException("Can't sort on deferred aggregations");
         }
+
+        @Override
+        public Aggregator[] subAggregators() {
+            return in.subAggregators();
+        }
     }
 
 }
```