Speed up date_histogram by precomputing ranges #61467

Merged
merged 15 commits into from
Sep 24, 2020
73 changes: 71 additions & 2 deletions server/src/main/java/org/elasticsearch/common/Rounding.java
@@ -18,6 +18,7 @@
*/
package org.elasticsearch.common;

import org.apache.lucene.util.ArrayUtil;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.Version;
import org.elasticsearch.common.LocalTimeOffset.Gap;
@@ -44,8 +45,10 @@
import java.time.temporal.TemporalQueries;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/**
@@ -401,8 +404,22 @@ private LocalDateTime truncateLocalDateTime(LocalDateTime localDateTime) {
}
}

/**
* Time zones with two midnights get "funny" non-continuous rounding
* that isn't compatible with the pre-computed array rounding.
*/
private static final Set<String> HAS_TWO_MIDNIGHTS = Set.of("America/Moncton", "America/St_Johns", "Canada/Newfoundland");
Contributor

would it be doable to detect timezones that don't work with this optimization at runtime instead of maintaining an allowlist?

Member Author

@nik9000 nik9000 Aug 24, 2020

I can't think of a way to do that right now. At least, not a good way.

Contributor

I thought Canada changed time at 2 am. On the other hand, some other well-known time zones, such as Asia/Gaza, do seem to have two midnights. I ran a quick and dirty test and got a slightly different list of time zones on my JVM:

America/Asuncion 2020-03-22T00:00 -> 2020-03-21T23:00
America/Havana 2020-11-01T01:00 -> 2020-11-01T00:00
America/Santiago 2020-04-05T00:00 -> 2020-04-04T23:00
America/Scoresbysund 2020-10-25T01:00 -> 2020-10-25T00:00
Asia/Amman 2020-10-30T01:00 -> 2020-10-30T00:00
Asia/Beirut 2020-10-25T00:00 -> 2020-10-24T23:00
Asia/Damascus 2020-10-30T00:00 -> 2020-10-29T23:00
Asia/Gaza 2020-10-31T01:00 -> 2020-10-31T00:00
Asia/Hebron 2020-10-31T01:00 -> 2020-10-31T00:00
Asia/Tehran 2020-09-21T00:00 -> 2020-09-20T23:00
Atlantic/Azores 2020-10-25T01:00 -> 2020-10-25T00:00
Chile/Continental 2020-04-05T00:00 -> 2020-04-04T23:00
Cuba 2020-11-01T01:00 -> 2020-11-01T00:00
Iran 2020-09-21T00:00 -> 2020-09-20T23:00

Member Author

Very nice!

Contributor

Is this really only a problem when there are two midnights, or is it a more general problem when the time goes back due to daylight savings, and you just need special values of offset to make the bug occur if the transition is not around midnight?

Contributor

I've played with offset to double check and can't cause issues with it. However do you think we could detect timezones that don't work dynamically instead of relying on a static list? E.g. could we iterate transitions in the considered interval and check whether there's one that brings us to a different day?

Member Author

We could use something like the test case uses to find candidates, but it'd require loading all of the time zone rules on startup. I'm hoping that the test can prevent us from having to do that by being very careful about this list.

Contributor

Actually I wasn't thinking about testing all timezones up-front on startup, I was more thinking of doing the test when building the Rounding object by only looking at transitions of the considered timezone.
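
The per-zone check suggested above could look roughly like the following. This is a minimal sketch, not the change that was eventually pushed: the class and method names are hypothetical, and the criterion (an overlap transition whose repeated local-time range contains a midnight) is an assumption derived from the transitions listed earlier in this thread. It relies only on `java.time.zone.ZoneRules#nextTransition`, so it inspects just the transitions of the one zone inside the requested range.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class MidnightOverlapCheck {
    /**
     * Hypothetical helper: returns true if, between minUtcMillis and
     * maxUtcMillis, the zone has an overlap (clocks set back) whose
     * repeated local-time range includes a local midnight.
     */
    static boolean repeatsMidnight(ZoneId zone, long minUtcMillis, long maxUtcMillis) {
        ZoneRules rules = zone.getRules();
        Instant end = Instant.ofEpochMilli(maxUtcMillis);
        // nextTransition returns the first transition strictly after the
        // given instant, or null when there are no more transitions.
        ZoneOffsetTransition t = rules.nextTransition(Instant.ofEpochMilli(minUtcMillis));
        while (t != null && !t.getInstant().isAfter(end)) {
            if (t.isOverlap()) {
                LocalDateTime before = t.getDateTimeBefore();
                LocalDateTime after = t.getDateTimeAfter();
                // Local times in [after, before] occur twice. Midnight of
                // before's date is never later than before, so it is in the
                // repeated range exactly when it is not earlier than after.
                LocalDateTime midnight = before.toLocalDate().atStartOfDay();
                if (!midnight.isBefore(after)) {
                    return true;
                }
            }
            t = rules.nextTransition(t.getInstant());
        }
        return false;
    }
}
```

Running the same helper over `ZoneId.getAvailableZoneIds()` for a single year is one way to produce a candidate list like the one above, though the result depends on the JVM's time zone data.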

Member Author

Hmmmm - the assert that I have below sort of asserts that. But it isn't nearly as strong. I'll think about it!

Member Author

@jpountz I've pushed a change that does this.


@Override
public Prepared prepare(long minUtcMillis, long maxUtcMillis) {
Prepared orig = prepareOffsetRounding(minUtcMillis, maxUtcMillis);
if (unitRoundsToMidnight && HAS_TWO_MIDNIGHTS.contains(timeZone.getId())) {
return orig;
}
return maybeUseArray(orig, minUtcMillis, maxUtcMillis, 128);
}

private Prepared prepareOffsetRounding(long minUtcMillis, long maxUtcMillis) {
long minLookup = minUtcMillis - unit.extraLocalOffsetLookup();
long maxLookup = maxUtcMillis;

@@ -421,7 +438,6 @@ public Prepared prepare(long minUtcMillis, long maxUtcMillis) {
// Range too long, just use java.time
return prepareJavaTime();
}

LocalTimeOffset fixedOffset = lookup.fixedInRange(minLookup, maxLookup);
if (fixedOffset != null) {
// The time zone is effectively fixed
@@ -1015,7 +1031,7 @@ public byte id() {

@Override
public Prepared prepare(long minUtcMillis, long maxUtcMillis) {
- return wrapPreparedRounding(delegate.prepare(minUtcMillis, maxUtcMillis));
+ return wrapPreparedRounding(delegate.prepare(minUtcMillis - offset, maxUtcMillis - offset));
Contributor

is this change fixing an existing bug?

Member Author

Yes, but I think the bug is worse with this change.

Member Author

The new assertions in the ArrayRounding fail without this.
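
To see why the bounds have to be shifted, here is a small sketch under stated assumptions: it assumes offset rounding computes `delegate.round(utcMillis - offset) + offset` (consistent with the expectations in `testOffsetRounding`), and it uses plain UTC-day rounding as a stand-in delegate. All names are hypothetical. The point is that the delegate never sees `utcMillis` itself, only `utcMillis - offset`, so a delegate prepared for the unshifted range can be handed values outside the range it was prepared for.

```java
public class OffsetBoundsSketch {
    static final long HOUR = 3_600_000L;
    static final long DAY = 24 * HOUR;

    /** Stand-in delegate: round down to the start of the UTC day. */
    static long roundToUtcDay(long utcMillis) {
        return Math.floorDiv(utcMillis, DAY) * DAY;
    }

    /** The value the delegate actually receives for a given input. */
    static long delegateInput(long utcMillis, long offset) {
        return utcMillis - offset;
    }

    /** Assumed shape of offset rounding: shift, round, shift back. */
    static long roundWithOffset(long utcMillis, long offset) {
        return roundToUtcDay(delegateInput(utcMillis, offset)) + offset;
    }

    public static void main(String[] args) {
        long offset = -2 * HOUR;
        long utcMillis = 23 * HOUR; // 23:00 on day 0
        // The delegate is asked to round 25:00, which lies on day 1, even
        // though the caller's value lies on day 0.
        System.out.println(delegateInput(utcMillis, offset) / HOUR);   // 25
        System.out.println(roundWithOffset(utcMillis, offset) / HOUR); // 22
    }
}
```

If the delegate had precomputed round-down points only for a range ending at 23:00 on day 0, the lookup for 25:00 would fall past the last precomputed point, which is the kind of mismatch the new assertions catch.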

}

@Override
@@ -1085,4 +1101,57 @@ public static Rounding read(StreamInput in) throws IOException {
throw new ElasticsearchException("unknown rounding id [" + id + "]");
}
}

/**
* Attempt to build a {@link Prepared} implementation that relies on pre-calculated
* "round down" points. If there would be more than {@code max} points then return
* the original implementation, otherwise return the new, faster implementation.
*/
static Prepared maybeUseArray(Prepared orig, long minUtcMillis, long maxUtcMillis, int max) {
long[] values = new long[1];
long rounded = orig.round(minUtcMillis);
int i = 0;
values[i++] = rounded;
while ((rounded = orig.nextRoundingValue(rounded)) <= maxUtcMillis) {
if (i >= max) {
return orig;
}
assert values[i - 1] == orig.round(rounded - 1);
Member

It's not super clear to me what this assert is guarding against. Can you add a comment please?

Member Author

👍

values = ArrayUtil.grow(values, i + 1);
values[i++] = rounded;
}
return new ArrayRounding(values, i, orig);
}

/**
* Implementation of {@link Prepared} using pre-calculated "round down" points.
*/
private static class ArrayRounding implements Prepared {
Contributor

This will need to implement new methods added in #61369.

private final long[] values;
private final int max;
private final Prepared delegate;

private ArrayRounding(long[] values, int max, Prepared delegate) {
this.values = values;
this.max = max;
this.delegate = delegate;
}

@Override
public long round(long utcMillis) {
assert values[0] <= utcMillis : "utcMillis must be after " + values[0];
int idx = Arrays.binarySearch(values, 0, max, utcMillis);
assert idx != -1 : "The insertion point is before the array! This should have tripped the assertion above.";
assert -1 - idx <= values.length : "This insertion point is after the end of the array.";
if (idx < 0) {
Contributor

maybe assert that idx is neither -1 nor -1 - max?

Member Author

Sure!

idx = -2 - idx;
}
return values[idx];
}

@Override
public long nextRoundingValue(long utcMillis) {
return delegate.nextRoundingValue(utcMillis);
}
}
}
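
For readers who want the technique in isolation, the following is a simplified, standalone sketch of the precompute-then-binary-search idea. It is not the code from this PR: it assumes fixed-interval rounding instead of a `Prepared` delegate, and every name in it is hypothetical. It illustrates the two parts that matter, namely giving up when the range needs more than a maximum number of round-down points, and decoding the negative result of `Arrays.binarySearch`, which is `-(insertionPoint) - 1`, into the index of the previous point with `-2 - idx`.

```java
import java.util.Arrays;

public class ArrayRoundingSketch {
    /** Stand-in for the original rounding: round down to a fixed interval. */
    static long roundDown(long value, long interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    /**
     * Precompute every round-down point needed for values in [min, max].
     * Returns null when more than maxPoints would be required, signalling
     * that the caller should keep using the original rounding.
     */
    static long[] precompute(long min, long max, long interval, int maxPoints) {
        long[] points = new long[maxPoints];
        int count = 0;
        for (long p = roundDown(min, interval); p <= max; p += interval) {
            if (count >= maxPoints) {
                return null;
            }
            points[count++] = p;
        }
        return Arrays.copyOf(points, count);
    }

    /** Round using the precomputed points; value must be >= points[0]. */
    static long round(long[] points, long value) {
        if (value < points[0]) {
            throw new IllegalArgumentException("value is before the first precomputed point");
        }
        int idx = Arrays.binarySearch(points, value);
        if (idx < 0) {
            // Not an exact hit: idx == -(insertionPoint) - 1, and the bucket
            // start is the element just before the insertion point.
            idx = -2 - idx;
        }
        return points[idx];
    }
}
```

With an interval of 10 and the range [3, 25], `precompute` yields the points 0, 10 and 20, and `round(points, 15)` returns 10.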
12 changes: 10 additions & 2 deletions server/src/test/java/org/elasticsearch/common/RoundingTests.java
@@ -213,6 +213,9 @@ public void testOffsetRounding() {
assertThat(rounding.nextRoundingValue(0), equalTo(oneDay - twoHours));
assertThat(rounding.withoutOffset().round(0), equalTo(0L));
assertThat(rounding.withoutOffset().nextRoundingValue(0), equalTo(oneDay));

rounding = Rounding.builder(Rounding.DateTimeUnit.DAY_OF_MONTH).timeZone(ZoneId.of("America/New_York")).offset(-twoHours).build();
assertThat(rounding.round(time("2020-11-01T09:00:00")), equalTo(time("2020-11-01T02:00:00")));
}

/**
@@ -231,7 +234,7 @@ public void testRandomTimeUnitRounding() {
Rounding.DateTimeUnit unit = randomFrom(Rounding.DateTimeUnit.values());
ZoneId tz = randomZone();
Rounding rounding = new Rounding.TimeUnitRounding(unit, tz);
- long[] bounds = randomDateBounds();
+ long[] bounds = randomDateBounds(unit);
Rounding.Prepared prepared = rounding.prepare(bounds[0], bounds[1]);

// Check that rounding is internally consistent and consistent with nextRoundingValue
@@ -894,8 +897,13 @@ private static long randomDate() {
return Math.abs(randomLong() % (2 * (long) 10e11)); // 1970-01-01T00:00:00Z - 2033-05-18T05:33:20.000+02:00
}

- private static long[] randomDateBounds() {
+ private static long[] randomDateBounds(Rounding.DateTimeUnit unit) {
long b1 = randomDate();
if (randomBoolean()) {
// Sometimes use a fairly close date
return new long[] {b1, b1 + unit.extraLocalOffsetLookup() * between(1, 40)};
}
// Otherwise use a totally random date
long b2 = randomValueOtherThan(b1, RoundingTests::randomDate);
if (b1 < b2) {
return new long[] {b1, b2};