Support geotile_grid aggregation in composite agg sources #45810


Merged

Changes shown from 11 of 19 commits:

- 96887d7 Support geohash_grid aggregation in composite agg sources (benwtrent, Aug 20, 2019)
- bb572f7 adding license headers (benwtrent, Aug 21, 2019)
- a8af9fa Merge branch 'master' into feature/add-geohash_grid-to-composite-aggs (elasticmachine, Aug 21, 2019)
- 5a3a0a9 Merge branch 'master' into feature/add-geohash_grid-to-composite-aggs (elasticmachine, Aug 23, 2019)
- 6cadfa4 making cellIDsource a public class per PR comments (benwtrent, Aug 28, 2019)
- b22d8ca Merge branch 'feature/add-geohash_grid-to-composite-aggs' of github.c… (benwtrent, Aug 28, 2019)
- 82c3b56 Merge branch 'master' into feature/add-geohash_grid-to-composite-aggs (elasticmachine, Aug 28, 2019)
- 9b21532 Throwing error on bwc serialization failure, removing unused shardSize (benwtrent, Aug 30, 2019)
- 3ba8579 Merge branch 'feature/add-geohash_grid-to-composite-aggs' of github.c… (benwtrent, Aug 30, 2019)
- 30dacd4 Merge branch 'master' into feature/add-geohash_grid-to-composite-aggs (elasticmachine, Aug 30, 2019)
- ecdcce6 removing bad method call (benwtrent, Aug 30, 2019)
- 2e4461d correcting error message (benwtrent, Aug 30, 2019)
- 022e66d making .format(String) throw IAE (benwtrent, Aug 30, 2019)
- ff518c9 move to use geotile instead of hash (benwtrent, Sep 4, 2019)
- b3e6f9d Merge branch 'master' into feature/add-geohash_grid-to-composite-aggs (benwtrent, Sep 4, 2019)
- 54c9e12 minor fixups (benwtrent, Sep 4, 2019)
- c30cc32 fixing test (benwtrent, Sep 4, 2019)
- 2a09dc1 moving DocValueFormat for geotile (benwtrent, Sep 4, 2019)
- 5ecd671 moving geotile so that it can be serialized (benwtrent, Sep 4, 2019)
@@ -12,6 +12,8 @@ setup:
type: keyword
long:
type: long
geo_point:
type: geo_point
nested:
type: nested
properties:
@@ -38,25 +40,25 @@
index:
index: test
id: 1
body: { "keyword": "foo", "long": [10, 20], "nested": [{"nested_long": 10}, {"nested_long": 20}] }
body: { "keyword": "foo", "long": [10, 20], "geo_point": "37.2343,-115.8067", "nested": [{"nested_long": 10}, {"nested_long": 20}] }

- do:
index:
index: test
id: 2
body: { "keyword": ["foo", "bar"] }
body: { "keyword": ["foo", "bar"], "geo_point": "41.12,-71.34" }

- do:
index:
index: test
id: 3
body: { "keyword": "bar", "long": [100, 0], "nested": [{"nested_long": 10}, {"nested_long": 0}] }
body: { "keyword": "bar", "long": [100, 0], "geo_point": "90.0,0.0", "nested": [{"nested_long": 10}, {"nested_long": 0}] }

- do:
index:
index: test
id: 4
body: { "keyword": "bar", "long": [1000, 0], "nested": [{"nested_long": 1000}, {"nested_long": 20}] }
body: { "keyword": "bar", "long": [1000, 0], "geo_point": "41.12,-71.34", "nested": [{"nested_long": 1000}, {"nested_long": 20}] }

- do:
index:
@@ -615,3 +617,87 @@ setup:
}
]

---
"Simple Composite aggregation with Geohash grid":
- skip:
version: " - 7.99.99"
reason: geohash_grid is not supported until 8.0.0
- do:
search:
rest_total_hits_as_int: true
index: test
body:
aggregations:
test:
composite:
sources: [
  {
    "geo": {
      "geohash_grid": {
        "field": "geo_point",
        "precision": 12
      }
    }
  },
{
"kw": {
"terms": {
"field": "keyword"
}
}
}
]

- match: {hits.total: 6}
- length: { aggregations.test.buckets: 4 }
- match: { aggregations.test.buckets.0.key.geo: "upbpbpbpbpbp" }
- match: { aggregations.test.buckets.0.key.kw: "bar" }
- match: { aggregations.test.buckets.0.doc_count: 1 }
- match: { aggregations.test.buckets.1.key.geo: "9qteuf21s29g" }
- match: { aggregations.test.buckets.1.key.kw: "foo" }
- match: { aggregations.test.buckets.1.doc_count: 1 }
- match: { aggregations.test.buckets.2.key.geo: "drm3btev3e86" }
- match: { aggregations.test.buckets.2.key.kw: "bar" }
- match: { aggregations.test.buckets.2.doc_count: 2 }
- match: { aggregations.test.buckets.3.key.geo: "drm3btev3e86" }
- match: { aggregations.test.buckets.3.key.kw: "foo" }
- match: { aggregations.test.buckets.3.doc_count: 1 }
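The bucket keys asserted above ("upbpbpbpbpbp", "9qteuf21s29g", "drm3btev3e86") are plain 12-character geohashes of the indexed points. As a cross-check, here is a minimal standalone encoder using the standard interleaved-bisection base-32 scheme (an illustrative sketch, not Elasticsearch's `Geohash` class):

```java
public class GeohashEncoder {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a lat/lon to a geohash by bisecting lon and lat alternately, 5 bits per character. */
    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder out = new StringBuilder(precision);
        boolean lonTurn = true; // a geohash starts with a longitude bit
        int bits = 0, ch = 0;
        while (out.length() < precision) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { out.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(41.12, -71.34, 12)); // the shared key of docs 2 and 4
    }
}
```

Each extra character refines the cell by 5 bits, which is why precision 12 separates the three distinct points while the two documents at "41.12,-71.34" share a bucket key.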
---
"Simple Composite aggregation with geohash grid add aggregate after":
- skip:
version: " - 7.99.99"
reason: geohash_grid is not supported until 8.0.0
- do:
search:
rest_total_hits_as_int: true
index: test
body:
aggregations:
test:
composite:
sources: [
  {
    "geo": {
      "geohash_grid": {
        "field": "geo_point",
        "precision": 12
      }
    }
  },
{
"kw": {
"terms": {
"field": "keyword"
}
}
}
]
after: { "geo": "upbpbpbpbpbp", "kw": "bar" }

- match: {hits.total: 6}
- length: { aggregations.test.buckets: 3 }
- match: { aggregations.test.buckets.0.key.geo: "9qteuf21s29g" }
- match: { aggregations.test.buckets.0.key.kw: "foo" }
- match: { aggregations.test.buckets.0.doc_count: 1 }
- match: { aggregations.test.buckets.1.key.geo: "drm3btev3e86" }
- match: { aggregations.test.buckets.1.key.kw: "bar" }
- match: { aggregations.test.buckets.1.doc_count: 2 }
- match: { aggregations.test.buckets.2.key.geo: "drm3btev3e86" }
- match: { aggregations.test.buckets.2.key.kw: "foo" }
- match: { aggregations.test.buckets.2.doc_count: 1 }
@@ -42,6 +42,7 @@
import org.elasticsearch.search.aggregations.MultiBucketCollector;
import org.elasticsearch.search.aggregations.MultiBucketConsumerService;
import org.elasticsearch.search.aggregations.bucket.BucketsAggregator;
import org.elasticsearch.search.aggregations.bucket.geogrid.CellIdSource;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.internal.SearchContext;
@@ -299,6 +300,20 @@ private SingleDimensionValuesSource<?> createValuesSource(BigArrays bigArrays, I
reverseMul
);

} else if (config.valuesSource() instanceof CellIdSource) {
final CellIdSource cis = (CellIdSource) config.valuesSource();
return new GeohashValuesSource(
bigArrays,
config.fieldType(),
cis::longValues,
LongUnaryOperator.identity(),
config.format(),
config.missingBucket(),
size,
reverseMul,
cis.precision(),
cis.encoder()
);
} else if (config.valuesSource() instanceof ValuesSource.Numeric) {
final ValuesSource.Numeric vs = (ValuesSource.Numeric) config.valuesSource();
if (vs.isFloatingPoint()) {
@@ -19,6 +19,7 @@

package org.elasticsearch.search.aggregations.bucket.composite;

import org.elasticsearch.Version;
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.ParsingException;
import org.elasticsearch.common.io.stream.StreamInput;
@@ -67,6 +68,12 @@ public static void writeTo(CompositeValuesSourceBuilder<?> builder, StreamOutput
code = 1;
} else if (builder.getClass() == HistogramValuesSourceBuilder.class) {
code = 2;
} else if (builder.getClass() == GeoHashGridValuesSourceBuilder.class) {
if (out.getVersion().before(Version.V_8_0_0)) {
throw new IOException("Attempting to serialize [" + builder.getClass().getSimpleName()
+ "] to a node with unsupported version [" + out.getVersion() + "]");
}
code = 3;
} else {
throw new IOException("invalid builder type: " + builder.getClass().getSimpleName());
}
@@ -83,6 +90,8 @@ public static CompositeValuesSourceBuilder<?> readFrom(StreamInput in) throws IO
return new DateHistogramValuesSourceBuilder(in);
case 2:
return new HistogramValuesSourceBuilder(in);
case 3:
return new GeoHashGridValuesSourceBuilder(in);
default:
throw new IOException("Invalid code " + code);
}
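The writeTo/readFrom pair above maps each source builder to a wire code and gates the new code 3 behind the 8.0.0 stream version, so a mixed-version cluster fails fast with a descriptive message instead of sending bytes an older node cannot decode. A minimal, self-contained sketch of that pattern (integer version ids and an unchecked exception stand in for Elasticsearch's `Version` and the `IOException` the real helper throws; all names here are illustrative):

```java
public class BwcSerializationGuard {
    // Illustrative stand-ins for org.elasticsearch.Version ids, which compare numerically.
    public static final int V_7_4_0 = 7_04_00_99;
    public static final int V_8_0_0 = 8_00_00_99;

    /** Maps a composite source type to its wire code, refusing codes the target node cannot read. */
    public static int codeFor(String sourceType, int streamVersion) {
        switch (sourceType) {
            case "terms":          return 0;
            case "date_histogram": return 1;
            case "histogram":      return 2;
            case "geohash_grid":
                // Fail on the sending side with a clear message; an old node that
                // received an unknown code would only report "Invalid code 3".
                if (streamVersion < V_8_0_0) {
                    throw new IllegalStateException("Attempting to serialize [geohash_grid] "
                            + "to a node with unsupported version [" + streamVersion + "]");
                }
                return 3;
            default:
                throw new IllegalStateException("invalid builder type: " + sourceType);
        }
    }

    public static void main(String[] args) {
        System.out.println(codeFor("geohash_grid", V_8_0_0)); // 3
        try {
            codeFor("geohash_grid", V_7_4_0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the bwc failure message
        }
    }
}
```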
@@ -112,6 +121,9 @@ public static CompositeValuesSourceBuilder<?> fromXContent(XContentParser parser
case HistogramValuesSourceBuilder.TYPE:
builder = HistogramValuesSourceBuilder.parse(name, parser);
break;
case GeoHashGridValuesSourceBuilder.TYPE:
builder = GeoHashGridValuesSourceBuilder.parse(name, parser);
break;
default:
throw new ParsingException(parser.getTokenLocation(), "invalid source type: " + type);
}
@@ -0,0 +1,117 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.search.aggregations.bucket.composite;

import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.geo.GeoUtils;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.xcontent.ObjectParser;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.geometry.utils.Geohash;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.bucket.geogrid.CellIdSource;
import org.elasticsearch.search.aggregations.bucket.geogrid.GeoHashGridAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ValueType;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.aggregations.support.ValuesSourceConfig;
import org.elasticsearch.search.internal.SearchContext;

import java.io.IOException;
import java.util.Objects;

public class GeoHashGridValuesSourceBuilder extends CompositeValuesSourceBuilder<GeoHashGridValuesSourceBuilder> {
static final String TYPE = "geohash_grid";

private static final ObjectParser<GeoHashGridValuesSourceBuilder, Void> PARSER;
static {
PARSER = new ObjectParser<>(GeoHashGridValuesSourceBuilder.TYPE);
PARSER.declareInt(GeoHashGridValuesSourceBuilder::precision, new ParseField("precision"));
CompositeValuesSourceParserHelper.declareValuesSourceFields(PARSER, ValueType.NUMERIC);
}

static GeoHashGridValuesSourceBuilder parse(String name, XContentParser parser) throws IOException {
return PARSER.parse(parser, new GeoHashGridValuesSourceBuilder(name), null);
}

private int precision = GeoHashGridAggregationBuilder.DEFAULT_PRECISION;

GeoHashGridValuesSourceBuilder(String name) {
super(name);
}

GeoHashGridValuesSourceBuilder(StreamInput in) throws IOException {
super(in);
this.precision = in.readInt();
}

Contributor (review comment): should we throw an IAE if format(DocValueFormat) is used, since it is ignored in the build?

Author (review comment): Since we don't declare a format field in the PARSER, we throw an unknown-field error, and the terms and histogram sources don't do this check. I can add it if that is the prevailing opinion.

Contributor (review comment): That works for REST queries, but users of the HLRC can use this builder directly, so it would be nice to fail early if they try to set the format.

Author (review comment): Ah, I did not know the HLRC had direct access to this. Will definitely update to prevent setting it.

public GeoHashGridValuesSourceBuilder precision(int precision) {
this.precision = GeoUtils.checkPrecisionRange(precision);
return this;
}
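precision(int) delegates validation to GeoUtils.checkPrecisionRange, which for geohashes accepts lengths 1 through 12. A hedged sketch of that range check (the message mirrors the real one, but this class is a stand-in, not the GeoUtils implementation):

```java
public class PrecisionCheck {
    /** Validates a geohash precision, mirroring the 1..12 range GeoUtils enforces. */
    public static int checkPrecisionRange(int precision) {
        if (precision < 1 || precision > 12) {
            throw new IllegalArgumentException("Invalid geohash aggregation precision of " + precision
                    + ". Must be between 1 and 12.");
        }
        return precision;
    }

    public static void main(String[] args) {
        System.out.println(checkPrecisionRange(12)); // 12, the maximum (and this PR's test value)
    }
}
```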

@Override
protected void innerWriteTo(StreamOutput out) throws IOException {
out.writeInt(precision);
}

@Override
protected void doXContentBody(XContentBuilder builder, Params params) throws IOException {
builder.field("precision", precision);
}

@Override
String type() {
return TYPE;
}

@Override
public int hashCode() {
return Objects.hash(super.hashCode(), precision);
}

@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null || getClass() != obj.getClass()) return false;
if (super.equals(obj) == false) return false;
GeoHashGridValuesSourceBuilder other = (GeoHashGridValuesSourceBuilder) obj;
return precision == other.precision;
}

@Override
protected CompositeValuesSourceConfig innerBuild(SearchContext context, ValuesSourceConfig<?> config) throws IOException {
ValuesSource orig = config.toValuesSource(context.getQueryShardContext());
if (orig == null) {
orig = ValuesSource.GeoPoint.EMPTY;
}
if (orig instanceof ValuesSource.GeoPoint) {
ValuesSource.GeoPoint geoPoint = (ValuesSource.GeoPoint) orig;
// is specified in the builder.
final MappedFieldType fieldType = config.fieldContext() != null ? config.fieldContext().fieldType() : null;
CellIdSource cellIdSource = new CellIdSource(geoPoint, precision, Geohash::longEncode);
return new CompositeValuesSourceConfig(name, fieldType, cellIdSource, DocValueFormat.GEOHASH, order(), missingBucket());
} else {
throw new IllegalArgumentException("invalid source, expected numeric, got " + orig.getClass().getSimpleName());
Contributor (review comment): nit: expected geo_point?

}
}
}
@@ -0,0 +1,72 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.search.aggregations.bucket.composite;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedNumericDocValues;
import org.elasticsearch.common.CheckedFunction;
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.geometry.Point;
import org.elasticsearch.geometry.utils.Geohash;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.bucket.geogrid.CellIdSource;

import java.io.IOException;
import java.util.function.LongUnaryOperator;

/**
* A {@link SingleDimensionValuesSource} for geohash values.
*
 * Since geohash values can be represented as long values, this class is almost the same as {@link LongValuesSource}.
 * The main difference is {@link GeohashValuesSource#setAfter(Comparable)}, which needs to accept geohash string values.
*/
class GeohashValuesSource extends LongValuesSource {
private final int precision;
private final CellIdSource.GeoPointLongEncoder encoder;
GeohashValuesSource(BigArrays bigArrays,
MappedFieldType fieldType,
CheckedFunction<LeafReaderContext, SortedNumericDocValues, IOException> docValuesFunc,
LongUnaryOperator rounding,
DocValueFormat format,
boolean missingBucket,
int size,
int reverseMul,
int precision,
CellIdSource.GeoPointLongEncoder encoder) {
super(bigArrays, fieldType, docValuesFunc, rounding, format, missingBucket, size, reverseMul);
this.precision = precision;
this.encoder = encoder;
}

@Override
void setAfter(Comparable value) {
if (missingBucket && value == null) {
afterValue = null;
} else if (value instanceof Number) {
afterValue = ((Number) value).longValue();
} else {
// if it is a string it should be a geohash formatted value.
// We need to preserve the precision between decoding the geohash and encoding it into a long
Point point = Geohash.toPoint(value.toString());
afterValue = encoder.encode(point.getLon(), point.getLat(), precision);
Author (review comment): This was the best way I could figure out to transform a geohash value into a long value of the appropriate precision.

Contributor (review comment): Ideally this should be implemented in the geohash DocValueFormat#parseLong(String value..), but it requires knowing the precision. +1 to leave it like this for now; the parsing is only needed for the composite, so we can revisit if it is required elsewhere.

}
}
}
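setAfter above accepts the after key in either form: a long (the internal representation) or a geohash string (from user requests). For a string, it decodes the geohash to a point and re-encodes at the source's precision, so the resulting value lands in the same cell grid the source produces. A standalone sketch of that decode-then-re-encode round trip (plain-Java geohash math stands in for the `Geohash` and `CellIdSource.GeoPointLongEncoder` classes; re-encoding to a string geohash stands in for the long cell id):

```java
public class GeohashRoundTrip {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Decodes a geohash to the center point of its cell, returned as { lat, lon }. */
    public static double[] decode(String hash) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        boolean lonTurn = true;
        for (int i = 0; i < hash.length(); i++) {
            int ch = BASE32.indexOf(hash.charAt(i));
            for (int bit = 4; bit >= 0; bit--) {
                int b = (ch >> bit) & 1;
                if (lonTurn) {
                    double mid = (lonMin + lonMax) / 2;
                    if (b == 1) lonMin = mid; else lonMax = mid;
                } else {
                    double mid = (latMin + latMax) / 2;
                    if (b == 1) latMin = mid; else latMax = mid;
                }
                lonTurn = !lonTurn;
            }
        }
        return new double[] { (latMin + latMax) / 2, (lonMin + lonMax) / 2 };
    }

    /** Standard geohash encoding: bisect lon/lat alternately, 5 bits per character. */
    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder out = new StringBuilder(precision);
        boolean lonTurn = true;
        int bits = 0, ch = 0;
        while (out.length() < precision) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { out.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return out.toString();
    }
}
```

Re-encoding the decoded cell center at a lower precision yields a prefix of the original hash, which is the precision-preservation property the code comment in setAfter refers to.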