
Commit 5bc7822

[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default geo_shape indexing approach (#35320)
This commit exposes Lucene's LatLonShape field as the default type in GeoShapeFieldMapper. To use the new indexing approach, simply set "type" : "geo_shape" in the mappings without setting any of the strategy, precision, tree_levels, or distance_error_pct parameters. Note the following when using the new indexing approach:

* The geo_shape query does not support querying by MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not yet support the WITHIN relation.
* The CONTAINS relation is not yet supported.

The tree, precision, tree_levels, distance_error_pct, and points_only parameters are deprecated.
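The opt-in rule stated in the commit message can be sketched as a small decision function. This is a hypothetical illustration, not Elasticsearch's actual `GeoShapeFieldMapper` parsing code: the class `IndexingApproachSketch`, the method `usesLegacyPrefixTrees`, and the assumption that setting any one of the listed parameters (the commit message's four, plus the deprecated `tree` and `points_only`) keeps a field on prefix trees are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

public class IndexingApproachSketch {

    // Hypothetical: the geo_shape parameters treated as "prefix tree only".
    // Assumption: setting any one of them keeps the legacy indexing path.
    static final Set<String> PREFIX_TREE_PARAMS = Set.of(
            "tree", "strategy", "precision", "tree_levels",
            "distance_error_pct", "points_only");

    // Returns true if a geo_shape field mapping (its parameter map) would be
    // indexed with prefix trees instead of the default LatLonShape approach.
    static boolean usesLegacyPrefixTrees(Map<String, Object> fieldMapping) {
        for (String parameter : fieldMapping.keySet()) {
            if (PREFIX_TREE_PARAMS.contains(parameter)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> vectorMapping = Map.of("type", "geo_shape");
        Map<String, Object> legacyMapping = Map.of(
                "type", "geo_shape", "tree", "quadtree", "precision", "100m");
        System.out.println("vector mapping uses prefix trees: " + usesLegacyPrefixTrees(vectorMapping));
        System.out.println("legacy mapping uses prefix trees: " + usesLegacyPrefixTrees(legacyMapping));
    }
}
```

Under this rule, a mapping that declares only `"type": "geo_shape"` gets the new `LatLonShape` indexing, while the old documentation example that set `tree` and `precision` would stay on prefix trees.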
1 parent f1e1f93 commit 5bc7822

31 files changed: +2635 -1233 lines changed

docs/reference/mapping/types/geo-shape.asciidoc

+110 -76
@@ -21,48 +21,59 @@ type.
 |=======================================================================
 |Option |Description| Default
 
-|`tree` |Name of the PrefixTree implementation to be used: `geohash` for
-GeohashPrefixTree and `quadtree` for QuadPrefixTree.
-| `geohash`
-
-|`precision` |This parameter may be used instead of `tree_levels` to set
-an appropriate value for the `tree_levels` parameter. The value
-specifies the desired precision and Elasticsearch will calculate the
-best tree_levels value to honor this precision. The value should be a
-number followed by an optional distance unit. Valid distance units
-include: `in`, `inch`, `yd`, `yard`, `mi`, `miles`, `km`, `kilometers`,
-`m`,`meters`, `cm`,`centimeters`, `mm`, `millimeters`.
+|`tree` |deprecated[6.6, PrefixTrees no longer used] Name of the PrefixTree
+implementation to be used: `geohash` for GeohashPrefixTree and `quadtree`
+for QuadPrefixTree. Note: This parameter is only relevant for `term` and
+`recursive` strategies.
+| `quadtree`
+
+|`precision` |deprecated[6.6, PrefixTrees no longer used] This parameter may
+be used instead of `tree_levels` to set an appropriate value for the
+`tree_levels` parameter. The value specifies the desired precision and
+Elasticsearch will calculate the best tree_levels value to honor this
+precision. The value should be a number followed by an optional distance
+unit. Valid distance units include: `in`, `inch`, `yd`, `yard`, `mi`,
+`miles`, `km`, `kilometers`, `m`, `meters`, `cm`, `centimeters`, `mm`,
+`millimeters`. Note: This parameter is only relevant for `term` and
+`recursive` strategies.
 | `50m`
 
-|`tree_levels` |Maximum number of layers to be used by the PrefixTree.
-This can be used to control the precision of shape representations and
-therefore how many terms are indexed. Defaults to the default value of
-the chosen PrefixTree implementation. Since this parameter requires a
-certain level of understanding of the underlying implementation, users
-may use the `precision` parameter instead. However, Elasticsearch only
-uses the tree_levels parameter internally and this is what is returned
-via the mapping API even if you use the precision parameter.
+|`tree_levels` |deprecated[6.6, PrefixTrees no longer used] Maximum number
+of layers to be used by the PrefixTree. This can be used to control the
+precision of shape representations and therefore how many terms are
+indexed. Defaults to the default value of the chosen PrefixTree
+implementation. Since this parameter requires a certain level of
+understanding of the underlying implementation, users may use the
+`precision` parameter instead. However, Elasticsearch only uses the
+tree_levels parameter internally and this is what is returned via the
+mapping API even if you use the precision parameter. Note: This parameter
+is only relevant for `term` and `recursive` strategies.
 | various
 
-|`strategy` |The strategy parameter defines the approach for how to
-represent shapes at indexing and search time. It also influences the
-capabilities available so it is recommended to let Elasticsearch set
-this parameter automatically. There are two strategies available:
-`recursive` and `term`. Term strategy supports point types only (the
-`points_only` parameter will be automatically set to true) while
-Recursive strategy supports all shape types. (IMPORTANT: see
-<<prefix-trees, Prefix trees>> for more detailed information)
+|`strategy` |deprecated[6.6, PrefixTrees no longer used] The strategy
+parameter defines the approach for how to represent shapes at indexing
+and search time. It also influences the capabilities available, so it
+is recommended to let Elasticsearch set this parameter automatically.
+There are two strategies available: `recursive` and `term`.
+Recursive and Term strategies are deprecated and will be removed in a
+future version. While they are still available, the Term strategy
+supports point types only (the `points_only` parameter will be
+automatically set to true) while the Recursive strategy supports all
+shape types. (IMPORTANT: see <<prefix-trees, Prefix trees>> for more
+detailed information about these strategies)
 | `recursive`
 
-|`distance_error_pct` |Used as a hint to the PrefixTree about how
-precise it should be. Defaults to 0.025 (2.5%) with 0.5 as the maximum
-supported value. PERFORMANCE NOTE: This value will default to 0 if a `precision` or
-`tree_level` definition is explicitly defined. This guarantees spatial precision
-at the level defined in the mapping. This can lead to significant memory usage
-for high resolution shapes with low error (e.g., large shapes at 1m with < 0.001 error).
-To improve indexing performance (at the cost of query accuracy) explicitly define
-`tree_level` or `precision` along with a reasonable `distance_error_pct`, noting
-that large shapes will have greater false positives.
+|`distance_error_pct` |deprecated[6.6, PrefixTrees no longer used] Used as a
+hint to the PrefixTree about how precise it should be. Defaults to 0.025 (2.5%)
+with 0.5 as the maximum supported value. PERFORMANCE NOTE: This value will
+default to 0 if a `precision` or `tree_levels` definition is explicitly defined.
+This guarantees spatial precision at the level defined in the mapping. This can
+lead to significant memory usage for high resolution shapes with low error
+(e.g., large shapes at 1m with < 0.001 error). To improve indexing performance
+(at the cost of query accuracy) explicitly define `tree_levels` or `precision`
+along with a reasonable `distance_error_pct`, noting that large shapes will have
+greater false positives. Note: This parameter is only relevant for `term` and
+`recursive` strategies.
 | `0.025`
 
 |`orientation` |Optionally define how to interpret vertex order for
@@ -77,13 +88,13 @@ sets vertex order for the coordinate list of a geo_shape field but can be
 overridden in each individual GeoJSON or WKT document.
 | `ccw`
 
-|`points_only` |Setting this option to `true` (defaults to `false`) configures
-the `geo_shape` field type for point shapes only (NOTE: Multi-Points are not
-yet supported). This optimizes index and search performance for the `geohash` and
-`quadtree` when it is known that only points will be indexed. At present geo_shape
-queries can not be executed on `geo_point` field types. This option bridges the gap
-by improving point performance on a `geo_shape` field so that `geo_shape` queries are
-optimal on a point only field.
+|`points_only` |deprecated[6.6, PrefixTrees no longer used] Setting this option to
+`true` (defaults to `false`) configures the `geo_shape` field type for point
+shapes only (NOTE: Multi-Points are not yet supported). This optimizes index and
+search performance for the `geohash` and `quadtree` trees when it is known that only
+points will be indexed. At present geo_shape queries cannot be executed on `geo_point`
+field types. This option bridges the gap by improving point performance on a
+`geo_shape` field so that `geo_shape` queries are optimal on a point-only field.
 | `false`
 
 |`ignore_malformed` |If true, malformed GeoJSON or WKT shapes are ignored. If
@@ -100,16 +111,35 @@ and reject the whole document.
 
 |=======================================================================
 
+
+[[geoshape-indexing-approach]]
+[float]
+==== Indexing approach
+GeoShape types are indexed by decomposing the shape into a triangular mesh and
+indexing each triangle as a 7-dimensional point in a BKD tree. This provides
+near-perfect spatial resolution (down to 1e-7 decimal degree precision) since all
+spatial relations are computed using an encoded vector representation of the
+original shape instead of a raster-grid representation as used by the
+<<prefix-trees>> indexing approach. Performance of the tessellator primarily
+depends on the number of vertices that define the polygon/multi-polygon. While
+this is the default indexing technique, prefix trees can still be used by setting
+the `tree` or `strategy` parameters according to the appropriate
+<<geo-shape-mapping-options>>. Note that these parameters are now deprecated
+and will be removed in a future version.
+
 [[prefix-trees]]
 [float]
 ==== Prefix trees
 
-To efficiently represent shapes in the index, Shapes are converted into
-a series of hashes representing grid squares (commonly referred to as "rasters")
-using implementations of a PrefixTree. The tree notion comes from the fact that
-the PrefixTree uses multiple grid layers, each with an increasing level of
-precision to represent the Earth. This can be thought of as increasing the level
-of detail of a map or image at higher zoom levels.
+deprecated[6.6, PrefixTrees no longer used] To efficiently represent shapes in
+an inverted index, shapes are converted into a series of hashes representing
+grid squares (commonly referred to as "rasters") using implementations of a
+PrefixTree. The tree notion comes from the fact that the PrefixTree uses multiple
+grid layers, each with an increasing level of precision to represent the Earth.
+This can be thought of as increasing the level of detail of a map or image at higher
+zoom levels. Since this approach causes precision issues with indexed shapes, it has
+been deprecated in favor of a vector indexing approach that indexes the shapes as a
+triangular mesh (see <<geoshape-indexing-approach>>).
 
 Multiple PrefixTree implementations are provided:
 
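The *Indexing approach* text in the hunk above describes decomposing each shape into a triangular mesh before anything is written to the BKD tree. The sketch below illustrates only that tessellation step, using a textbook ear-clipping triangulator for a simple, counter-clockwise polygon without holes. It is not Lucene's tessellator, which is considerably more robust (it also handles holes), and it omits the 7-dimensional point encoding entirely; `EarClippingSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class EarClippingSketch {

    // Twice the signed area of triangle (a, b, c); positive if counter-clockwise.
    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    // True if p lies inside or on an edge of the counter-clockwise triangle (a, b, c).
    static boolean inTriangle(double[] p, double[] a, double[] b, double[] c) {
        return cross(a, b, p) >= 0 && cross(b, c, p) >= 0 && cross(c, a, p) >= 0;
    }

    // Triangulates a simple CCW polygon given as {x, y} vertices (no holes, no
    // repeated closing vertex). Returns index triples; n vertices give n - 2 triangles.
    static List<int[]> triangulate(double[][] v) {
        List<Integer> ring = new ArrayList<>();
        for (int i = 0; i < v.length; i++) ring.add(i);
        List<int[]> triangles = new ArrayList<>();
        while (ring.size() > 3) {
            boolean clipped = false;
            for (int i = 0; i < ring.size(); i++) {
                int prev = ring.get((i + ring.size() - 1) % ring.size());
                int curr = ring.get(i);
                int next = ring.get((i + 1) % ring.size());
                // A reflex or collinear vertex cannot be the tip of an ear.
                if (cross(v[prev], v[curr], v[next]) <= 0) continue;
                // An ear must not contain any other remaining vertex.
                boolean empty = true;
                for (int other : ring) {
                    if (other == prev || other == curr || other == next) continue;
                    if (inTriangle(v[other], v[prev], v[curr], v[next])) { empty = false; break; }
                }
                if (!empty) continue;
                triangles.add(new int[] {prev, curr, next});
                ring.remove(i); // remove by index: clip the ear tip
                clipped = true;
                break;
            }
            if (!clipped) throw new IllegalArgumentException("polygon is not simple or not counter-clockwise");
        }
        triangles.add(new int[] {ring.get(0), ring.get(1), ring.get(2)});
        return triangles;
    }

    public static void main(String[] args) {
        // A concave "L"-shaped polygon in (lon, lat) order, counter-clockwise.
        double[][] lShape = {{0, 0}, {2, 0}, {2, 1}, {1, 1}, {1, 2}, {0, 2}};
        for (int[] t : triangulate(lShape)) {
            System.out.println(t[0] + "," + t[1] + "," + t[2]);
        }
    }
}
```

This naive version scans every remaining vertex for each candidate ear, so it does O(n²) work per clipped ear in the worst case. That is one concrete way to see why, as the text notes, tessellation cost grows with the number of vertices.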
@@ -131,9 +161,10 @@ number of levels for the quad trees in Elasticsearch is 29; the default is 21.
 [[spatial-strategy]]
 [float]
 ===== Spatial strategies
-The PrefixTree implementations rely on a SpatialStrategy for decomposing
-the provided Shape(s) into approximated grid squares. Each strategy answers
-the following:
+deprecated[6.6, PrefixTrees no longer used] The indexing implementation
+selected relies on a SpatialStrategy for choosing how to decompose the shapes
+(either as grid squares or a tessellated triangular mesh). Each strategy
+answers the following:
 
 * What type of Shapes can be indexed?
 * What types of Query Operations and Shapes can be used?
@@ -146,21 +177,21 @@ are provided:
 |=======================================================================
 |Strategy |Supported Shapes |Supported Queries |Multiple Shapes
 
-|`recursive` |<<input-structure, All>> |`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS` |Yes
+|`recursive` |<<input-structure, All>> |`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS` |Yes
 |`term` |<<point, Points>> |`INTERSECTS` |Yes
 
 |=======================================================================
 
 [float]
 ===== Accuracy
 
-Geo_shape does not provide 100% accuracy and depending on how it is configured
-it may return some false positives for `INTERSECTS`, `WITHIN` and `CONTAINS`
-queries, and some false negatives for `DISJOINT` queries. To mitigate this, it
-is important to select an appropriate value for the tree_levels parameter and
-to adjust expectations accordingly. For example, a point may be near the border
-of a particular grid cell and may thus not match a query that only matches the
-cell right next to it -- even though the shape is very close to the point.
+The `recursive` and `term` strategies do not provide 100% accuracy and, depending
+on how they are configured, they may return some false positives for `INTERSECTS`,
+`WITHIN` and `CONTAINS` queries, and some false negatives for `DISJOINT` queries.
+To mitigate this, it is important to select an appropriate value for the tree_levels
+parameter and to adjust expectations accordingly. For example, a point may be near
+the border of a particular grid cell and may thus not match a query that only matches
+the cell right next to it, even though the shape is very close to the point.
 
 [float]
 ===== Example
@@ -173,9 +204,7 @@ PUT /example
         "doc": {
             "properties": {
                 "location": {
-                    "type": "geo_shape",
-                    "tree": "quadtree",
-                    "precision": "100m"
+                    "type": "geo_shape"
                 }
             }
         }
@@ -185,22 +214,23 @@ PUT /example
 // CONSOLE
 // TESTSETUP
 
-This mapping maps the location field to the geo_shape type using the
-quad_tree implementation and a precision of 100m. Elasticsearch translates
-this into a tree_levels setting of 20.
+This mapping definition maps the location field to the geo_shape
+type using the default vector implementation. It provides
+approximately 1e-7 decimal degree precision.
 
 [float]
-===== Performance considerations
+===== Performance considerations with Prefix Trees
 
-Elasticsearch uses the paths in the prefix tree as terms in the index
-and in queries. The higher the level is (and thus the precision), the
-more terms are generated. Of course, calculating the terms, keeping them in
+deprecated[6.6, PrefixTrees no longer used] With prefix trees,
+Elasticsearch uses the paths in the tree as terms in the inverted index
+and in queries. The higher the level (and thus the precision), the more
+terms are generated. Of course, calculating the terms, keeping them in
 memory, and storing them on disk all have a price. Especially with higher
-tree levels, indices can become extremely large even with a modest
-amount of data. Additionally, the size of the features also matters.
-Big, complex polygons can take up a lot of space at higher tree levels.
-Which setting is right depends on the use case. Generally one trades off
-accuracy against index size and query performance.
+tree levels, indices can become extremely large even with a modest amount
+of data. Additionally, the size of the features also matters. Big, complex
+polygons can take up a lot of space at higher tree levels. Which setting
+is right depends on the use case. Generally one trades off accuracy against
+index size and query performance.
 
 The defaults in Elasticsearch for both implementations are a compromise
 between index size and a reasonable level of precision of 50m at the
@@ -598,7 +628,10 @@ POST /example/doc
 ===== Circle
 
 Elasticsearch supports a `circle` type, which consists of a center
-point with a radius:
+point with a radius. Note that this circle representation can only
+be indexed when using the `recursive` Prefix Tree strategy. For
+the default <<geoshape-indexing-approach>>, circles should be approximated
+using a `POLYGON`.
 
 [source,js]
 --------------------------------------------------
@@ -612,6 +645,7 @@ POST /example/doc
 }
 --------------------------------------------------
 // CONSOLE
+// TEST[skip:not supported in default]
 
 Note: The inner `radius` field is required. If not specified, then
 the units of the `radius` will default to `METERS`.
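For the default indexing approach, the Circle section above recommends approximating a circle with a `POLYGON`. A minimal sketch of one way to do that is below. It assumes a spherical Earth with a mean radius of 6,371,008.8 m, uses the standard great-circle destination-point formula, and does not handle circles that cross the dateline or a pole. `CirclePolygonSketch` and `toWktPolygon` are hypothetical names, not Elasticsearch APIs, and the center and radius in `main` are hypothetical values.

```java
import java.util.Locale;

public class CirclePolygonSketch {

    // Mean Earth radius in meters (spherical approximation).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    // Returns {lon, lat} in degrees of the point reached by travelling distanceM
    // meters from (latDeg, lonDeg) along `bearing` (radians, clockwise from north).
    static double[] destination(double latDeg, double lonDeg, double bearing, double distanceM) {
        double lat1 = Math.toRadians(latDeg);
        double lon1 = Math.toRadians(lonDeg);
        double delta = distanceM / EARTH_RADIUS_M; // angular distance in radians
        double lat2 = Math.asin(Math.sin(lat1) * Math.cos(delta)
                + Math.cos(lat1) * Math.sin(delta) * Math.cos(bearing));
        double lon2 = lon1 + Math.atan2(Math.sin(bearing) * Math.sin(delta) * Math.cos(lat1),
                Math.cos(delta) - Math.sin(lat1) * Math.sin(lat2));
        return new double[] {Math.toDegrees(lon2), Math.toDegrees(lat2)};
    }

    // Builds a closed, counter-clockwise WKT POLYGON with `segments` distinct
    // vertices approximating a circle of radiusM meters around (latDeg, lonDeg).
    static String toWktPolygon(double latDeg, double lonDeg, double radiusM, int segments) {
        StringBuilder wkt = new StringBuilder("POLYGON ((");
        for (int i = 0; i <= segments; i++) {
            // Decreasing bearings walk the ring counter-clockwise; i == segments
            // maps back to bearing 0 and repeats the first vertex to close the ring.
            double bearing = -2 * Math.PI * (i % segments) / segments;
            double[] p = destination(latDeg, lonDeg, bearing, radiusM);
            if (i > 0) wkt.append(", ");
            wkt.append(String.format(Locale.ROOT, "%.7f %.7f", p[0], p[1])); // WKT order: lon lat
        }
        return wkt.append("))").toString();
    }

    public static void main(String[] args) {
        System.out.println(toWktPolygon(45.0, -45.0, 100.0, 32));
    }
}
```

The segment count is the design knob: more vertices follow the circle more closely but, per the *Indexing approach* section, make tessellation more expensive.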

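The *Indexing approach* section and the updated mapping explanation in this file both quote roughly 1e-7 decimal degree precision. As a back-of-the-envelope conversion, assuming a spherical Earth with a mean radius of 6,371,008.8 m (which is not how Lucene encodes coordinates), one degree of latitude spans about 111.2 km, so 1e-7 degrees is about 1.1 cm on the ground; a degree of longitude shrinks by cos(latitude). `DegreePrecisionSketch` and its methods are hypothetical helpers.

```java
import java.util.Locale;

public class DegreePrecisionSketch {

    // Spherical-Earth assumption: mean radius in meters.
    static final double EARTH_RADIUS_M = 6_371_008.8;

    // Ground distance in meters spanned by `degrees` of latitude
    // (equivalently, of longitude at the equator).
    static double metersForDegrees(double degrees) {
        return Math.toRadians(degrees) * EARTH_RADIUS_M;
    }

    // Ground distance for `degrees` of longitude at latitude latDeg:
    // meridians converge, scaling the distance by cos(latitude).
    static double metersForLonDegrees(double degrees, double latDeg) {
        return metersForDegrees(degrees) * Math.cos(Math.toRadians(latDeg));
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "1e-7 deg of latitude: %.4f m%n", metersForDegrees(1e-7));
        System.out.printf(Locale.ROOT, "1e-7 deg of longitude at 60N: %.4f m%n", metersForLonDegrees(1e-7, 60));
    }
}
```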
docs/reference/migration/migrate_7_0/mappings.asciidoc

+16
@@ -52,3 +52,19 @@ as a better alternative.
 
 An error will now be thrown when unknown configuration options are provided
 to similarities. Such unknown parameters were ignored before.
+
+[float]
+==== Deprecated `geo_shape` Prefix Tree indexing
+
+`geo_shape` types now default to using a vector indexing approach based on Lucene's new
+`LatLonShape` field type. This indexes shapes as a triangular mesh instead of decomposing
+them into individual grid cells. To index using legacy prefix trees, the `recursive` or `term`
+strategy must be explicitly defined. Note that these strategies are now deprecated and will
+be removed in a future version.
+
+[float]
+==== Deprecated `geo_shape` parameters
+
+The following type parameters are deprecated for the `geo_shape` field type: `tree`,
+`precision`, `tree_levels`, `distance_error_pct`, `points_only`, and `strategy`. They
+will be removed in a future version.
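The legacy approach that the migration note above contrasts with decomposes shapes into grid cells from a prefix tree. To make that concrete, the sketch below computes a quadtree cell path for a point: each character picks one quadrant of the parent cell, so every prefix of the path names an ancestor cell. It is a simplified, hypothetical illustration (`QuadCellSketch`, `cellPath`, and `cellWidthAtEquatorMeters` are made-up names); the quadrant numbering is arbitrary and not taken from Lucene's `QuadPrefixTree`, and cell widths assume a spherical Earth.

```java
import java.util.Locale;

public class QuadCellSketch {

    // Quadtree path of (lon, lat) down to `levels` levels. Quadrant digits:
    // 0 = south-west, 1 = south-east, 2 = north-west, 3 = north-east (arbitrary).
    static String cellPath(double lon, double lat, int levels) {
        double minLon = -180, maxLon = 180, minLat = -90, maxLat = 90;
        StringBuilder path = new StringBuilder(levels);
        for (int level = 0; level < levels; level++) {
            double midLon = (minLon + maxLon) / 2;
            double midLat = (minLat + maxLat) / 2;
            int quadrant = 0;
            if (lon >= midLon) { quadrant += 1; minLon = midLon; } else { maxLon = midLon; }
            if (lat >= midLat) { quadrant += 2; minLat = midLat; } else { maxLat = midLat; }
            path.append(quadrant);
        }
        return path.toString();
    }

    // East-west width in meters of a cell at `level`, measured along the equator
    // of a sphere with mean radius 6,371,008.8 m: each level halves the width.
    static double cellWidthAtEquatorMeters(int level) {
        double equatorM = 2 * Math.PI * 6_371_008.8;
        return equatorM / Math.pow(2, level);
    }

    public static void main(String[] args) {
        // Two points about 22 m apart that straddle the prime meridian fall into
        // different top-level cells, so their paths differ from the first digit.
        System.out.println(cellPath(0.0001, 0.0, 8));
        System.out.println(cellPath(-0.0001, 0.0, 8));
        System.out.printf(Locale.ROOT, "level 21 cell width at the equator: %.1f m%n",
                cellWidthAtEquatorMeters(21));
    }
}
```

The two nearby points in `main` show the border effect that the *Accuracy* section describes for prefix trees, and the halving cell width shows why each extra tree level adds precision along with more indexed terms.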

docs/reference/query-dsl/geo-shape-query.asciidoc

+3 -2
@@ -7,7 +7,7 @@ Requires the <<geo-shape,`geo_shape` Mapping>>.
 
 The `geo_shape` query uses the same grid square representation as the
 `geo_shape` mapping to find documents that have a shape that intersects
-with the query shape. It will also use the same PrefixTree configuration
+with the query shape. It will also use the same Prefix Tree configuration
 as defined for the field mapping.
 
 The query supports two ways of defining the query shape, either by
@@ -157,7 +157,8 @@ has nothing in common with the query geometry.
 * `WITHIN` - Return all documents whose `geo_shape` field
 is within the query geometry.
 * `CONTAINS` - Return all documents whose `geo_shape` field
-contains the query geometry.
+contains the query geometry. Note: this is only supported using the
+`recursive` Prefix Tree strategy. deprecated[6.6]
 
 [float]
 ==== Ignore Unmapped

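The spatial relations listed in the query documentation above (`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS`) can be illustrated with axis-aligned rectangles, where each relation reduces to a few coordinate comparisons. This is a simplified, hypothetical sketch (`RelationSketch`, `Box`, and `matches` are made-up names): real `geo_shape` queries evaluate arbitrary geometries rather than bounding boxes, and the sketch ignores the dateline.

```java
public class RelationSketch {

    enum Relation { INTERSECTS, DISJOINT, WITHIN, CONTAINS }

    // Axis-aligned box in degrees; assumes minLon <= maxLon (no dateline crossing).
    static final class Box {
        final double minLon, minLat, maxLon, maxLat;

        Box(double minLon, double minLat, double maxLon, double maxLat) {
            this.minLon = minLon;
            this.minLat = minLat;
            this.maxLon = maxLon;
            this.maxLat = maxLat;
        }
    }

    // Does the document's box satisfy `relation` with respect to the query box?
    static boolean matches(Box doc, Box query, Relation relation) {
        boolean intersects = doc.minLon <= query.maxLon && doc.maxLon >= query.minLon
                && doc.minLat <= query.maxLat && doc.maxLat >= query.minLat;
        switch (relation) {
            case INTERSECTS:
                return intersects;
            case DISJOINT:
                return !intersects;
            case WITHIN: // the document's shape lies entirely inside the query shape
                return doc.minLon >= query.minLon && doc.maxLon <= query.maxLon
                        && doc.minLat >= query.minLat && doc.maxLat <= query.maxLat;
            case CONTAINS: // the document's shape entirely covers the query shape
                return query.minLon >= doc.minLon && query.maxLon <= doc.maxLon
                        && query.minLat >= doc.minLat && query.maxLat <= doc.maxLat;
            default:
                throw new IllegalArgumentException("unknown relation " + relation);
        }
    }

    public static void main(String[] args) {
        Box query = new Box(0, 0, 10, 10);
        Box small = new Box(2, 2, 3, 3);
        for (Relation r : Relation.values()) {
            System.out.println(r + ": " + matches(small, query, r));
        }
    }
}
```

Note that `WITHIN` and `CONTAINS` are mirror images: swapping the document and query arguments turns one into the other.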
server/src/main/java/org/elasticsearch/common/geo/ShapeRelation.java

+12
@@ -19,6 +19,7 @@
 
 package org.elasticsearch.common.geo;
 
+import org.apache.lucene.document.LatLonShape.QueryRelation;
 import org.elasticsearch.common.io.stream.StreamInput;
 import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.io.stream.Writeable;
@@ -62,6 +63,17 @@ public static ShapeRelation getRelationByName(String name) {
         return null;
     }
 
+    /** Maps this ShapeRelation to Lucene's LatLonShape.QueryRelation. */
+    public QueryRelation getLuceneRelation() {
+        switch (this) {
+            case INTERSECTS: return QueryRelation.INTERSECTS;
+            case DISJOINT: return QueryRelation.DISJOINT;
+            case WITHIN: return QueryRelation.WITHIN;
+            default:
+                throw new IllegalArgumentException("ShapeRelation [" + this + "] not supported");
+        }
+    }
+
     public String getRelationName() {
         return relationName;
     }
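Combining the `getLuceneRelation()` mapping above with the limitations listed in the commit message, a caller could check up front whether a query can run against a field that uses the default `LatLonShape` indexing. The sketch is self-contained so it runs without Lucene: the local `QueryRelation` enum stands in for `LatLonShape.QueryRelation`, and `VectorQuerySupportSketch`, `ShapeType` (a subset of shape types), and `checkSupported` are hypothetical names, not Elasticsearch APIs.

```java
public class VectorQuerySupportSketch {

    enum ShapeRelation { INTERSECTS, DISJOINT, WITHIN, CONTAINS }

    // Stand-in for Lucene's LatLonShape.QueryRelation, which has no CONTAINS.
    enum QueryRelation { INTERSECTS, DISJOINT, WITHIN }

    // Hypothetical subset of query shape types.
    enum ShapeType { POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON }

    // Mirrors ShapeRelation#getLuceneRelation() from the diff above.
    static QueryRelation toLucene(ShapeRelation relation) {
        switch (relation) {
            case INTERSECTS: return QueryRelation.INTERSECTS;
            case DISJOINT: return QueryRelation.DISJOINT;
            case WITHIN: return QueryRelation.WITHIN;
            default:
                throw new IllegalArgumentException("ShapeRelation [" + relation + "] not supported");
        }
    }

    // Rejects the combinations the commit message lists as unsupported for the
    // new indexing approach, then maps the relation (which rejects CONTAINS).
    static QueryRelation checkSupported(ShapeType queryShape, ShapeRelation relation) {
        if (queryShape == ShapeType.MULTIPOINT) {
            throw new IllegalArgumentException("querying by MULTIPOINT is not supported");
        }
        if (relation == ShapeRelation.WITHIN
                && (queryShape == ShapeType.LINESTRING || queryShape == ShapeType.MULTILINESTRING)) {
            throw new IllegalArgumentException("WITHIN is not yet supported for " + queryShape);
        }
        return toLucene(relation);
    }

    public static void main(String[] args) {
        System.out.println(checkSupported(ShapeType.POLYGON, ShapeRelation.WITHIN));
        try {
            checkSupported(ShapeType.POLYGON, ShapeRelation.CONTAINS);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```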

server/src/main/java/org/elasticsearch/common/geo/builders/GeometryCollectionBuilder.java

-3
@@ -197,9 +197,6 @@ public Object buildLucene() {
             }
         }
 
-        if (shapes.size() == 1) {
-            return shapes.get(0);
-        }
         return shapes.toArray(new Object[shapes.size()]);
     }
