Commit 38baa36

Geometry simplifier (#94859)
Support geometry and streaming simplification

There are many opportunities to enable geometry simplification in Elasticsearch, both as an explicit feature available to users, and as an internal optimization technique for reducing memory consumption for complex geometries. For the latter case, it can even be considered a bug fix. This PR provides support for constraining Line and LinearRing sizes to a fixed number of points, and thereby a fixed amount of memory usage.

Consider, for example, the geo_line aggregation. This is similar to the top-10 aggregation, but allows the top-10k (ten thousand) points to be aggregated. This is not only a lot of memory, but can still cause unwanted line truncation for very large geometries. Line simplification is a solution to this. It is likely that a much smaller limit than 10k would suffice, while at the same time not truncating the geometry at all, so we fix a bug (truncation) while improving memory usage (pulling the limit from 10k down to perhaps just 1k).

This PR provides two APIs:

Streaming:
* By using the simplifier.consume(x, y) method on a stream of points, the total memory used is limited to a linear function of k, the total number of points to retain. This algorithm is at its heart based on the Visvalingam–Whyatt algorithm, with concepts from https://bost.ocks.org/mike/simplify/ and in particular the detailed streaming discussions in the paper at https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.7132&rep=rep1&type=pdf

Full-geometry:
* Simplifying full geometries using the simplifier.simplify(geometry) method can work with most geometry types, even GeometryCollection, but:
  - Some geometries do not get simplified because it makes no sense to: Point, Circle, Rectangle
  - The maxPoints parameter is applied as is to the main component (shell for polygons, largest geometry for multi-polygons and geometry collections), and all other sub-components (holes in polygons, etc.) are simplified to a scaled-down version of maxPoints, scaled by the size of the sub-component relative to the main component.
* The simplification itself is done on each Line and LinearRing component using the same streaming algorithm above.

Since we use the Visvalingam–Whyatt algorithm, this work is applicable to both streaming and full-geometry simplification with the same essential result, but with better control over memory than normal full-geometry simplifiers.

The basic algorithm for simplification on a stream of points requires maintaining two data structures:
* an array of all currently simplified points (implicitly ordered in stream order)
* a priority queue of all but the two end points, with an estimated error on each that expresses the cost of removing that point from the line
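
The streaming idea described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code added by this commit: the class name StreamingSimplifierSketch and its methods are hypothetical, the removal cost is assumed to be the planar triangle area used by Visvalingam–Whyatt, and a linear scan stands in for the priority queue to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of fixed-budget streaming simplification; not the GeometrySimplifier from this commit. */
class StreamingSimplifierSketch {
    private final int maxPoints;
    // Retained points, implicitly ordered in stream order.
    private final List<double[]> points = new ArrayList<>();

    StreamingSimplifierSketch(int maxPoints) {
        if (maxPoints < 2) {
            throw new IllegalArgumentException("maxPoints must be at least 2");
        }
        this.maxPoints = maxPoints;
    }

    /** Accepts the next point of the stream, evicting one interior point if the budget is exceeded. */
    void consume(double x, double y) {
        points.add(new double[] { x, y });
        if (points.size() > maxPoints) {
            points.remove(indexOfCheapestInteriorPoint());
        }
    }

    /** Returns a copy of the currently retained points. */
    List<double[]> points() {
        return new ArrayList<>(points);
    }

    // Assumption: a linear scan replaces the priority queue to keep the sketch short.
    private int indexOfCheapestInteriorPoint() {
        int cheapest = 1;
        double minError = Double.POSITIVE_INFINITY;
        // The two end points (index 0 and size - 1) are never candidates for removal.
        for (int i = 1; i < points.size() - 1; i++) {
            double error = triangleArea(points.get(i - 1), points.get(i), points.get(i + 1));
            if (error < minError) {
                minError = error;
                cheapest = i;
            }
        }
        return cheapest;
    }

    // Assumption: the removal cost is the planar area of the triangle formed with both neighbours.
    private static double triangleArea(double[] a, double[] b, double[] c) {
        return 0.5 * Math.abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]));
    }

    public static void main(String[] args) {
        StreamingSimplifierSketch simplifier = new StreamingSimplifierSketch(3);
        double[][] stream = { { 0, 0 }, { 1, 0 }, { 2, 10 }, { 3, 0 }, { 4, 0 } };
        for (double[] p : stream) {
            simplifier.consume(p[0], p[1]);
        }
        for (double[] p : simplifier.points()) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

With a budget of three points, the example stream keeps both end points and the spike at (2, 10), and the list never holds more than maxPoints + 1 entries. A priority queue keyed on the error, as the commit message describes, would avoid rescanning every retained point on each eviction.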
1 parent a45370e commit 38baa36

17 files changed: 2710 additions & 0 deletions
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0 and the Server Side Public License, v 1; you may not use this file except
+ * in compliance with, at your election, the Elastic License 2.0 or the Server
+ * Side Public License, v 1.
+ */
+
+package org.elasticsearch.benchmark.spatial;
+
+import org.elasticsearch.geometry.LinearRing;
+import org.elasticsearch.geometry.simplify.GeometrySimplifier;
+import org.elasticsearch.geometry.simplify.SimplificationErrorCalculator;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.io.BufferedReader;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.text.ParseException;
+import java.util.concurrent.TimeUnit;
+import java.util.zip.GZIPInputStream;
+
+@Fork(1)
+@Warmup(iterations = 5)
+@Measurement(iterations = 5)
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@State(Scope.Thread)
+public class GeometrySimplificationBenchmark {
+    @Param({ "cartesiantrianglearea", "triangleArea", "triangleheight", "heightandbackpathdistance" })
+    public String calculatorName;
+
+    @Param({ "10", "100", "1000", "10000", "20000" })
+    public int maxPoints;
+
+    private GeometrySimplifier<LinearRing> simplifier;
+    private static LinearRing ring;
+
+    @Setup
+    public void setup() throws ParseException, IOException {
+        SimplificationErrorCalculator calculator = SimplificationErrorCalculator.byName(calculatorName);
+        this.simplifier = new GeometrySimplifier.LinearRingSimplifier(maxPoints, calculator);
+        if (ring == null) {
+            ring = loadRing("us.json.gz");
+        }
+    }
+
+    @Benchmark
+    public void simplify(Blackhole bh) {
+        bh.consume(simplifier.simplify(ring));
+    }
+
+    private static LinearRing loadRing(@SuppressWarnings("SameParameterValue") String name) throws IOException, ParseException {
+        String json = loadJsonFile(name);
+        org.apache.lucene.geo.Polygon[] lucenePolygons = org.apache.lucene.geo.Polygon.fromGeoJSON(json);
+        LinearRing ring = null;
+        for (org.apache.lucene.geo.Polygon lucenePolygon : lucenePolygons) {
+            double[] x = lucenePolygon.getPolyLons();
+            double[] y = lucenePolygon.getPolyLats();
+            if (ring == null || x.length > ring.length()) {
+                ring = new LinearRing(x, y);
+            }
+        }
+        return ring;
+    }
+
+    private static String loadJsonFile(String name) throws IOException {
+        InputStream is = GeometrySimplificationBenchmark.class.getResourceAsStream(name);
+        if (is == null) {
+            throw new FileNotFoundException("classpath resource not found: " + name);
+        }
+        if (name.endsWith(".gz")) {
+            is = new GZIPInputStream(is);
+        }
+        BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
+        StringBuilder builder = new StringBuilder();
+        reader.lines().forEach(builder::append);
+        return builder.toString();
+    }
+}
Binary file not shown.

docs/changelog/94859.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 94859
+summary: Geometry simplifier
+area: Geo
+type: feature
+issues: []

libs/geo/src/main/java/module-info.java

Lines changed: 1 addition & 0 deletions
@@ -9,4 +9,5 @@
 module org.elasticsearch.geo {
     exports org.elasticsearch.geometry;
     exports org.elasticsearch.geometry.utils;
+    exports org.elasticsearch.geometry.simplify;
 }
