Geoline aggregation - add simplification option #87903

Open
nickpeihl opened this issue Jun 21, 2022 · 4 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nickpeihl
Member

Description

Asset tracking use cases, such as GPS beacons on vehicles, can create a lot of geo_points. When constructing a geo_line from a high-cardinality set of points, it may be helpful to reduce the result set to only the points necessary to represent the line geometry. Two line simplification algorithms that accomplish this are Ramer–Douglas–Peucker and Visvalingam–Whyatt. Demonstration of both algorithms.
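
To make the idea concrete, here is a minimal sketch of Ramer–Douglas–Peucker in Python. It is only an illustration, not anything from Elasticsearch: the function names and the `epsilon` tolerance are hypothetical, and points are treated as planar `(x, y)` pairs, which is an approximation for real lon/lat data.

```python
import math


def segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Keep only the points that deviate more than epsilon from the line."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the first-to-last segment
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        # Everything in between is close enough to a straight line
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and simplify both halves
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


# Hypothetical track: 50 points heading east, then 49 points heading north
track = [(x, 0) for x in range(50)] + [(49, y) for y in range(1, 50)]
simplified = rdp(track, epsilon=0.01)
```

For this L-shaped track, the 99 input points collapse to the two endpoints plus the corner.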

The PR for the Geo Line aggregation suggested having a simplify option to accomplish this, but it was not implemented.

For example, let's say we are tracking a single delivery vehicle over a 2000 mile trip. The vehicle averages 45 miles per hour and submits a GPS location every 10 seconds. Over the course of the entire trip, the vehicle will send about 16000 locations as geo_points. Currently, a geo_line aggregation can only include the latest 10000 geo_points. But since the vehicle spends many minutes traveling in a straight line, we should be able to greatly reduce the number of vertices in the geo_line across the entire 16000 point data set using one of the aforementioned simplification algorithms.
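
The point count above can be checked with a few lines of arithmetic. The variable names are just for illustration; the numbers are the ones from the example.

```python
distance_miles = 2000
speed_mph = 45
report_interval_s = 10
geo_line_limit = 10000  # current maximum number of points in a geo_line

# Trip duration in seconds: distance / speed, converted from hours
trip_seconds = distance_miles * 3600 / speed_mph

# One location report every 10 seconds
total_points = round(trip_seconds / report_interval_s)

# Points that do not fit into a single geo_line today
points_over_limit = total_points - geo_line_limit
```

So the trip takes roughly 44.4 hours and produces 16000 points, 6000 more than a single geo_line can currently hold.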

@nickpeihl nickpeihl added >enhancement needs:triage Requires assignment of a team area label labels Jun 21, 2022
@nreese
Contributor

nreese commented Jun 21, 2022

Similar to #87710

@iverase iverase added :Analytics/Geo Indexing, search aggregations of geo points and shapes and removed needs:triage Requires assignment of a team area label labels Jun 22, 2022
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 22, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@iverase
Contributor

iverase commented Jun 22, 2022

Just to make clear: it is not possible to implement this feature on our current indexes in a scalable fashion.

Elasticsearch is a distributed system and points from one track can be stored in different shards that can actually be located in different cluster nodes. The only way to apply those simplification algorithms is to send all the data points of a track to the coordinator node and then apply the simplification. This approach is not scalable and it will be very easy to send a request that fills the heap of the coordinator node.

We are hoping to provide this functionality as part of the TSDB on geo_line project. Time series aggregations have the property that data is visited in chronological order for one tsid (aka track), which is exactly what we need to apply simplification while reading the points from a track. Of course, indexes will need to be created as time series indexes.
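
To illustrate why chronological order helps, here is a rough sketch of a simplifier that reads a track one point at a time and never holds more than a fixed number of points. It uses a Visvalingam–Whyatt style rule (drop the interior point whose triangle with its neighbours has the smallest area). This is an assumption about how such a streaming approach could look, not the Elasticsearch implementation; `StreamingSimplifier` and `max_points` are hypothetical names, and coordinates are treated as planar.

```python
def triangle_area(a, b, c):
    """Area of the triangle a-b-c (planar approximation)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


class StreamingSimplifier:
    """Consumes points in time order while keeping at most max_points."""

    def __init__(self, max_points):
        if max_points < 2:
            raise ValueError("max_points must be at least 2")
        self.max_points = max_points
        self.points = []

    def add(self, point):
        self.points.append(point)
        if len(self.points) > self.max_points:
            # Drop the interior point that contributes least to the shape.
            # A linear scan keeps the sketch short; a priority queue would
            # avoid rescanning the buffer on every insert.
            pts = self.points
            least = min(
                range(1, len(pts) - 1),
                key=lambda i: triangle_area(pts[i - 1], pts[i], pts[i + 1]),
            )
            del pts[least]


# Hypothetical track: 50 points heading east, then 49 points heading north
simplifier = StreamingSimplifier(max_points=3)
for p in [(x, 0) for x in range(50)] + [(49, y) for y in range(1, 50)]:
    simplifier.add(p)
result = simplifier.points
```

Memory stays bounded by `max_points` no matter how long the track is, which is the property that is hard to get when points from one track arrive unordered from several shards.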

@craigtaverner
Contributor

The time-series version of line-simplification in geo_line aggregations has been merged in #94954. This issue can remain open as a request to do line-simplification in non-time-series cases. However, as mentioned above, there are memory concerns with this approach. It could, perhaps, make sense if the data nodes' threshold were different from the coordinating node's threshold. For example, right now the data nodes truncate at a very high value of 10000 in order to reduce the risk of damaging results (truncating is damaging), and the coordinating node truncates to the same threshold, but in fact a line simplified to 1000 points or even fewer (perhaps even 100) could be sufficient for visualization. So perhaps line-simplification in the coordinating node to a shorter line is of value.
