BKD backed polygon intersection is slow #50531
Comments
Pinging @elastic/es-analytics-geo (:Analytics/Geo)
Thanks @blkbltjns for sharing the numbers. I am curious to know what performance you would get if you performed the same query using an envelope instead of a polygon. Would it be possible for you to run the following query and share the results?
Running the envelope query gives the same performance. The initial query takes 2+ minutes (with very heavy disk read activity), and subsequent runs of the same query take a little over a second.
Hi @blkbltjns, it has been a while since you originally opened the issue, but I am wondering whether you were able to understand the issue with the disk page cache. There were some improvements in recent versions of ES regarding BKD backed geo shapes, so they might have helped you.
Hi @iverase, no change in performance here. We still see query timeouts on the initial spatial intersection queries, and subsequent queries take around a second to complete.
Are you still using 7.5.0? In 7.6.0 we changed the way we open this index (#49272), so I would expect an upgrade to help. In the upcoming 7.9.0, there are several improvements to this index as well. Could you share the output of hot threads while running this query? I would like to see where we are spending most of the time.
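A minimal sketch of how the hot threads output requested above could be captured while the slow query is running. It uses the nodes hot threads API (`GET /_nodes/hot_threads`) with its `threads` and `interval` parameters. The helper names, the default values, and the `http://localhost:9200` address in the usage note are assumptions for illustration, not part of the original thread.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url, threads=3, interval="500ms"):
    """Build the URL for the nodes hot threads API.

    base_url is the (assumed) HTTP address of any node in the cluster.
    """
    params = urllib.parse.urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"


def fetch_hot_threads(base_url, **kwargs):
    """Return the plain-text hot threads report for all nodes."""
    url = hot_threads_url(base_url, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

To use it, start the slow `_count` request in one session and call `fetch_hot_threads("http://localhost:9200")` from another while the request is still in flight. Sampling a few times during the 2+ minute run gives a better picture than a single snapshot.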
Closing, not enough information to proceed.
Elasticsearch version 7.5.0
Kibana 7.5.0 plugin installed
Windows Server 2016 Datacenter
After moving from 6.3.2 to 7.5.0, spatial intersections on a geo_shape field in an index with 12 million polygons are significantly slower than they were before. This index is about 20GB. All these polygons represent land parcels and commonly touch and/or slightly overlap with neighboring shapes. I am doing the intersection with a bounding box roughly the size of the southern United States.
The first intersection query (see end of this post for example) I do after a fresh ES 7.5.0 server restart takes 2+ minutes. On a fresh ES 6.3.2 server restart this exact same query against the same data (using quadtree geo_shape) takes 800ms. I notice that there is very heavy disk read activity during the 7.5.0 query. This does not happen during the 6.3.2 query. Note that this is only happening with polygon geo_shapes; point geo_shapes do not have this problem from what I can tell.
After this first query (with a hot cache?), the situation improves somewhat but the 7.5.0 query still takes over a second to run while the 6.3.2 query takes 100ms.
Interestingly, doing an ES restart as opposed to a full server restart does not result in the 2+ minute query. I believe the Windows disk page cache is responsible for the change from 2+ minutes to 1 second: a full server restart clears it, while an ES restart does not.
If it matters, here is my query (this is /_count but I get similar results with /_search):
POST myindex/_count
{"query":{"bool":{"filter":{"bool":{"must":[{"geo_shape":{"geography":{"shape":{"type":"polygon","coordinates":[[[-118.74701654704592,38.554294590584455],[-118.74701654704592,22.052177425063828],[-76.559516547045916,22.052177425063828],[-76.559516547045916,38.554294590584455],[-118.74701654704592,38.554294590584455]]]},"relation":"intersects"}}}]}}}}}
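For comparison with the polygon query above, here is a hedged sketch of how an equivalent envelope query could be built. It is not necessarily the query suggested in the comments, which is not preserved in this thread. It relies on the geo_shape `envelope` type, whose coordinates are the upper-left and lower-right corners in `[lon, lat]` order. The helper names are hypothetical, and the sketch assumes the ring does not cross the antimeridian.

```python
import json

# Exterior ring of the bounding-box polygon used in the query above.
RING = [
    [-118.74701654704592, 38.554294590584455],
    [-118.74701654704592, 22.052177425063828],
    [-76.559516547045916, 22.052177425063828],
    [-76.559516547045916, 38.554294590584455],
    [-118.74701654704592, 38.554294590584455],
]


def ring_to_envelope(ring):
    """Return [[min_lon, max_lat], [max_lon, min_lat]] for a [lon, lat] ring."""
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return [[min(lons), max(lats)], [max(lons), min(lats)]]


def envelope_count_query(field, ring):
    """Build a request body that intersects `field` with the ring's envelope."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {
                                "type": "envelope",
                                "coordinates": ring_to_envelope(ring),
                            },
                            "relation": "intersects",
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print a body that can be sent to POST myindex/_count.
    print(json.dumps(envelope_count_query("geography", RING)))
```

The printed body can be posted to `myindex/_count` in place of the polygon body to compare cold and warm timings for the two shape types.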