Skip to content

Add bounds support for geogrid agg on shapes #51973

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 2, 2020

Conversation

talevy
Copy link
Contributor

@talevy talevy commented Feb 6, 2020

This commit adds support for geo-grid aggregations with bounds parameter
on geo_shape doc values

This also modifies the existing geo_point handling of bounds to be consistent with shapes.

This PR cleans up some aspects of GeoShapeCellValues
to support the specialization of bounded geo_shape
geo-grid aggregations.
@talevy talevy added WIP :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Feb 6, 2020
@talevy talevy requested review from nknize and iverase February 6, 2020 02:23
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@talevy
Copy link
Contributor Author

talevy commented Feb 6, 2020

This PR is still a WIP because its tests exposed existing bugs in the TriangleTreeReader#relate logic. This logic is well tested in TriangleTreeTests#testRectangleShape, but the assertions there are not necessarily what we want when evaluating tiles.

The picture below depicts the unwanted behavior:

Screen Shot 2020-02-05 at 4 31 28 PM

These edge tiles should not be accounted for, outer edges in the triangle tree that are collinear with the queried tiles should not be considered CROSSES in these cases if these are not to be included.

The example above is actually solved in the non-dateline-wrapping case using a hack in the GeoGridTiler, but that hack is not useful for when the shape's bounds crosses the dateline.

@talevy talevy removed the WIP label Feb 13, 2020
@talevy talevy requested review from iverase and nknize and removed request for nknize and iverase February 13, 2020 20:36
@talevy talevy marked this pull request as ready for review February 13, 2020 22:59
@talevy
Copy link
Contributor Author

talevy commented Feb 20, 2020

run elasticsearch-ci/packaging-sample-matrix-unix

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest issue at the moment in the PR is that we need to filter tiles when we collect them from fully contained tiles. This makes me wonder that this should be different between the bounded and unbounded case, what do you think?

@iverase
Copy link
Contributor

iverase commented Feb 20, 2020

I am not sure if the current approach is the most efficient. We are wrapping the bounds around the geo value, I think it would be more efficient to wrap it around the GridTiler? the tiler can call an abstract method to check if the tile is in bounds. For the unbounded case the answer will always be true, for the bounder, it will perform the check against the provided bounds.

This refactor reverts some of the BoundedCellValues
constructs. Instead, BoundedGeoTileGridTiler and
BoundedGeoHashGridTiler are introduced.

As part of this change, the definition/semantics of
geo_grid aggs with bounds on geo_point are modified
to match the same behavior as geo_shapes, where it is
the tile of the point that must intersect the bounds
in order for the point to be accounted for
@talevy talevy requested a review from iverase February 26, 2020 02:18
@talevy
Copy link
Contributor Author

talevy commented Feb 26, 2020

I am still thinking of how to best reduce the duplicate code around the GeoBoundingBox parts of both Geohash and Geotile, but let me know if the update is what you had in mind! thanks Ignacio

@iverase
Copy link
Contributor

iverase commented Feb 26, 2020

@tal, yes, I think in this way the unbounded case behaves the same as before. Thanks!

@talevy
Copy link
Contributor Author

talevy commented Feb 26, 2020

I am not sure there is a much better way to re-use the bounded logic that is less verbose. I think it may only be worth re-visiting once an additional geogrid agg is every introduced.

so this PR is ready for a final pass if you'd like. thanks @iverase!

@talevy talevy requested a review from iverase February 26, 2020 16:41
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@talevy
Copy link
Contributor Author

talevy commented Mar 2, 2020

thanks!

@talevy talevy merged commit b136fa7 into elastic:geoshape-doc-values Mar 2, 2020
@talevy talevy deleted the gdv-shape-grid-bounds branch March 2, 2020 16:39
talevy added a commit that referenced this pull request Mar 2, 2020
This PR cleans up some aspects of GeoShapeCellValues
to support the specialization of bounded geo_shape
geo-grid aggregations.

This refactor reverts some of the BoundedCellValues
constructs. Instead, BoundedGeoTileGridTiler and
BoundedGeoHashGridTiler are introduced.

As part of this change, the definition/semantics of
geo_grid aggs with bounds on geo_point are modified
to match the same behavior as geo_shapes, where it is
the tile of the point that must intersect the bounds
in order for the point to be accounted for
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants