Skip to content

Fix geo normalization #1997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ofavre opened this issue Jun 1, 2012 · 4 comments
Closed

Fix geo normalization #1997

ofavre opened this issue Jun 1, 2012 · 4 comments

Comments

@ofavre
Copy link

ofavre commented Jun 1, 2012

We have 3 problems:

  • The actual latitude normalization is very wrong: it wraps latitude. Coordinate normalization is broken.
  • The range normalization cannot be performed by normalizing the top-left and bottom-right corners.
  • The geo polygon normalization cannot be performed by normalizing its vertices.

Let me explain:

When you cross past the 180° meridian, you arrive near the -180° meridian (which is actually the same / does not exist).
In other words longitude wraps.

When you cross past the 90° parallel at the north pole, you are still in the north pole, but you are now going back down along the opposite meridian.
In other words latitude reflects, accompanied with a 180° shift in longitude.

This implies that we cannot normalize latitude without affecting (or even normalizing) longitude.

Moreover when describing an area you have to split it in multiple areas. This affects geo bounding boxes and geo polygons.
The easiest example is considering the range [(10°;170°);(-10°;190°)], it must be splitted into [(10°;170°);(-10°;180°)] and [(10°;-180°);(-10°;-170°)].

@ofavre
Copy link
Author

ofavre commented Jun 1, 2012

In multiple classes you can control whether or not to normalize and validate latitude and/or longitude.
Controlling this setting separately for latitude and longitude seems odd, considering the implication of a 180°lon shift in some cases.

I can only think of one use case: One wants to superpose an (indefinite and dynamic) number of layers that must not be merged.
One can then decide to apply a k*360°lon shift, so that range queries made in one domain (defined by x+k*360 with x in ]-180:180] for longitude and y in [-90;90] for latitude) do not leak to other layers.
But even in such weird use case, a filter on some field containing k would make more sense... maybe not performance wise, but still.

ofavre pushed a commit to yakaz/elasticsearch that referenced this issue Jun 1, 2012
Geo normalization of latitude was not properly performed.
90°lat is the north pole, and +1°lat beyond is still the north pole,
not the south pole.
Latitude should not be wrapped like longitude but rather reflected,
accompanied with a +180°lon shift.

Normalizing ranges cannot be performed by normalizing its top-left
and bottom-right points, but must be split into multiple ranges.

And two "qeo"->"geo" type fixes as bonuses.

Note: There is still a problem with GeoPolygon, like for ranges,
      but I'll fix this in a separate commit.

See elasticsearch/elasticsearch issue elastic#1997.
@ofavre
Copy link
Author

ofavre commented Jun 1, 2012

Normalizing geo polygons

Keep in mind that a polygon is defined by a list connected of points, defining contiguous edges.
Also, the GeoPolygonFilter.pointInPolygon() algorithm uses the crossing number method, hence a self intersection defines a whole.

The idea

  • A polygon should be split when one of its edges crosses a splitting axis (±90°lat or ±180°lon).
    We must consider the intersection point, normalized on both sides.
  • The points that are on the opposite side of the splitting axis will be taken care of as part of a new polygon.
  • As long as the points considered when iterating are at the opposite side, we ignore them in the context of the current polygon.
  • As soon as they come back in and to not cross an other splitting axis (at corners), we resume filling the current polygon, taking care of the intersection points with the axis.
  • When creating a new polygon, we must attach a new transformation to it, so that the transformed points it considers are normalized, until they, too, eventually cross a splitting axis.
  • In order to close all polygons (especially those started at the middle of the iteration) we must reiterate on the points.
    When the starting point is met again, we close the related polygon, etc. until all polygons are closed.

Note: This algorithm keeps parts of the polygon that are in the same normalization space as part of a single polygon, hence self intersections are kept intact. However this comes with eventual edges being along the splitting axis, sometime multiple such edges being superposed, and this may affect the pointInPolygon() algorithm, but it shouldn't, considering such edges cover an empty area.

Complexity: This algorithm is O(n), n being the number of points defining the polygon to normalize, but it's a bit verbose... (can it really be less?)

Pseudo-implementation stub

main:
    let inPoints be the list of points defining the polygon to normalize
    let list be the list of polygons
    let polygon be composed of
        tfo // a transform the normalizes the points to consider
        points // a list of points that form the polygon
        startIdx // the index of the first point that created this polygon
        closed = false // whether the polygon has finished its construction
        active = true // if the points considered don't yet cross a splitting axis
        cutPoint = none // the cut point on the splitting axis
        splittingAxis = none // the current splitting axis that paused the construction
    let tfo be a transformation
    set tfo = transformation normalizing the point inPoints[0]
    let n be an integer
    set n = 0
    append new polygon(tfo=tfo, points=[inPoints[0]], startIdx=0) to list
    iterate over the edges : edge
        iterate over list : polygon
            polygon.notify(n, edge)
        increment n
    set n = 0
    let i be the index of the first non closed polygon
    set nc = 0
    iterate over the edges : edge
        for i in nc to list.size-1
            list[i].notify(n, edge)
            if list[i].closed?
                increment nc
        if nc equals list.size
            break
        increment n
    return list

polygon.notify(n, edge):
    transform edge according to tfo
    if not active
        if n == startIdx
            set closed = true
        else if edge crossing back splittingAxis
            set cutPoint accordingly
            set edge.begin = cutPoint
            if edge crossing any splitting axis
                set cutPoint accordingly
                set splittingAxis accordingly
            else
                set active = true
                self.notify(n, edge)
    else
        if edge crossing any splitting axis
            set active = false
            set cutPoint accordingly
            set splittingAxis accordingly
            append cutPoint to points
            set edge.begin = cutPoint
            let newTfo be a transformation
            let newPoly be a polygon
            set newTfo accordingly, to normalize points of the other side of splittingAxis
            set newPoly = new polygon(tfo=newTfo, points=[edge.begin], startIdx=n)
            append newPoly to list
            call newPoly.notify(n, edge)
        else
            append edge.end to points

Feedback required

I'm not really going to implement it because: I'm not 100% sure it works fine in all cases, and I'm not convinced geo polygon normalization is a real need.

It may even be prevented by other means if such problem arise.
For instance, with a few assumptions one can generate reflected/shifted copies of the polygon, and or together the results.

@ofavre
Copy link
Author

ofavre commented Jun 1, 2012

Normalizing geo polygons

A naive approach

Why didn't I even think about that one before...

Near up to 3 time slower. No impact if already normalized.

Assumptions:

  • the geo polygon is no wider than 360°lon and no taller than 180°lat (should usually be the case).
  • no undesired point lie outside the normalized range (if the user wants normalization... then it should normally be the case).

Steps:

  • Compute the geo bounding box of the geo polygon
  • Normalize the range, but instead of generating just the missing parts,
    shift and or reflect the whole polygon
  • Or-together the generated copies (3 maximum).

Feedback required

Do you think such assumptions are fair? the performance impact acceptable?

@ofavre
Copy link
Author

ofavre commented Jun 1, 2012

Normalizing geo polygons

Using triangulation

  • Check if the geo bounding box is already normalized and early exit.
  • Triangulate the polygon.
  • Eventually split each triangle
normalize the first point, transforming the two others accordingly.
select the number of adjacent edges of the first point that cross a splitting axis
    if zero
        if the opposite edge crosses a splitting axis
            // the first point is on a splitting axis
            calculate the cut point
            split the triangle in two
            // no need to normalize the two pieces as none of their edge would cross a splitting axis
        // otherwise, the triangle is normalized (as the other two points are within the same normalization space and a triangle is convex!)
    if one
        // the opposite edge is crossing the same splitting axis too
        calculate the cut point
        create a new triangle with the second and third points, and the cut point (all but the first point)
        normalize that new triangle
        modify the current triangle to use the cut point instead of the faulty vertex
        // no need to re-normalize as we've fixed the only crossing of the current triangle
    if two
        calculate the two cut points
        create two triangles with the second and third points, and the two cut points
        normalize them
        modify the current triangle to use the two cut points instead of the second and third points
        // no need to re-normalize as we've fixed the two crossing of the current triangle

Complexity: Triangulation can be no faster than O(n*log(n)) if polygon has holes, O(n) otherwise. Furthermore, re-normalization of split triangles are sometimes needed.

Feedback required

This method seems more conventional and simple.

If not handled by the triangulation method, it however it breaks the crossing number method, by ignoring the self-intersections.
The pointInPolygon method will also be modified/simplified, but not necessarily to become more performant.

chilling added a commit to chilling/elasticsearch that referenced this issue Jul 2, 2013
===============
The code handling geo-shapes is not centralized and creating points takes
place at different places. Also the collection of supported geo_shapes is
not complete regarding to the GEOJSon specification. This commit
centralizes the code related to GEO calculations and extends the old API by
a set of new shapes.

Null-Shapes
===========
The latest implementation of geo-shapes allows to index null-shapes. This
means a field that is defined to hold a geo-shape can be set to null. In
example:
    {

        "shape": null
    }

New Shapes
==========
The geo-shapes multipoint and multilinestring have been added to the
geo_shape types. Also geo_circle is introduced by this commit.

Dateline wrapping
=================
A major issue of geo-shapes is the spherical geometry. Since ElasticSearch
works on the Geo-Coordinates by wrapping the Earths surface to a plane,
some shapes are hard to define if it’s crossing the +180°, -180 longitude.
To solve this issue ElasticSearch offers the possibility to define geo
shapes crossing this borders and decompose these shapes and automatically
re-compose them in a spherical manner. This feature may change the indexed
shape-type. If for example a polygon is defined, that crosses the dateline,
it will be re-assembled to a set of polygons. This causes indexing a
multipolygon. Also linestrings crossing the dateline might be re-assembled
to multilinestrings.

Builders
========
The API has been refactored to use builders instead of using shapes. So
parsing geo-shapes will result in builder objects. These builders can be
parsed and serialized without generating any shapes. this causes shape
generation only on the nodes executing the actual operation. Also the
baseclass ShapeBuilder implements the ToXContent interface which allows to
set fields of XContent directly.

TODO’s
======
 - The geo-circle will not work, if it’s crossing the dateline
 - The envelope also needs to wrapped

Closes elastic#1997 elastic#2708
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant