geo_shapes stopped indexing in 6.4.1 #34047

Closed
Destroy666x opened this issue Sep 25, 2018 · 15 comments
Labels
:Analytics/Geo (Indexing, search aggregations of geo points and shapes), >bug

Comments

@Destroy666x

Destroy666x commented Sep 25, 2018

Elasticsearch version (bin/elasticsearch --version): 6.4.1

Plugins installed: []

JVM version (java -version): official Docker image

OS version (uname -a if on a Unix-like system): official Docker image

Description of the problem including expected versus actual behavior:

After upgrading from 6.3.2 to 6.4.1, many geo_shapes stopped importing.

Steps to reproduce:

  1. Index has a field defined like this (FOSElastica bundle YML configuration):
    shape:
        type: 'geo_shape'
        tree: 'quadtree'
        precision: '10m'
        distance_error_pct: 0.001
  2. Indexing many shapes in 6.4.1 that worked in 6.3.2 results in a "failed to parse" error.
  3. Adding ignore_malformed: true doesn't seem to help.

It looks like there was some sort of undocumented BC break regardless of the shapes being fully proper. If you want I can provide you an example of a shape.
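
For reference, the YML field configuration above presumably translates into an index-creation body along these lines. This is only a sketch: the field options are copied from the report, while the `doc` type name and the helper `build_mapping` are assumptions made for illustration.

```python
import json

# Field options copied from the FOSElastica YML in the report.
shape_field = {
    "type": "geo_shape",
    "tree": "quadtree",
    "precision": "10m",
    "distance_error_pct": 0.001,
}


def build_mapping(field, type_name="doc", ignore_malformed=False):
    """Return an index-creation body with a single geo_shape field named "shape".

    type_name is an assumption; the report does not state the mapping type.
    """
    props = dict(field)  # copy so the original options stay untouched
    if ignore_malformed:
        props["ignore_malformed"] = True
    return {"mappings": {type_name: {"properties": {"shape": props}}}}


print(json.dumps(build_mapping(shape_field, ignore_malformed=True), indent=2))
```

The printed JSON can be pasted after a `PUT <index>` line in Kibana dev tools to recreate the mapping without going through the PHP library.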

Destroy666x changed the title from "geo_shapes stopped importing in 6.4.1" to "geo_shapes stopped indexing in 6.4.1" Sep 25, 2018
@tlrx
Member

tlrx commented Sep 25, 2018

Thanks for reporting.

> It looks like there was some sort of undocumented BC break regardless of the shapes being fully proper. If you want I can provide you an example of a shape.

Yes, that would help if you could provide an example of a shape that is correctly indexed in 6.3.2 and failed to be indexed in 6.4.1. We'll need the mappings too.

tlrx added the feedback_needed and :Analytics/Geo (Indexing, search aggregations of geo points and shapes) labels Sep 25, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@Destroy666x
Author

> We'll need the mappings too.

Which ones? I provided the ones for the affected field, do you need any other?

@tlrx
Member

tlrx commented Sep 26, 2018

@Destroy666x Thanks! If you can provide a full example with all the document fields mappings that's great. If you can reproduce with one field only then we only need the field mapping :)

@Destroy666x
Author

Destroy666x commented Sep 30, 2018

Question @tlrx - is there an easy way to check the bulk request the library does in some sort of Elastic logs? Because the library logger for some reason skips it, even if I simplify the request quite a lot.

Maybe these details will help in the meantime:

  1. There's e.g. this shape stored as a multidimensional array: https://pastebin.com/ULfUwAZJ
  2. The code turns it into

     'type' => 'geometrycollection',
     'geometries' => []

     where geometries is an array of polygons (3rd level) with flipped coordinate order (which I guess could be replaced with orientation change) and lat/lon order (4th level) before bulk insertion.
  3. I can reproduce it with only this field mapped.
  4. It also worked in earlier versions, e.g. 5.x.
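
The conversion described in step 2 could look roughly like the sketch below. The input layout (polygons, then rings, then [lat, lon] pairs) is an assumption, since the pastebin content is not reproduced here, and `to_geometry_collection` is a hypothetical name rather than anything from the PHP code.

```python
def to_geometry_collection(polygons):
    """Convert nested [lat, lon] arrays into a GeoJSON-style geometrycollection.

    Assumed input layout (hypothetical): polygons -> rings -> [lat, lon] pairs.
    """
    geometries = []
    for rings in polygons:
        converted = []
        for ring in rings:
            # Reverse the vertex order, then swap each pair to [lon, lat].
            # float() also turns numeric strings into real numbers.
            converted.append([[float(lon), float(lat)] for lat, lon in reversed(ring)])
        geometries.append({"type": "polygon", "coordinates": converted})
    return {"type": "geometrycollection", "geometries": geometries}


ring = [[24.0, 46.0], [24.0, 47.0], [25.0, 47.0], [24.0, 46.0]]
shape = to_geometry_collection([[ring]])
print(shape["geometries"][0]["coordinates"][0])
```

Casting with `float()` is a deliberate choice in this sketch: it guarantees that the serialized JSON contains numbers rather than quoted strings.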

imotov self-assigned this Oct 2, 2018
@imotov
Contributor

imotov commented Oct 2, 2018

@Destroy666x I tried to reproduce what you described on 6.3.1 and 6.4.1, based on the information that you have provided, and got quite the opposite result. The following script https://gist.github.com/imotov/64dcfc84829141d54d4a244573902449, which uses both [] as geometries and your shape, doesn't return any errors on 6.4.1 but fails with an error message on 6.3.1. So, in this test 6.4.1 seems to be more robust than 6.3.1 in ignoring malformed objects. Could you modify this example to reproduce the behaviour that you describe? Thanks!

@Destroy666x
Author

Destroy666x commented Oct 3, 2018

Well, you didn't reverse the order of vertices and lat/lon and didn't do it as a polygon array. I'll try your way of sending the data and then will submit a more relevant gist.

@Destroy666x
Author

Destroy666x commented Oct 9, 2018

@imotov I tried your PUTs directly in Kibana dev tools, and for the 2nd one I got "shape must be an object consisting of type and coordinates" if there is no ignore_malformed: true. Same goes for proper reverse lat/lon + vertices and:

    "type": "geometrycollection",
    "geometries": [
      {
        "type": "polygon",
        "coordinates": [
            ...
        ]
      }
    ]

format, which on the other hand throws a "geo coordinates must be numbers" error.

So I guess the Symfony library or Elastica for some reason ignores the setting. But if that's the case why is the setting suddenly required (BC break still) and what's wrong with the shape (the reverse lat/lon and vertices one that is)? The problem still exists in 6.4.2 BTW.
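
The "geo coordinates must be numbers" message suggests that some coordinate leaves may not be JSON numbers. One way to check a decoded shape before sending it is sketched below; `find_non_numeric` is a hypothetical helper, not part of Elasticsearch or Elastica.

```python
from numbers import Real


def find_non_numeric(coords, path="coordinates"):
    """Return (path, value) pairs for every leaf in a coordinates array that is not a number."""
    problems = []
    if isinstance(coords, list):
        for i, item in enumerate(coords):
            problems.extend(find_non_numeric(item, f"{path}[{i}]"))
    elif isinstance(coords, bool) or not isinstance(coords, Real):
        # bool is excluded explicitly because Python treats it as an int.
        problems.append((path, coords))
    return problems


print(find_non_numeric([["46.6", 24.7], [46.6, 24.7]]))
```

An empty result means every leaf is numeric; any entry points at the exact position of a quoted or missing value.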

@Destroy666x
Author

Destroy666x commented Oct 10, 2018

Here's the shape with proper default Elasticsearch order: https://pastebin.com/c8xHeXx0

And here's a screen from 6.3.2 with the exact request https://pastebin.com/N0JFt4u2 and no ignore_malformed: true:
[screenshot]
Your array-only formatted version indeed doesn't work on earlier versions; you get "object mapping for [shape.geometries] tried to parse field [null] as object, but found a concrete value".

@imotov
Contributor

imotov commented Oct 10, 2018

> So I guess the Symfony library or Elastica for some reason ignores the setting.

The setting is only applied on the server side and not on the client side. So, I think the shape that your client app is sending is not what you think/reported it is sending. Perhaps, you can sniff the network traffic to see what is actually getting sent and try to reproduce it using Kibana. That would really help, because at the moment I am not actually sure what the issue is and I still cannot reproduce it.

@Destroy666x
Author

Destroy666x commented Oct 10, 2018

> The setting is only applied on the server side and not on the client side

Yes, but you can configure the index mappings that are sent to the server.

> That would really help, because at the moment I am not actually sure what the issue is and I still cannot reproduce it.

Did you try the PUT from the 2nd link in both 6.3.2 and 6.4.2?

> So, I think the shape that your client app is sending is not what you think/reported it is sending. Perhaps, you can sniff the network traffic to see what is actually getting sent and try to reproduce it using Kibana

Ok, I'll try that when I find more time.

@imotov
Contributor

imotov commented Oct 10, 2018

> Did you try the PUT from the 2nd link in both 6.3.2 and 6.4.2?

@Destroy666x the 2nd link from where and with which mapping? Could you post a complete reproduction similar to what I posted in https://gist.github.com/imotov/64dcfc84829141d54d4a244573902449 ?

@Destroy666x
Author

From this comment: #34047 (comment) The rest of your gist is pretty much the same, apart from different index, type name and no ignore_malformed: true.

@imotov
Contributor

imotov commented Oct 10, 2018

Thanks! I can reproduce it now. It looks like another case similar to #31428. A smaller reproduction:

DELETE test

PUT test
{
  "mappings": {
    "doc": {
      "properties": {
        "shape": {
          "ignore_malformed": true,
          "type": "geo_shape"
        }
      }
    }
  }
}

PUT /test/doc/9999
{
  "shape": {
    "type": "geometrycollection",
    "geometries": [
      {
        "type": "polygon",
        "coordinates": [
          ["46.6022226498514", "24.7237442867977"],
          ["46.6031857243798", "24.722968774929"],
          ["46.6006675445073", "24.7250001432471"],
          ["46.6012343057486", "24.7245313491405"],
          ["46.6020897839358", "24.7238517986582"],
          ["46.6021637530119", "24.7237921384342"],
          ["46.6022226498514", "24.7237442867977"]
        ]
      }
    ]
  }
}
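
In this reproduction the coordinates are quoted strings rather than numbers. As a possible client-side workaround, a shape could be normalized before indexing, as in the following sketch. `coerce_coordinates` and `coerce_shape` are hypothetical helpers, and the sketch assumes every string in the coordinates is meant to be a number.

```python
def coerce_coordinates(value):
    """Recursively convert numeric strings in a coordinates array to floats."""
    if isinstance(value, list):
        return [coerce_coordinates(v) for v in value]
    if isinstance(value, str):
        return float(value)  # raises ValueError for non-numeric strings
    return value


def coerce_shape(shape):
    """Return a copy of a GeoJSON-like shape whose coordinates are numbers."""
    fixed = dict(shape)
    if "coordinates" in fixed:
        fixed["coordinates"] = coerce_coordinates(fixed["coordinates"])
    if "geometries" in fixed:
        # geometrycollection: normalize each nested geometry in turn.
        fixed["geometries"] = [coerce_shape(g) for g in fixed["geometries"]]
    return fixed


raw = {
    "type": "geometrycollection",
    "geometries": [
        {"type": "polygon", "coordinates": [["46.5", "24.25"], ["46.75", "24.5"]]}
    ],
}
print(coerce_shape(raw))
```

The input dictionary is left unmodified, so the original and normalized shapes can be compared side by side.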

imotov added a commit to imotov/elasticsearch that referenced this issue Oct 15, 2018
Continuation of the work in elastic#31449. Ensures that malformed geoshapes are
reliably ignored if "ignore_malformed" is set to true instead of failing
the entire document by making sure that the xcontent parser is left in a
coherent state even if a data format parsing error occurred.

Fixes elastic#34047
imotov added a commit to imotov/elasticsearch that referenced this issue Nov 15, 2018
Adds a method to XContent parser to skip all children of a current
element in case of the parsing failure and applies this method to be
able to ignore the rest of the GeoJson shape if the parsing fails and
we need to ignore the geoshape due to the ignore malformed flag.

Supersedes elastic#34498

Closes elastic#34047
imotov added a commit that referenced this issue Nov 21, 2018
…5603)

Adds an XContent sub parser class that can wrap another
XContent parser at the beginning of an object and allow skipping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the 
parsing fails and we need to ignore the geoshape due to the 
ignore_malformed flag.

Supersedes #34498

Closes #34047
imotov added a commit that referenced this issue Nov 22, 2018
…5603)

Adds an XContent sub parser class that can wrap another
XContent parser at the beginning of an object and allow skipping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the
parsing fails and we need to ignore the geoshape due to the
ignore_malformed flag.

Supersedes #34498

Closes #34047
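
The idea in the commit messages above, wrapping the parser at the start of an object so that the remaining children can be skipped after a failure, can be illustrated with a toy token parser. This is not the Elasticsearch implementation; `TokenParser`, `SubParser` and `parse_shape` are hypothetical names, and the token stream is a simplified stand-in for XContent.

```python
START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY = "{", "}", "[", "]"
STRUCTURAL = (START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY)


class TokenParser:
    """Tiny pull parser over a pre-tokenized stream."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next_token(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token


class SubParser:
    """Wraps a parser positioned just after a START_OBJECT and tracks nesting depth."""

    def __init__(self, parser):
        self.parser = parser
        self.level = 1

    def next_token(self):
        if self.level == 0:
            return None  # the wrapped object has been fully consumed
        token = self.parser.next_token()
        if token in (START_OBJECT, START_ARRAY):
            self.level += 1
        elif token in (END_OBJECT, END_ARRAY):
            self.level -= 1
        return token

    def skip_rest(self):
        """Consume every remaining token of the wrapped object."""
        while self.next_token() is not None:
            pass


def parse_shape(parser, ignore_malformed):
    """Parse one object of numbers; return None if it is malformed and ignored."""
    if parser.next_token() != START_OBJECT:
        raise ValueError("expected start of object")
    sub = SubParser(parser)
    numbers = []
    try:
        while True:
            token = sub.next_token()
            if token is None:
                return numbers
            if token not in STRUCTURAL:
                numbers.append(float(token))  # fails on a malformed value
    except ValueError:
        if not ignore_malformed:
            raise
        return None
    finally:
        # Always leave the outer parser right after the object's closing token.
        sub.skip_rest()


parser = TokenParser(["{", "[", "1.5", "oops", "2.5", "]", "}", "next-field"])
print(parse_shape(parser, ignore_malformed=True), parser.next_token())
```

Because the sub parser tracks depth itself, the outer parser ends up at the same position whether parsing succeeded or failed partway through, which is what allows the rest of the document to be indexed.
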
@brccabral

Just for reference, I solved my issue by using this solution
https://stackoverflow.com/questions/65031965/how-to-remove-quotation-marks-in-geo-coordinates-on-logstash-conf-file

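filter {
    # Mark the location object as a point.
    mutate {
        add_field => {
            "[location][type]" => "point"
        }
    }
    # Set the coordinates as an array built from the event's latitude and longitude fields.
    # Note: GeoJSON orders coordinates as [longitude, latitude]; check the order if points land in the wrong place.
    ruby {
        code => '
            event.set("[location][coordinates]", [event.get("latitude"), event.get("longitude")])
        '
    }
}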
