Reindex automatic type conversion failure #38707

Closed
synFK opened this issue Feb 11, 2019 · 7 comments
Labels
:Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down

synFK commented Feb 11, 2019

Elasticsearch version: 6.6.0

Plugins installed: []

JVM version: 1.8.0_191

OS version: Linux 4.4.0-116-generic #140-Ubuntu SMP x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
When using the Reindex API with an explicit mapping on the destination index, the automatic type conversion fails.
The (remote) source cluster runs Elasticsearch 2.4.5; the destination runs 6.6.0. The source index has a field named "result.user.id" of type string (analyzed) with a subfield "raw" of type string (not_analyzed), while the destination index maps this field as type long and has the option dynamic set to strict. After reindexing a set of documents, the type of this field in the destination index is string even though the mapping explicitly states long. No exceptions occurred while reindexing.
For testing purposes I added another field of type long, called "id2", to the destination index mapping and set it to the constant string value "12345" with a small Painless script. This string value is successfully auto-converted to 12345 during reindexing.

Steps to reproduce:

  1. Source mapping:
"searches": {
  "properties": {
    "result": {
      "properties": {
        "user": {
          "properties": {
            "id": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "ignore_above": 256,
                  "index": "not_analyzed"
                }
              },
              "norms": {
                "enabled": false
              }
            }
          }
        }
      }
    }
  }
}
  2. Destination template:
{
  "order": 0,
  "index_patterns": [ "the_index-*" ],
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "refresh_interval": "5s",
    "index.lifecycle.name": "default_lifecycle_policy",
    "index.lifecycle.rollover_alias": "new_searches"
  },
  "mappings": {
    "_doc": {
      "properties": {
        "result": {
          "properties": {
            "user": {
              "properties": {
                "id": {
                  "type": "long"
                }
              }
            }
          }
        }
      }
    }
  }
}
  3. reindex body:
{
  "size": 100,
  "source": {
    "remote": {
      "host": "<source_host>"
    },
    "index": "old_searches",
    "type": "searches"
  },
  "dest": {
    "index": "new_searches",
    "type": "_doc"
  }
}
  4. create index: curl -X PUT <dest_host>/the_index-000001 -H "Content-Type: application/json"
  5. reindex: curl -X POST <dest_host>/_reindex -H "Content-Type: application/json" -d "$reindex_body"

Logs (not relevant):

{
  "took" : 1082,
  "timed_out" : false,
  "total" : 100,
  "updated" : 0,
  "created" : 100,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@matriv matriv added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Analytics/Graph labels Feb 11, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed


synFK commented Feb 12, 2019

By the way this issue applies also for boolean and all other numeric types.

@henningandersen henningandersen added :Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down and removed :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Apr 12, 2019
@henningandersen henningandersen self-assigned this May 1, 2019

henningandersen commented May 1, 2019

@synFK, thanks for your interest in Elasticsearch.

I have tried to reproduce this, but have not been successful in doing so. The type of a field generally cannot be modified once the mapping has been established so this should not be possible. It could be that the behaviour will not manifest itself except under specific conditions.

I would like to get a little closer to the actual steps you took. My best guess at this time is that the_index-000001 was created before the template or at least that something prevented the template from being applied.

The steps I took are given below. Note that these instructions permanently delete the indices used in the example, and that I used the names you supplied, so be careful where you run this. I also reindexed from localhost for simplicity (I find it very unlikely that the remote source is the cause).

DELETE old_searches

DELETE _template/template_1

DELETE the_index-000001

PUT old_searches
{
  "mappings": {
    "_doc": {
      "properties": {
        "result": {
          "properties": {
            "user": {
              "properties": {
                "id": {
                  "type": "text",
                  "fields": {
                    "raw": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  },
                  "norms": false
                }
              }
            }
          }
        }
      }
    }
  }
}

PUT old_searches/_doc/1
{
  "result" : { "user": { "id" : "12345" } }
}

PUT _template/template_1
{
  "order": 0,
  "index_patterns": [ "the_index-*" ],
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "refresh_interval": "5s",
    "index.lifecycle.name": "default_lifecycle_policy",
    "index.lifecycle.rollover_alias": "new_searches"
  },
  "mappings": {
    "_doc": {
      "properties": {
        "result": {
          "properties": {
            "user": {
              "properties": {
                "id": {
                  "type": "long"
                }
              }
            }
          }
        }
      }
    }
  }
}

PUT the_index-000001

POST _reindex
{
  "size": 100,
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "old_searches"
  },
  "dest": {
    "index": "new_searches"
  }
}

Afterwards, the mapping was defined like this, i.e. type long as expected:

GET the_index-000001
{"the_index-000001":{"aliases":{},"mappings":{"_doc":{"properties":{"result":{"properties":{"user":{"properties":{"id":{"type":"long"}}}}}}}},"settings":{"index":{"lifecycle":{"name":"default_lifecycle_policy","rollover_alias":"new_searches"},"refresh_interval":"5s","number_of_shards":"1","provided_name":"the_index-000001","creation_date":"1556714472477","number_of_replicas":"0","uuid":"V1nr5CgLTzCoXt5wDtAUkw","version":{"created":"6060299"}}}}}
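
The same check can be scripted rather than read off the raw response. The following is a minimal Python sketch, assuming the 6.x get-index response shape shown above (mappings nested under the `_doc` type). `field_type` is a hypothetical helper name, not part of any Elasticsearch client.

```python
def field_type(get_index_response, index, dotted_path, doc_type="_doc"):
    """Return the mapped type of a field such as "result.user.id", or None."""
    node = get_index_response[index]["mappings"][doc_type]
    for part in dotted_path.split("."):
        # Object fields nest their children under "properties".
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node.get("type")


# Trimmed copy of the GET the_index-000001 response above (settings omitted).
response = {
    "the_index-000001": {
        "aliases": {},
        "mappings": {
            "_doc": {
                "properties": {
                    "result": {
                        "properties": {
                            "user": {"properties": {"id": {"type": "long"}}}
                        }
                    }
                }
            }
        },
    }
}

print(field_type(response, "the_index-000001", "result.user.id"))  # long
```

This only inspects the mapping; it says nothing about how individual values appear in `_source`.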

@henningandersen
Contributor

@synFK given that we have not heard back from you on this, I will close this issue. Please reopen with clarification if you think there is still something to improve here.


synFK commented Jul 2, 2019

Hi @henningandersen,
the issue still exists, but I guess it is a cosmetic one. Of course the mapping shows a field of type long. But when you request the actual data, the reindexed string value is still double-quoted, whereas the hard-coded and thus successfully auto-converted value is not.
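
The observation above can be checked mechanically. The following is a small Python sketch that lists numeric fields whose `_source` value is still a string. `quoted_numbers` and the sample document are hypothetical and only mirror the values described in this thread ("result.user.id" copied from the old index, "id2" set by the script).

```python
def quoted_numbers(source, numeric_fields):
    """Return the dotted paths in numeric_fields whose _source value is a string."""
    quoted = []
    for path in numeric_fields:
        value = source
        for part in path.split("."):
            # Walk nested objects; stop with None if the path does not exist.
            value = value.get(part) if isinstance(value, dict) else None
        if isinstance(value, str):
            quoted.append(path)
    return quoted


# Illustrative _source of one reindexed document (assumed, not copied from a cluster).
hit_source = {"result": {"user": {"id": "12345"}}, "id2": 12345}

print(quoted_numbers(hit_source, ["result.user.id", "id2"]))  # ['result.user.id']
```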


henningandersen commented Jul 2, 2019

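@synFK thanks for clarifying. This is how it works: _source is returned exactly as it was submitted, regardless of the mapped field type. The same happens if you simply put a document with a string value into a long field: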

PUT xxx/_doc/1
{
  "data" : "17"
}

gives:

GET xxx/_doc/1
{"_index":"xxx","_type":"_doc","_id":"1","_version":3,"_seq_no":2,"_primary_term":1,"found":true,"_source":{
  "data" : "17"
}
}

whereas:

PUT xxx/_doc/1
{
  "data" : 17
}

results in:

{"_index":"xxx","_type":"_doc","_id":"1","_version":2,"_seq_no":1,"_primary_term":1,"found":true,"_source":{
  "data" : 17
}
}
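
The two responses above differ only in `_source`, which echoes what was sent. The following is a toy Python model of that behaviour, not Elasticsearch code: `index_document` is a hypothetical function that keeps the submitted document verbatim and coerces only the value used for indexing.

```python
import copy


def index_document(doc, long_fields):
    """Toy model: return (stored _source, values as indexed for search)."""
    # _source keeps the original JSON values untouched.
    source = copy.deepcopy(doc)
    indexed = {}
    for field in long_fields:
        if field in doc:
            # Coercion to long affects only the indexed value.
            indexed[field] = int(doc[field])
    return source, indexed


source, indexed = index_document({"data": "17"}, ["data"])
print(source)   # {'data': '17'}
print(indexed)  # {'data': 17}
```

In this model a search on the numeric value would match either document, while a GET returns the string or the number depending on what was originally submitted.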

I suspect that the auto-conversion with painless happens in the script, not during reindex.
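
If the goal is for `_source` itself to carry a number, one option is to do the conversion explicitly in a reindex script. The following is an untested Python sketch that only builds such a request body; the function name, the index names passed in, and the exact Painless snippet are assumptions for illustration and should be verified against the target version before use.

```python
import json

# Painless snippet (assumed): convert result.user.id to a long when it is a string.
CONVERT_ID = (
    "def user = ctx._source.result?.user; "
    "if (user != null && user.id instanceof String) "
    "{ user.id = Long.parseLong(user.id); }"
)


def build_reindex_body(remote_host, source_index, dest_index):
    """Build a _reindex request body that rewrites the id inside _source."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
        },
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": CONVERT_ID},
    }


body = build_reindex_body("http://localhost:9200", "old_searches", "new_searches")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with POST _reindex. Values that are not parseable as a long would make the script throw, so real data may need additional handling.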
