
date_index_name pipeline week calc issue #57128


Closed
godlockin opened this issue May 26, 2020 · 5 comments
Labels
>bug :Core/Infra/Core Core issues without another label

Comments

@godlockin

ES: 7.6
OS: Mac/Centos7

Issue description:
I need to auto-create indices weekly, so I use the date_index_name processor in an ingest pipeline together with an index template. However, the week number calculated by [_ingest/pipeline/_simulate] differs from the one used when the data is posted directly.

The reproduction steps and related config are as follows:

  1. Post the pipelines that set the timestamps and calculate the index name
  2. Post the template for dynamic data indexing
  3. Post some test data to [_ingest/pipeline/_simulate]
  4. Post the same data to ES
  5. Check the index that ES created and the test data

  1. The timestamp-setting pipeline (here I also change the time zone and format of the timestamps):
POST _ingest/pipeline/base_pipeline
{
    "description" : "basic default pipeline, set timestamps and delflg",
    "processors" : [
      {
        "set" : {
          "if" : "'' == ctx.dataCreateTimestamp || null == ctx.dataCreateTimestamp",
          "value" : "{{_ingest.timestamp}}",
          "field" : "dataCreateTimestamp"
        }
      },
      {
        "set" : {
          "value" : "{{_ingest.timestamp}}",
          "field" : "dataUpdateTimestamp"
        }
      },
      {
        "set" : {
          "if" : "'' == ctx.delFlg || null == ctx.delFlg",
          "value" : 0,
          "field" : "delFlg"
        }
      },
      {
        "script" : {
          "lang" : "painless",
          "source" : """String tmpTs = ctx.dataCreateTimestamp; if (0 > tmpTs.indexOf('.')) {return;} ZonedDateTime orgTime = ZonedDateTime.parse(tmpTs.substring(0, tmpTs.indexOf('.')), DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneId.of("UTC")));ZonedDateTime shTime = orgTime.withZoneSameInstant(ZoneId.of("Asia/Shanghai"));ctx.dataCreateTimestamp = shTime.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));"""
        }
      },
      {
        "script" : {
          "lang" : "painless",
          "source" : """String tmpTs = ctx.dataUpdateTimestamp; ZonedDateTime orgTime = ZonedDateTime.parse(tmpTs.substring(0, tmpTs.indexOf('.')), DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneId.of("UTC")));ZonedDateTime shTime = orgTime.withZoneSameInstant(ZoneId.of("Asia/Shanghai"));ctx.dataUpdateTimestamp = shTime.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));"""
        }
      }
    ]
  }
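For checking the two Painless scripts outside Elasticsearch, here is a minimal plain-Java sketch of the same conversion: an ISO-8601 UTC timestamp such as _ingest.timestamp turned into an Asia/Shanghai "yyyy-MM-dd HH:mm:ss" string. The class and method names are hypothetical, and it uses Instant.parse instead of cutting the string at the '.' as the scripts do; that swap is an assumption about the scripts' intent, chosen because Instant.parse also accepts timestamps without fractional seconds.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class ShanghaiTimestamps {
    // Output format used by base_pipeline for dataCreateTimestamp / dataUpdateTimestamp,
    // rendered in the Asia/Shanghai time zone.
    static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                    .withZone(ZoneId.of("Asia/Shanghai"));

    // Hypothetical helper: converts an ISO-8601 UTC instant such as
    // "2020-05-26T03:39:52.126647Z" to Shanghai local time.
    // Instant.parse accepts the fraction whether or not it is present;
    // the output pattern has no fraction, so it is dropped (truncated, not rounded).
    static String toShanghai(String isoUtcTimestamp) {
        return OUTPUT.format(Instant.parse(isoUtcTimestamp));
    }

    public static void main(String[] args) {
        System.out.println(toShanghai("2020-05-26T03:39:52.126647Z"));
    }
}
```

Calling toShanghai("2020-05-26T03:39:52.126647Z") returns 2020-05-26 11:39:52, the same value as dataCreateTimestamp in the simulate response below.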
  2. The index name pipeline, where I set the weekly index name:
POST _ingest/pipeline/ms_ccy_weekly_details_pipeline
{
    "description" : "ms ccy daily pipeline",
    "processors" : [
      {
        "pipeline" : {
          "name" : "base_pipeline"
        }
      },
      {
        "date_index_name" : {
          "date_rounding" : "w",
          "date_formats" : [
            "yyyy-MM-dd HH:mm:ss",
            "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
            "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
            "yyyy-MM-dd'T'HH:mm:ssZ",
            "yyyy-MM-dd",
            "epoch_second",
            "date_time",
            "basic_date_time",
            "strict_date_time",
            "epoch_millis"
          ],
          "field" : "dataCreateTimestamp",
          "index_name_format" : "yyyy'w'ww",
          "index_name_prefix" : "ms_ccy_weekly_details_",
          "timezone" : "Asia/Shanghai"
        }
      }
    ]
  }
  3. Run a simulation of these pipelines:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "pipeline": {
          "name": "ms_ccy_weekly_details_pipeline"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "source": "sina",
        "volatility": 0,
        "button": -999,
        "lastClose": 7.1368,
        "offer": 0,
        "delFlg": 0,
        "fluctuationValue": 0,
        "amplitude": 0,
        "top": 7.1368,
        "ccy": "usdcny",
        "bid": 0,
        "fresh": 0,
        "fluctuation": 0,
        "thisOpen": 7.1368,
        "timeRange": "20200525_20200525"
      }
    }
  ]
}

Response (looks OK):

{
  "docs" : [
    {
      "doc" : {
        "_index" : "<ms_ccy_weekly_details_{2020w22||/w{yyyy'w'ww|Asia/Shanghai}}>",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "dataCreateTimestamp" : "2020-05-26 11:39:52",
          "dataUpdateTimestamp" : "2020-05-26 11:39:52",
          "source" : "sina",
          "volatility" : 0,
          "button" : -999,
          "lastClose" : 7.1368,
          "offer" : 0,
          "delFlg" : 0,
          "fluctuationValue" : 0,
          "amplitude" : 0,
          "top" : 7.1368,
          "ccy" : "usdcny",
          "bid" : 0,
          "fresh" : 0,
          "fluctuation" : 0,
          "thisOpen" : 7.1368,
          "timeRange" : "20200525_20200525"
        },
        "_ingest" : {
          "timestamp" : "2020-05-26T03:39:52.126647Z"
        }
      }
    }
  ]
}
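The _index value above is not a concrete index name but a date math expression of the form <prefix{date||/rounding{format|timezone}}>, which still has to be resolved into a real name. The hypothetical helper below only rebuilds a string of that shape from the processor settings, as a sketch of what the response contains; it is not the processor's implementation, and Locale.ROOT is an assumption (the week number produced by ww depends on the JVM's locale data).

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class DateMathIndexName {
    // Hypothetical helper: formats the document date with indexNameFormat in the
    // given zone and wraps it as <prefix{date||/rounding{format|zone}}>,
    // the shape seen in the _simulate response.
    static String expression(String prefix, ZonedDateTime date, String rounding,
                             String indexNameFormat, ZoneId zone) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(indexNameFormat, Locale.ROOT);
        String formatted = formatter.format(date.withZoneSameInstant(zone));
        return "<" + prefix + "{" + formatted + "||/" + rounding
                + "{" + indexNameFormat + "|" + zone.getId() + "}}>";
    }

    public static void main(String[] args) {
        ZoneId shanghai = ZoneId.of("Asia/Shanghai");
        ZonedDateTime created = ZonedDateTime.of(2020, 5, 26, 11, 39, 52, 0, shanghai);
        System.out.println(expression("ms_ccy_weekly_details_", created, "w", "yyyy'w'ww", shanghai));
    }
}
```

With the settings from ms_ccy_weekly_details_pipeline and a dataCreateTimestamp of 2020-05-26 11:39:52, this prints the same expression as the simulate response. Note that the format string travels inside the expression, so whatever resolves it later must parse 2020w22 back with yyyy'w'ww.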
  4. The index template that sets the default pipeline for automatic indexing:
POST _template/ms_ccy_weekly_details_template
{
    "order" : 0,
    "index_patterns" : [
      "ms_ccy_weekly_details*"
    ],
    "settings" : {
      "index" : {
        "default_pipeline" : "ms_ccy_weekly_details_pipeline",
        "refresh_interval" : "30s"
      }
    },
    "mappings" : { 
.....
     },
    "aliases" : {
      "ms_ccy_weekly_details_current_reader" : { }
    }
  }
  5. POST the test data into ES directly and check the result:
POST ms_ccy_weekly_details/_doc
{
  "source": "sina",
  "volatility": 0,
  "button": -999,
  "lastClose": 7.1368,
  "offer": 0,
  "delFlg": 0,
  "fluctuationValue": 0,
  "amplitude": 0,
  "top": 7.1368,
  "ccy": "usdcny",
  "bid": 0,
  "fresh": 0,
  "fluctuation": 0,
  "thisOpen": 7.1368,
  "timeRange": "20200525_20200525"
}

Response: oops, the index name is wrong (2019w01 instead of the expected 2020w22):

{
  "_index" : "ms_ccy_weekly_details_2019w01",
  "_type" : "_doc",
  "_id" : "mhITT3IBsi-Nnk2HDR2i",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 20,
  "_primary_term" : 1
}
  6. Check the document we just posted: the data is correct, but the index is not:
GET ms_ccy_weekly_details_2019w01/_doc/mhITT3IBsi-Nnk2HDR2i
{
  "_index" : "ms_ccy_weekly_details_2019w01",
  "_type" : "_doc",
  "_id" : "mhITT3IBsi-Nnk2HDR2i",
  "_version" : 1,
  "_seq_no" : 20,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "dataCreateTimestamp" : "2020-05-26 11:42:28",
    "dataUpdateTimestamp" : "2020-05-26 11:42:28",
    "source" : "sina",
    "volatility" : 0,
    "button" : -999,
    "lastClose" : 7.1368,
    "offer" : 0,
    "delFlg" : 0,
    "fluctuationValue" : 0,
    "amplitude" : 0,
    "top" : 7.1368,
    "ccy" : "usdcny",
    "bid" : 0,
    "fresh" : 0,
    "fluctuation" : 0,
    "thisOpen" : 7.1368,
    "timeRange" : "20200525_20200525"
  }
}

@godlockin godlockin added >bug needs:triage Requires assignment of a team area label labels May 26, 2020
@nik9000 nik9000 added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label May 29, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label May 29, 2020
@nik9000 nik9000 removed Team:Data Management Meta label for data/management team needs:triage Requires assignment of a team area label labels May 29, 2020
@pgomulka pgomulka self-assigned this Jun 9, 2020
@pgomulka
Contributor

pgomulka commented Jun 10, 2020

@godlockin thank you for raising this.
In short, the problem you are seeing is related to the java.time implementation treating y as year-of-era. You should be using Y, which is the week-based year. https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
This is a java.time implementation detail: it is able to format a date with the incorrect pattern yyyy'w'ww, but it is not able to parse a date like that back. The pattern should be YYYY'w'ww.
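To see the y versus Y difference concretely, here is a small Java sketch (class name hypothetical) that formats the same date with year-of-era plus week and with week-based-year plus week. It pins the week definition to WeekFields.ISO (weeks start on Monday, week 1 has at least four days) so the result does not depend on the JVM's locale data; that choice is an assumption made for reproducibility, not necessarily the week definition Elasticsearch uses. Monday 2019-12-30 belongs to week 1 of week-based year 2020, which is how a label such as 2019w01 can pair week 1 with the previous calendar year.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.WeekFields;

final class WeekLabels {
    // Roughly "yyyy'w'ww" with ISO weeks: calendar year-of-era + week-of-week-based-year.
    static final DateTimeFormatter YEAR_OF_ERA_WEEK = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR_OF_ERA, 4)
            .appendLiteral('w')
            .appendValue(WeekFields.ISO.weekOfWeekBasedYear(), 2)
            .toFormatter();

    // Roughly "YYYY'w'ww" with ISO weeks: week-based-year + week-of-week-based-year.
    static final DateTimeFormatter WEEK_BASED_YEAR_WEEK = new DateTimeFormatterBuilder()
            .appendValue(WeekFields.ISO.weekBasedYear(), 4)
            .appendLiteral('w')
            .appendValue(WeekFields.ISO.weekOfWeekBasedYear(), 2)
            .toFormatter();

    public static void main(String[] args) {
        // Monday 2019-12-30 is in ISO week 1 of week-based year 2020.
        LocalDate date = LocalDate.of(2019, 12, 30);
        System.out.println(YEAR_OF_ERA_WEEK.format(date));     // 2019w01
        System.out.println(WEEK_BASED_YEAR_WEEK.format(date)); // 2020w01
    }
}
```

Away from the year boundary both formatters agree (2020-05-26 gives 2020w22 with either), which is why the difference only shows up for some dates.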

EDIT: However, there is a bug in 7.6 that currently prevents using the right format.

Consider upgrading to 7.7. Another option would be to modify your JVM options and add the line 9-:-Djava.locale.providers=SPI,COMPAT.
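The Y and w pattern letters are localized: the first day of the week and the minimal number of days in week 1 come from the JVM's locale data, which is what the java.locale.providers option selects. A small diagnostic sketch (class and method names hypothetical) prints the week definition the running JVM uses for Locale.ROOT, and shows that two week definitions can disagree on the week number of the same date. WeekFields[SUNDAY,1] is the definition that appears in the parse error later in this thread.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

final class WeekDefinitions {
    // Week-of-week-based-year of a date under a given week definition.
    static int weekOf(LocalDate date, WeekFields fields) {
        return date.get(fields.weekOfWeekBasedYear());
    }

    public static void main(String[] args) {
        // Both values depend on JVM flags and JDK version, so no fixed output is assumed.
        System.out.println("java.locale.providers = " + System.getProperty("java.locale.providers"));
        System.out.println("WeekFields.of(Locale.ROOT) = " + WeekFields.of(Locale.ROOT));

        // Sunday 2020-01-05: week 2 when weeks start on Sunday, week 1 under ISO rules.
        LocalDate sunday = LocalDate.of(2020, 1, 5);
        System.out.println(weekOf(sunday, WeekFields.of(DayOfWeek.SUNDAY, 1)));
        System.out.println(weekOf(sunday, WeekFields.ISO));
    }
}
```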

Consequently, you see different results from the _simulate API and from ingest, because java.time is able to format the date with the incorrect format yyyy'w'ww but is not able to parse it correctly.
The response to your simulate request is "_index" : "<ms_ccy_weekly_details_{2020w22||/w{yyyy'w'ww|Asia/Shanghai}}>", which is just a dynamic (date math) index name that still has to be resolved.
When you try to use it, you will notice that it resolves to an incorrect index name (again, because what we ask here is to parse 2020w22 with yyyy'w'ww). https://www.elastic.co/guide/en/elasticsearch/reference/current/date-index-name-processor.html

curl --request POST \
  --url http://localhost:9200/%3Cms_ccy_weekly_details_%7B2020w22%7C%7C%2Fw%7Byyyy%27w%27ww%7CAsia%2FShanghai%7D%7D%3E/_doc/ \
  --header 'authorization: Basic ZWxhc3RpYzpwYXNzd29yZA==' \
  --header 'content-type: application/json' \
  --data '{
  }'

This results in:

{
  "_index": "ms_ccy_weekly_details_2019w01",
  "_type": "_doc",
  "_id": "9CXlnXIBgpiz29aVywzS",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
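The percent-encoded path in the curl request above is the simulate _index expression passed through URL encoding, since characters such as <, {, | and / must be encoded when the expression is used in a request path. A sketch of producing it in Java with java.net.URLEncoder (the Charset overload needs Java 10+; the class name is hypothetical). URLEncoder implements form encoding, so a space would become + rather than %20; that does not matter for this name, which contains no spaces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EncodeDateMathIndex {
    // Percent-encodes a date math index name so it can be used as one path segment.
    static String encode(String indexExpression) {
        return URLEncoder.encode(indexExpression, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expression = "<ms_ccy_weekly_details_{2020w22||/w{yyyy'w'ww|Asia/Shanghai}}>";
        // Path for POST http://localhost:9200/<encoded>/_doc/ as in the curl example.
        System.out.println("/" + encode(expression) + "/_doc/");
    }
}
```

The encoded segment printed here is identical to the one in the curl URL.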

@pgomulka pgomulka added :Core/Infra/Core Core issues without another label and removed >bug labels Jun 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jun 10, 2020
@pgomulka pgomulka removed :Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Core/Infra Meta label for core/infra team labels Jun 10, 2020
@pgomulka
Contributor

@godlockin I have updated my comment above a few times. Sorry for the trouble, please reread it.

@godlockin
Author

@pgomulka Thanks for the answer, the solution below works:

Another option would be to modify your jvm options, and add a line 9-:-Djava.locale.providers=SPI,COMPAT

However, when I tried changing the format to [YYYY'w'ww] instead of [yyyy'w'ww], the following error was raised:

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "failed to parse date field [2020w24] with format [YYYY'w'ww]: [temporal accessor [{WeekBasedYear[WeekFields[SUNDAY,1]]=2020, WeekOfWeekBasedYear[WeekFields[SUNDAY,1]]=24},ISO,Asia/Shanghai] cannot be converted to zoned date time]"
      }
    ],
    "type": "parse_exception",
    "reason": "failed to parse date field [2020w24] with format [YYYY'w'ww]: [temporal accessor [{WeekBasedYear[WeekFields[SUNDAY,1]]=2020, WeekOfWeekBasedYear[WeekFields[SUNDAY,1]]=24},ISO,Asia/Shanghai] cannot be converted to zoned date time]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "temporal accessor [{WeekBasedYear[WeekFields[SUNDAY,1]]=2020, WeekOfWeekBasedYear[WeekFields[SUNDAY,1]]=24},ISO,Asia/Shanghai] cannot be converted to zoned date time"
    }
  },
  "status": 400
}

Anyway, your suggestion is quite helpful. Many thanks again!
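The error above is consistent with how java.time resolves week-based fields in general: a week-based year plus a week-of-week-based-year identify a week, not a day, so there is no single date to build until a day-of-week is supplied. A minimal plain-Java sketch (not Elasticsearch code; class and field names hypothetical) reproduces the failure and then supplies Monday as a default with parseDefaulting. Using WeekFields.ISO and Monday are assumptions chosen for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.time.temporal.WeekFields;

final class WeekBasedParsing {
    // Roughly "YYYY'w'ww" with ISO weeks: identifies a week but not a day within it.
    static final DateTimeFormatter WEEK_ONLY = new DateTimeFormatterBuilder()
            .appendValue(WeekFields.ISO.weekBasedYear(), 4)
            .appendLiteral('w')
            .appendValue(WeekFields.ISO.weekOfWeekBasedYear(), 2)
            .toFormatter();

    // Same layout, but parsing fills in the missing day-of-week as Monday (ISO value 1).
    static final DateTimeFormatter WEEK_WITH_MONDAY = new DateTimeFormatterBuilder()
            .appendValue(WeekFields.ISO.weekBasedYear(), 4)
            .appendLiteral('w')
            .appendValue(WeekFields.ISO.weekOfWeekBasedYear(), 2)
            .parseDefaulting(ChronoField.DAY_OF_WEEK, 1)
            .toFormatter();

    public static void main(String[] args) {
        try {
            LocalDate.parse("2020w24", WEEK_ONLY);
        } catch (DateTimeParseException e) {
            // The text is read, but the fields cannot be resolved into a LocalDate.
            System.out.println("WEEK_ONLY failed: " + e.getMessage());
        }
        System.out.println(LocalDate.parse("2020w24", WEEK_WITH_MONDAY)); // 2020-06-08
    }
}
```

Which day stands for the week is a design choice; Monday matches the start of an ISO week.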

@pgomulka pgomulka added the >bug label Jun 15, 2020
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Jun 15, 2020
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Jun 17, 2020
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Jun 17, 2020
…58099)

relates elastic#57128
# Conflicts:
#	docs/reference/release-notes/7.6.asciidoc
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Jun 17, 2020
…58099)

relates elastic#57128
# Conflicts:
#	docs/reference/release-notes/7.6.asciidoc
pgomulka added a commit that referenced this issue Jun 17, 2020
…58227)

relates #57128
# Conflicts:
#	docs/reference/release-notes/7.6.asciidoc
pgomulka added a commit that referenced this issue Jun 17, 2020
…58225)

relates #57128
# Conflicts:
#	docs/reference/release-notes/7.6.asciidoc
4 participants