[ML] Data frame silently fails to index if aggs return a string #43194

Closed
@sophiec20

Description

Found in 7.2.0-BC5

The following data frame preview contains a typo in the field name responsetime (written as resposetime). The composite aggregation returns avg(resposetime) as the string "NaN".

GET _data_frame/transforms/_preview
{
  "id": "farequote-airline-100",
  "source" : { "index": [ "farequote-*" ] },
  "dest"     : {  "index": "df-farequote-airline-100" },
  "pivot": {
    "group_by": {
      "airline": { "terms": { "field": "airline" }}
    },
    "aggregations": {
      "last_updated": { "max": { "field": "@timestamp" } },
      "mean_response": { "avg": {  "field": "resposetime" } },
      "count": { "value_count": { "field": "airline" } }
    }
  }
}

returns

{
  "preview" : [
    {
      "last_updated" : "2019-02-11T23:59:02.000Z",
      "count" : 8728.0,
      "mean_response" : "NaN",
      "airline" : "AAL"
    },
...
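
A preview response like the one above can be checked for non-finite aggregation results before the transform is started. The following is a minimal sketch, not part of Elasticsearch: the helper name `find_non_finite_fields` and the set of strings it looks for are assumptions made for illustration, and it only inspects top-level fields of each preview document.

```python
import math

# Assumption: these are the string forms a non-finite double may be rendered as.
NON_FINITE_STRINGS = {"NaN", "Infinity", "-Infinity"}


def find_non_finite_fields(preview_docs):
    """Return (doc_position, field_name, value) for every suspicious value.

    Hypothetical helper: scans the top-level fields of each preview document
    for string or float values that represent NaN or infinity.
    """
    problems = []
    for position, doc in enumerate(preview_docs):
        for field, value in doc.items():
            is_bad_string = isinstance(value, str) and value in NON_FINITE_STRINGS
            is_bad_float = isinstance(value, float) and not math.isfinite(value)
            if is_bad_string or is_bad_float:
                problems.append((position, field, value))
    return problems


# The first document of the preview response shown in this issue.
preview = [
    {
        "last_updated": "2019-02-11T23:59:02.000Z",
        "count": 8728.0,
        "mean_response": "NaN",
        "airline": "AAL",
    }
]

print(find_non_finite_fields(preview))  # [(0, 'mean_response', 'NaN')]
```

A non-empty result would be a hint that an aggregation targets a missing or misspelled field, as with resposetime here.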

The data frame silently fails to index these values, resulting in an entirely empty dest index.

However, there are no errors in the Elasticsearch logs and no warnings or errors in the data frame notifications, and the stats make it appear that the data frame transform was successful.

{
  "count" : 1,
  "transforms" : [
    {
      "id" : "farequote-airline-100",
      "state" : {
        "task_state" : "stopped",
        "indexer_state" : "stopped",
        "checkpoint" : 1,
        "progress" : {
          "total_docs" : 86274,
          "docs_remaining" : 0,
          "percent_complete" : 100.0
        }
      },
      "stats" : {
        "pages_processed" : 2,
        "documents_processed" : 86274,
        "documents_indexed" : 19,
        "trigger_count" : 1,
        "index_time_in_ms" : 136,
        "index_total" : 1,
        "index_failures" : 0,
        "search_time_in_ms" : 25,
        "search_total" : 2,
        "search_failures" : 0
      },
      "checkpointing" : {
        "operations_behind" : 0
      }
    }
  ]
}
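
Until the stats reflect such failures, one workaround is to compare the reported stats with the actual document count of the dest index. The sketch below is an illustration under stated assumptions: `looks_like_silent_failure` and its `dest_doc_count` parameter are hypothetical, and the count has to be obtained separately (for example from the `_count` API on the dest index).

```python
def looks_like_silent_failure(transform, dest_doc_count):
    """Return True if the stats claim success but the dest index is empty.

    Hypothetical helper: `transform` is one entry of the "transforms" array
    from the stats response; `dest_doc_count` is fetched separately.
    """
    stats = transform["stats"]
    progress = transform["state"]["progress"]
    finished = progress["docs_remaining"] == 0
    reports_success = (
        stats["index_failures"] == 0 and stats["documents_indexed"] > 0
    )
    return finished and reports_success and dest_doc_count == 0


# Relevant subset of the stats response shown in this issue.
transform = {
    "id": "farequote-airline-100",
    "state": {
        "progress": {
            "total_docs": 86274,
            "docs_remaining": 0,
            "percent_complete": 100.0,
        }
    },
    "stats": {
        "documents_processed": 86274,
        "documents_indexed": 19,
        "index_failures": 0,
    },
}

# The dest index was observed to be empty, so the count is 0.
print(looks_like_silent_failure(transform, dest_doc_count=0))  # True
```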

We need better handling of these errors, for example when:

  1. something like a typo causes complete failure of the data frame (note: if the other aggs worked, perhaps the transform should have completed without resposetime)
  2. aggregations sporadically return strings (NaN or Infinity). These should not fail silently. Is it possible to count them as index_failures or in another stat?

Where appropriate, messages should also be logged to data frame notifications to ease operational management.
