Found in 7.2.0-BC5
The following data frame preview contains a typo: responsetime is misspelled as resposetime. The composite aggregation returns avg(resposetime) as the string "NaN".
GET _data_frame/transforms/_preview
{
  "id": "farequote-airline-100",
  "source": { "index": [ "farequote-*" ] },
  "dest": { "index": "df-farequote-airline-100" },
  "pivot": {
    "group_by": {
      "airline": { "terms": { "field": "airline" } }
    },
    "aggregations": {
      "last_updated": { "max": { "field": "@timestamp" } },
      "mean_response": { "avg": { "field": "resposetime" } },
      "count": { "value_count": { "field": "airline" } }
    }
  }
}
returns:
{
  "preview" : [
    {
      "last_updated" : "2019-02-11T23:59:02.000Z",
      "count" : 8728.0,
      "mean_response" : "NaN",
      "airline" : "AAL"
    },
    ...
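Because the failure starts from a misspelled field name, a client-side sanity check could flag it before the transform runs. This is a hypothetical sketch: MAPPED_FIELDS stands in for the real index mapping, and referenced_fields / unknown_fields are illustrative helpers, not part of any Elasticsearch API. It walks the pivot, collects every "field" reference, and suggests close spellings using Python's difflib.

```python
import difflib

# Hypothetical set of fields mapped in farequote-*; in practice this would be
# read from the index mapping (e.g. GET farequote-*/_mapping).
MAPPED_FIELDS = {"@timestamp", "airline", "responsetime"}

def referenced_fields(pivot):
    """Collect every "field" referenced by the pivot's group_by and aggregations."""
    fields = []
    for section in ("group_by", "aggregations"):
        for spec in pivot.get(section, {}).values():
            # Each entry looks like {"<agg_type>": {"field": "<name>", ...}}
            for body in spec.values():
                if isinstance(body, dict) and "field" in body:
                    fields.append(body["field"])
    return fields

def unknown_fields(pivot, mapped=MAPPED_FIELDS):
    """Map each unmapped field name to its closest mapped spellings."""
    return {
        name: difflib.get_close_matches(name, sorted(mapped))
        for name in referenced_fields(pivot)
        if name not in mapped
    }

# The pivot from the preview request above, including the typo.
pivot = {
    "group_by": {"airline": {"terms": {"field": "airline"}}},
    "aggregations": {
        "last_updated": {"max": {"field": "@timestamp"}},
        "mean_response": {"avg": {"field": "resposetime"}},
        "count": {"value_count": {"field": "airline"}},
    },
}
print(unknown_fields(pivot))  # {'resposetime': ['responsetime']}
```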
The data frame transform silently fails to index documents containing these "NaN" values, resulting in an entirely empty destination index.
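A plausible mechanism, assuming the destination field for mean_response is mapped as a numeric type that accepts only finite values: the average over a bucket with no values is 0/0, which is NaN in IEEE 754 arithmetic. JSON has no NaN literal, so the value surfaces as the string "NaN", and a finite-only numeric parser then rejects it. The sketch below mimics this in Python; avg_or_nan and to_double_field are hypothetical stand-ins, not the transform's or Elasticsearch's actual code.

```python
import json
import math

def avg_or_nan(values):
    # Java evaluates 0.0 / 0 to NaN instead of raising, so mimic that here.
    return sum(values) / len(values) if values else math.nan

def to_double_field(raw):
    """Hypothetical finite-only numeric parser standing in for the dest mapping."""
    value = float(raw)  # float("NaN") and float("Infinity") parse in Python
    if not math.isfinite(value):
        raise ValueError(f"only finite values are supported, got [{raw}]")
    return value

# "resposetime" matches no documents, so every airline bucket averages nothing.
mean_response = avg_or_nan([])

# JSON has no NaN number; strict serialization refuses it outright.
try:
    json.dumps(mean_response, allow_nan=False)
except ValueError:
    print("NaN is not valid JSON")

# Passing it along as the string "NaN" (as the preview shows) defers the failure
# to indexing time, where a finite-only numeric field would reject it.
doc = {"airline": "AAL", "mean_response": "NaN"}
try:
    to_double_field(doc["mean_response"])
except ValueError as err:
    print("rejected:", err)
```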
However, there are no errors in the Elasticsearch logs and no warnings or errors in the data frame notifications, and the stats make the transform appear successful (documents_indexed is 19 and index_failures is 0):
{
  "count" : 1,
  "transforms" : [
    {
      "id" : "farequote-airline-100",
      "state" : {
        "task_state" : "stopped",
        "indexer_state" : "stopped",
        "checkpoint" : 1,
        "progress" : {
          "total_docs" : 86274,
          "docs_remaining" : 0,
          "percent_complete" : 100.0
        }
      },
      "stats" : {
        "pages_processed" : 2,
        "documents_processed" : 86274,
        "documents_indexed" : 19,
        "trigger_count" : 1,
        "index_time_in_ms" : 136,
        "index_total" : 1,
        "index_failures" : 0,
        "search_time_in_ms" : 25,
        "search_total" : 2,
        "search_failures" : 0
      },
      "checkpointing" : {
        "operations_behind" : 0
      }
    }
  ]
}
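Until the transform reports such failures itself, an external monitoring check could compare the transform stats against the destination index's real document count. This is a hypothetical sketch: silent_failure is an illustrative name, and dest_doc_count is assumed to come from a call such as GET df-farequote-airline-100/_count. With the stats above and an empty destination index, it flags the mismatch.

```python
def silent_failure(transform_stats, dest_doc_count):
    """Describe any silent indexing loss, or return None if stats look consistent.

    transform_stats: one entry of the "transforms" array from the _stats response.
    dest_doc_count: number of documents actually present in the destination index.
    Assumes a one-off batch transform, where each bucket is indexed once; a
    continuous transform may re-index the same document, inflating the counter.
    """
    stats = transform_stats["stats"]
    indexed = stats["documents_indexed"]
    failures = stats["index_failures"]
    if failures == 0 and dest_doc_count < indexed:
        return (
            f"{transform_stats['id']}: stats report {indexed} documents indexed "
            f"and 0 failures, but the destination holds {dest_doc_count}"
        )
    return None

# Values taken from the stats response above; the empty index gives a count of 0.
stats_entry = {
    "id": "farequote-airline-100",
    "stats": {"documents_indexed": 19, "index_failures": 0},
}
print(silent_failure(stats_entry, dest_doc_count=0))
```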
We require better handling of these errors under conditions such as when:
- something like a typo causes complete failure of the data frame (note: if the other aggregations worked, perhaps the transform should have completed without resposetime);
- aggregations sporadically return strings (NaN or Infinity).
These should not fail silently. Is it possible to count them as index_failures or in another stat?
Messages should additionally be logged to data frame notifications, where appropriate, to ease operational management.
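One way to address both requests is to drop non-finite aggregation values from each document before indexing, count the drops in a stat, and emit a notification per affected field. This is a hypothetical sketch, not the data frame implementation: sanitize, is_non_finite, the non_finite_values stat, and the message format are all illustrative. Dropping only the field, rather than the whole document, follows the suggestion that the transform could complete without resposetime.

```python
import math
from collections import Counter

# Strings an aggregation result may carry in place of a finite number.
NON_FINITE_STRINGS = {"NaN", "Infinity", "-Infinity"}

def is_non_finite(value):
    # Caveat: this heuristic would also drop a genuine keyword value spelled
    # "NaN"; a real fix would inspect only numeric aggregation outputs.
    if isinstance(value, float):
        return not math.isfinite(value)
    return isinstance(value, str) and value in NON_FINITE_STRINGS

def sanitize(docs, stats, notifications):
    """Drop non-finite fields, count them, and record one message per field."""
    dropped = Counter()
    clean = []
    for doc in docs:
        kept = {k: v for k, v in doc.items() if not is_non_finite(v)}
        dropped.update(k for k in doc if k not in kept)
        clean.append(kept)
    for field, n in sorted(dropped.items()):
        stats["non_finite_values"] = stats.get("non_finite_values", 0) + n
        notifications.append(
            f"[{field}] returned a non-finite value in {n} document(s); field omitted"
        )
    return clean

# Usage with the preview document shown above.
docs = [{"last_updated": "2019-02-11T23:59:02.000Z", "count": 8728.0,
         "mean_response": "NaN", "airline": "AAL"}]
stats, notes = {}, []
clean = sanitize(docs, stats, notes)
print(clean, stats, notes)
```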