
Single unbounded date_range document triggers circuit breaker #53736


Closed
spinscale opened this issue Mar 18, 2020 · 2 comments

@spinscale
Contributor

Elasticsearch version (bin/elasticsearch --version): 7.6.1

Description of the problem including expected versus actual behavior:

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query, etc. The easier you make it
for us to reproduce, the more likely it is that somebody will take the time to look at it.

DELETE test

PUT test 
{
	"mappings": {
		"properties": {
			"dateRange": { "type": "date_range" }
		}
	}
}

PUT test/_doc/1
{
	"dateRange": {
		"gte": "2020-03-01"
	}
}

GET test/_search
{
	"aggs": {
		"test": {
			"date_histogram": {
				"field": "dateRange",
				"interval" : "day"
			}
		}
	}
}

triggers the circuit breaker like this:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[request] Data too large, data for [<reused_arrays>] would be [805344256/768mb], which is larger than the limit of [622775500/593.9mb]",
        "bytes_wanted" : 805344256,
        "bytes_limit" : 622775500,
        "durability" : "TRANSIENT"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test",
        "node" : "I8l1uS_GSKO8LfCDlV69OQ",
        "reason" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[request] Data too large, data for [<reused_arrays>] would be [805344256/768mb], which is larger than the limit of [622775500/593.9mb]",
          "bytes_wanted" : 805344256,
          "bytes_limit" : 622775500,
          "durability" : "TRANSIENT"
        }
      }
    ]
  },
  "status" : 429
}

The above snippet trips the circuit breaker after a few seconds (which is good!). But a single document with an unbounded upper range in your dataset makes any aggregation on that date range field impossible and slows down the system. Maybe we want to exit earlier in that case?

@ajacob

ajacob commented Mar 18, 2020

I don't know what the best thing to do in the scope of this issue is, but I think it would be great to have more options for the date_histogram aggregation.

We already have an extended_bounds setting; maybe we need a new setting to restrict bucket creation?

It could be a restricted_bounds setting, for instance, where we could specify min/max values.

I think this new setting would also be useful for multi-valued date fields, in addition to date_range, when we are not always interested in all auto-generated buckets.
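Something like the following, purely hypothetical syntax (restricted_bounds does not exist in Elasticsearch; the snippet only illustrates the idea against the reproduction above):

```json
GET test/_search
{
	"aggs": {
		"test": {
			"date_histogram": {
				"field": "dateRange",
				"interval": "day",
				"restricted_bounds": {
					"min": "2020-01-01",
					"max": "2020-12-31"
				}
			}
		}
	}
}
```

With such a setting, the unbounded upper end of the stored range would be clipped at `max` before buckets are created, instead of expanding to the maximum representable date.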

@polyfractal
Contributor

Thanks @spinscale! Going to close this as a duplicate of: #50109

@ajacob agreed! That's pretty much the direction we are thinking as well. In the most recent comment of that thread (#50109 (comment)) we suggested an extra flag on extended_bounds which lets you indicate that unbounded ranges should be "truncated" at the provided limits.

Incidentally, this would also fix a common complaint about extended_bounds: that it creates more buckets than expected, even for normal field types. E.g. people expect it to act as a hard limit, when in reality it's a max(extended_bounds, data bounds) situation.
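Sketched as a hypothetical request (neither the flag name nor the final shape is decided; see #50109 for the actual discussion):

```json
GET test/_search
{
	"aggs": {
		"test": {
			"date_histogram": {
				"field": "dateRange",
				"interval": "day",
				"extended_bounds": {
					"min": "2020-01-01",
					"max": "2020-12-31",
					"truncate": true
				}
			}
		}
	}
}
```

With the flag set, the provided min/max would act as hard limits on bucket creation rather than only extending the bucket range.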
