
Single unbounded date_range document triggers circuit breaker #53736


Closed
spinscale opened this issue Mar 18, 2020 · 2 comments

@spinscale
Contributor

Elasticsearch version (bin/elasticsearch --version): 7.6.1

Description of the problem including expected versus actual behavior:

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query, etc. The easier you make it
for us to reproduce, the more likely it is that somebody will take the time to look at it.

DELETE test

PUT test 
{
	"mappings": {
		"properties": {
			"dateRange": { "type": "date_range" }
		}
	}
}

PUT test/_doc/1
{
	"dateRange": {
		"gte": "2020-03-01"
	}
}

GET test/_search
{
	"aggs": {
		"test": {
			"date_histogram": {
				"field": "dateRange",
				"interval" : "day"
			}
		}
	}
}

triggers the circuit breaker like this:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[request] Data too large, data for [<reused_arrays>] would be [805344256/768mb], which is larger than the limit of [622775500/593.9mb]",
        "bytes_wanted" : 805344256,
        "bytes_limit" : 622775500,
        "durability" : "TRANSIENT"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test",
        "node" : "I8l1uS_GSKO8LfCDlV69OQ",
        "reason" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[request] Data too large, data for [<reused_arrays>] would be [805344256/768mb], which is larger than the limit of [622775500/593.9mb]",
          "bytes_wanted" : 805344256,
          "bytes_limit" : 622775500,
          "durability" : "TRANSIENT"
        }
      }
    ]
  },
  "status" : 429
}

The above snippet trips the circuit breaker after a few seconds (which is good!). But a single document with an unbounded upper range in your dataset makes any aggregation on that date range field impossible and slows down the system. Maybe we want to exit earlier in that case?

@ajacob

ajacob commented Mar 18, 2020

I don't know what the best thing to do in the scope of this issue is, but I think it would be great to have more options for the date_histogram aggregation.

We already have an extended_bounds setting; maybe we need a new setting to restrict bucket creation?

It could be a restricted_bounds setting, for instance, where we could specify min/max values.

I think this new setting would also be useful for multi-valued date fields, in addition to date_range, when we are not always interested in all auto-generated buckets.
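Something like the following, purely hypothetical syntax (restricted_bounds does not exist in Elasticsearch; the snippet only illustrates the idea against the reproduction above):

```json
GET test/_search
{
	"aggs": {
		"test": {
			"date_histogram": {
				"field": "dateRange",
				"interval": "day",
				"restricted_bounds": {
					"min": "2020-01-01",
					"max": "2020-12-31"
				}
			}
		}
	}
}
```

With such a setting, the unbounded upper end of the stored range would be clipped at `max` before buckets are created, instead of expanding to the maximum representable date.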

@polyfractal
Contributor

Thanks @spinscale! Going to close this as a duplicate of: #50109

@ajacob agreed! That's pretty much the direction we are thinking as well. In the most recent comment of that thread (#50109 (comment)) we suggested an extra flag on extended_bounds which lets you indicate that unbounded ranges should be "truncated" at the provided limits.

Incidentally, this would also fix a common complaint about extended_bounds: that it creates more buckets than expected, even for normal field types. E.g. people expect it to act as a hard limit, when in reality it's a max(extended_bounds, data bounds) situation.
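Sketched as a hypothetical request (neither the flag name nor the final shape is decided; see #50109 for the actual discussion):

```json
GET test/_search
{
	"aggs": {
		"test": {
			"date_histogram": {
				"field": "dateRange",
				"interval": "day",
				"extended_bounds": {
					"min": "2020-01-01",
					"max": "2020-12-31",
					"truncate": true
				}
			}
		}
	}
}
```

With the flag set, the provided min/max would act as hard limits on bucket creation rather than only extending the bucket range.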
