Create a circuit breaker to prevent searches from bringing down a node #2929

maioriel · 2013-04-23T16:47:14Z

One of the fears that I have when using ElasticSearch is that expensive queries can bring down nodes in my cluster.

It would be really nice if ElasticSearch could detect this type of node-killing event by adding logic that would trigger a circuit breaker and kill the offending query, leaving my node intact. For example, if a search takes X% of the heap, the query would be killed by ElasticSearch. It would be useful to expose the X% of heap_size as a configurable value since the level of concurrency of the system would vary by ES installation.

Another feature that would be helpful is when the circuit breaker is tripped, a response is generated from ElasticSearch saying that the query died from using excess memory.
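
To make the request concrete, here is a minimal sketch of the behaviour described above. It is not ElasticSearch code: `MemoryCircuitBreaker`, `CircuitBreakingError`, `run_query`, and the idea that each query arrives with an up-front memory estimate are all assumptions made for illustration. The limit is expressed as a configurable percentage of the heap, and a tripped breaker produces a response that states why the query was rejected.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push estimated memory use past the limit."""


class MemoryCircuitBreaker:
    """Tracks estimated bytes in use against a configurable share of the heap."""

    def __init__(self, heap_bytes: int, limit_percent: float) -> None:
        if not 0 < limit_percent <= 100:
            raise ValueError("limit_percent must be in (0, 100]")
        # The "X% of heap" from the request, converted to an absolute byte limit.
        self.limit_bytes = int(heap_bytes * limit_percent / 100)
        self.used_bytes = 0

    def reserve(self, estimated_bytes: int, label: str) -> None:
        """Reserve memory for a query, or trip the breaker without reserving."""
        new_total = self.used_bytes + estimated_bytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_total} bytes, "
                f"which exceeds the limit of {self.limit_bytes} bytes"
            )
        self.used_bytes = new_total

    def release(self, estimated_bytes: int) -> None:
        """Give back a reservation once the query finishes or is killed."""
        self.used_bytes = max(0, self.used_bytes - estimated_bytes)


def run_query(breaker: MemoryCircuitBreaker, label: str, estimated_bytes: int) -> dict:
    """Run a (pretend) query and return an error response instead of crashing."""
    try:
        breaker.reserve(estimated_bytes, label)
    except CircuitBreakingError as err:
        return {"status": "rejected", "reason": str(err)}
    try:
        return {"status": "ok"}  # the real search work would happen here
    finally:
        breaker.release(estimated_bytes)
```

With `MemoryCircuitBreaker(heap_bytes=1000, limit_percent=60)`, a query estimated at 100 bytes runs normally, while one estimated at 700 bytes is rejected with a reason string and leaves the accounting untouched.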

tmkujala · 2013-04-25T19:43:44Z

This is exactly what I'm looking for as well! One of my requirements is to provide open API access to our ElasticSearch data so developers can run ad hoc queries. There is a very real possibility that one of them may execute a bad query, bringing down a single node or, much worse, multiple nodes in my cluster.

What would make this feature even better is additional performance monitoring: which queries are running at any given time, which queries have been run, and performance metrics for them.

tlieblfs · 2013-04-25T21:14:49Z

+1

s1monw · 2013-04-26T10:44:44Z

Hey folks, I want to jump in here and tell you that this is something that is pretty high on our wish-list as well. With the foundations 0.90 will bring, we can approach things like this much more easily and, maybe more importantly, more reliably. I might jump in here and have a first cut at this pretty soon.

rore · 2013-05-08T14:24:01Z

+1

btiernay · 2013-05-20T22:59:35Z

👍

lmenezes · 2013-06-05T10:09:51Z

+1

nik9000 · 2013-08-07T01:29:39Z

+1
Certainly it'd be cool to get a list of running queries and be able to kill them if they are running wild. That'd be a wonderful first start to anything along these lines.
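
A rough sketch of what such a "list and kill" facility could look like, assuming cooperative cancellation (the query loop checks a flag between documents). `RunningQueries` and `scan` are hypothetical names for illustration and do not correspond to any ElasticSearch API.

```python
import itertools
import threading
import time


class RunningQueries:
    """Registry of in-flight queries that can be listed and cancelled."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._queries = {}  # query id -> (description, start time, cancel flag)

    def register(self, description: str):
        """Record a new query and return its id plus the flag it must poll."""
        cancel_flag = threading.Event()
        with self._lock:
            query_id = next(self._ids)
            self._queries[query_id] = (description, time.monotonic(), cancel_flag)
        return query_id, cancel_flag

    def finish(self, query_id: int) -> None:
        """Remove a query once it has completed or been cancelled."""
        with self._lock:
            self._queries.pop(query_id, None)

    def list_running(self):
        """Return (id, description, seconds running) for every in-flight query."""
        now = time.monotonic()
        with self._lock:
            return [
                (query_id, description, now - started)
                for query_id, (description, started, _) in self._queries.items()
            ]

    def kill(self, query_id: int) -> bool:
        """Ask a query to stop; returns False if the id is unknown."""
        with self._lock:
            entry = self._queries.get(query_id)
        if entry is None:
            return False
        entry[2].set()
        return True


def scan(documents, cancel_flag: threading.Event) -> int:
    """Count documents, checking the cancel flag between documents."""
    seen = 0
    for _ in documents:
        if cancel_flag.is_set():
            raise RuntimeError("query cancelled")
        seen += 1
    return seen
```

The design choice here is that `kill` only sets a flag; the query stops at its next check, so a query that never polls the flag cannot be interrupted this way.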

avleen · 2013-11-20T18:18:55Z

@s1monw any update on this? We have some really large indices, and big searches over terabytes of data can bring down the cluster right now because the searches just keep going forever :-(

dakrone · 2013-11-20T19:54:04Z

@avleen we are actively developing this, so hopefully soon!

dakrone · 2013-11-26T20:51:31Z

Related: #4261

lukas-vlcek · 2013-11-26T21:54:58Z

Interesting!

BTW, is there any impact on bulk operations, like bulk update? Meaning, once the circuit breaker trips, will the bulk operation still go on while all remaining updates targeting a particular shard fail?
(Also might impact #2230 if implemented in the future?)

dakrone · 2014-01-28T21:18:29Z

Closing this issue since #4261 landed.

roncemer · 2014-10-29T21:15:35Z

I'd love to see ES automatically detect when a query is going to use more than a certain percentage of the heap, and automatically use temporary files to do its sorting, merging and so on. That would give it the ability to run arbitrary queries (like MySQL) without bringing down the node. The query would just take a long time to run. And in many cases, that's absolutely fine -- especially when doing aggregations and similar analytic queries.
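
For illustration only, the spill-to-disk idea can be sketched as an external merge sort: sort bounded chunks in memory, write each to a temporary file, then merge the files lazily. This is a generic technique, not something ES does; `external_sort`, `_spill`, and the `max_in_memory` parameter are hypothetical, and the sketch handles integers only.

```python
import heapq
import tempfile


def _spill(sorted_chunk):
    """Write one sorted chunk to a temporary file, one integer per line."""
    handle = tempfile.TemporaryFile(mode="w+")
    handle.writelines(f"{value}\n" for value in sorted_chunk)
    handle.seek(0)  # rewind so the merge step can read from the start
    return handle


def external_sort(values, max_in_memory):
    """Yield integers in sorted order, buffering at most max_in_memory at once."""
    runs = []
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) >= max_in_memory:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    try:
        # heapq.merge consumes its inputs lazily, so the sorted runs are
        # streamed from disk rather than loaded back into memory whole.
        streams = [(int(line) for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()
```

For example, `list(external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3))` sorts the input using three on-disk runs. The trade-off is exactly the one raised in this thread: memory stays bounded, but the work moves to disk I/O.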

avleen · 2014-10-29T22:18:34Z

I wouldn't say many cases. Maybe in some cases :-)
The problem with using disk for this is that you can increase the IO and
also hurt other queries, and again bring down a node. Elasticsearch is
quite sensitive to IO bandwidth. But it would certainly be nice to have the
option.

kimchy · 2014-10-30T01:03:32Z

Just putting a note here: though not "on demand", doc values are an option (they use on-disk storage for certain fields that are expensive memory-wise and are used for aggs and/or sorting). A lot of progress has been made both in Lucene and ES to make them faster; 1.4 will be a huge step forward, and the following ES version, which will work with Lucene 5, will be even better. We are heavily investing both in Lucene and ES to make this a performant and viable option.

avleen · 2014-10-30T01:32:13Z

Shay, I think we'd noticed a significant I/O impact (probably caused by
more writes?) with doc values.
Do the recent changes improve that situation?

xelldran1 · 2014-11-25T16:34:17Z

+1

diannamcallister · 2019-06-25T16:57:23Z

+1

vipul-mykaarma · 2020-11-23T18:19:45Z

any update on this ?

rebeccahum · 2022-01-04T17:23:51Z

Hiya, following up too on ^^

ghost assigned s1monw Apr 26, 2013

javanna mentioned this issue Sep 6, 2013

Query timeout ignored #3627

Closed

javanna mentioned this issue Nov 4, 2013

Bound the number of search results returned by elasticsearch #4026

Closed

dakrone closed this as completed Jan 28, 2014
