Limit index creation rate #20760

Closed
nik9000 opened this issue Oct 5, 2016 · 7 comments
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >enhancement help wanted adoptme Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. team-discuss

Comments

@nik9000
Member

nik9000 commented Oct 5, 2016

5.0 introduces a per-node limit on the rate of inline script compilations that should help catch the anti-pattern of embedding script parameters in the scripts themselves. I wonder if it is worth adding a master-only limit on the rate of indexes created to catch situations where people accidentally misconfigure an input system and it ends up creating thousands of indexes in quick succession. Such a rate limit would cause indexing to fail with a useful error message, causing back pressure in any queueing system. I think this'd be better than just creating thousands of indexes as fast as we can.

Is this a good idea or a horrible idea?
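
For illustration only, a creation-rate limit of the kind proposed above could be sketched as a sliding-window counter. Everything here is an assumption made for the example (the class name `IndexCreationRateLimiter`, the parameters `max_creations` and `window_seconds`, and the caller-supplied clock); it is not Elasticsearch code or an existing setting.

```python
from collections import deque


class IndexCreationRateLimiter:
    """Hypothetical sliding-window limiter for index creations.

    Allows at most `max_creations` creations in any `window_seconds` span.
    The caller passes in the current time so the behaviour is deterministic.
    """

    def __init__(self, max_creations, window_seconds):
        self.max_creations = max_creations
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently accepted creations

    def allow(self, now):
        # Forget creations that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_creations:
            # Over the limit: the caller would reject the request with a
            # descriptive error so that upstream queues see back pressure.
            return False
        self._timestamps.append(now)
        return True
```

A misconfigured client that tries to create an index per document would start receiving rejections as soon as the window fills, rather than having every creation accepted.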

@nik9000 nik9000 added the discuss label Oct 5, 2016
@dakrone
Member

dakrone commented Oct 5, 2016

Have we run into any situations where someone has actually hit this issue? I don't recall seeing any github issues about it before.

@nik9000
Member Author

nik9000 commented Oct 5, 2016

> I don't recall seeing any github issues about it before.

I've seen it come through Elastic's support organization a few times. I expect this hasn't come up on github because Elasticsearch isn't the root cause of the issue.

@jpountz
Contributor

jpountz commented Oct 5, 2016

I think I'd rather limit the total number of indices/shards in a cluster than the creation rate.

@clintongormley
Contributor

clintongormley commented Oct 7, 2016

Discussed in FixItFriday. There are two issues here: creating too many indices and creating indices faster than the master can cope. We suggest adding two safeguards:

max_shards_per_node

This setting would be checked on user actions such as create index, restore snapshot, and open index. If the total number of shards in the cluster is greater than max_shards_per_node * number_of_nodes, the user action can be rejected. This implementation allows the maximum to be exceeded if, for example, a node fails and the cluster-wide maximum drops as a result.

We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.
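
A minimal sketch of the check described above, assuming it is evaluated against the shard total as it would be after the requested action. The function name `check_shard_limit`, the exception `ShardLimitExceeded`, and the `requested_shards` parameter are hypothetical names chosen for this example; only `max_shards_per_node` and `number_of_nodes` come from the proposal.

```python
class ShardLimitExceeded(Exception):
    """Raised when a user action would push the cluster past its shard cap."""


def check_shard_limit(current_shards, requested_shards,
                      max_shards_per_node, number_of_nodes):
    """Reject a user action that would exceed the cluster-wide shard cap.

    Returns the number of shards that could still be added after the action.
    """
    cluster_cap = max_shards_per_node * number_of_nodes
    total_after = current_shards + requested_shards
    if total_after > cluster_cap:
        raise ShardLimitExceeded(
            f"action would bring the cluster to {total_after} shards, "
            f"above the cap of {cluster_cap} "
            f"({max_shards_per_node} per node * {number_of_nodes} nodes)"
        )
    return cluster_cap - total_after
```

Because the cap is derived from the current node count, losing a node lowers it. Existing shards are left alone even if they now exceed the cap; only new user actions are rejected until the total falls back under it.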

max_concurrent_index_creations

This would be a simple counter of in-flight index creation requests. New requests that would cause the maximum to be exceeded would be rejected. The aim of this setting is to avoid queueing up potentially thousands of index creations, which could be caused by erroneously trying to create an index per document. Default: e.g. 30.
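
The in-flight counter could look roughly like the following sketch. The class and method names (`IndexCreationLimiter`, `try_acquire`, `release`) are assumptions made for illustration, and the default of 30 is simply the example value suggested above.

```python
import threading


class IndexCreationLimiter:
    """Hypothetical counter of in-flight index creation requests."""

    def __init__(self, max_concurrent_index_creations=30):
        self._max = max_concurrent_index_creations
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Reserve a slot for a creation; return False if the limit is reached."""
        with self._lock:
            if self._in_flight >= self._max:
                return False  # reject instead of queueing the request
            self._in_flight += 1
            return True

    def release(self):
        """Free a slot once a creation has completed or failed."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1
```

A caller would invoke `try_acquire()` before starting a creation, reject the request if it returns `False`, and call `release()` in a `finally` block so that failed creations do not leak slots.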

@clintongormley
Contributor

The max_shards_per_node change will be handled in #20705

@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Cluster labels Feb 13, 2018
@DaveCTurner
Contributor

We now have a limit on the number of shards per node in a cluster, thanks to #34892.

I've marked this as team-discuss because I would like to revisit the discussion about limiting the number of concurrent index creations, or applying another rate limit. I question how easy it would be to set this correctly. With time-based indices, we might sometimes want to create many indices at the same time. Conversely, even if you could only create a single index at a time, I think the time it'd take to hit the shards-per-node limit is comparable with the time it'd take to react to a rogue client that's creating too many indices, so I don't think the concurrency limit helps much.

In short, I think we can close this.

@rjernst rjernst added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
@DaveCTurner
Contributor

We discussed this today and agreed to close this for the reasons I described above.

6 participants