RecoverySourceHandler#runWithGenericThreadPool caused deadlock #85839
Labels
>bug
:Distributed Indexing/Recovery
Team:Distributed (Obsolete)
We saw a benchmark of a 7.17.2 cluster get stuck with all generic threads blocked in RecoverySourceHandler#runWithGenericThreadPool (see many-shards-threaddump.txt.gz).

Do we really need to block in this method, or can these actions simply be fire-and-forget? I couldn't see an obvious reason for needing to wait for them to complete.
(FWIW, this was clearly caused by setting cluster.routing.allocation.node_concurrent_recoveries too high, but we really shouldn't deadlock in any configuration.)

Relates #77466
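A minimal Java sketch of the failure mode, assuming the generic pool can be approximated by a small fixed-size pool. This is an illustration only and not Elasticsearch code: GenericPoolStarvation, subAction, POOL_SIZE and the latches are hypothetical names. Each simulated recovery runs on the pool and then hands a sub-action to that same pool. In the blocking variant it waits for the result, so once the number of concurrent recoveries reaches the pool size every thread is waiting and nobody is left to run the queued sub-actions. Raising node_concurrent_recoveries makes that more likely. In the fire-and-forget variant the outer task registers a completion callback and returns, which releases its thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class GenericPoolStarvation {

    // Hypothetical stand-in for the generic pool: a small fixed-size pool.
    static final int POOL_SIZE = 2;
    static final long TIMEOUT_MS = 1000;

    // Hypothetical sub-action that a recovery hands off to the same pool.
    static void subAction() {
    }

    // Waits for all recoveries (or the timeout), then tears the pool down.
    static boolean awaitDone(CountDownLatch done, ExecutorService pool) {
        try {
            return done.await(TIMEOUT_MS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupts threads stuck in get() so the JVM can exit
        }
    }

    // Blocking style: a pool thread waits for work queued on that same pool.
    static boolean runBlocking(int recoveries) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        // Force the worst-case interleaving: the first POOL_SIZE outer tasks all
        // hold a thread before any of them submits its sub-action.
        CountDownLatch allRunning = new CountDownLatch(Math.min(recoveries, POOL_SIZE));
        CountDownLatch done = new CountDownLatch(recoveries);
        for (int i = 0; i < recoveries; i++) {
            pool.execute(() -> {
                try {
                    allRunning.countDown();
                    allRunning.await();
                    pool.submit(GenericPoolStarvation::subAction).get(); // blocks a pool thread
                    done.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        return awaitDone(done, pool);
    }

    // Fire-and-forget style: hand off the sub-action, register a callback, return.
    static boolean runNonBlocking(int recoveries) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        CountDownLatch allRunning = new CountDownLatch(Math.min(recoveries, POOL_SIZE));
        CountDownLatch done = new CountDownLatch(recoveries);
        for (int i = 0; i < recoveries; i++) {
            pool.execute(() -> {
                try {
                    allRunning.countDown();
                    allRunning.await(); // same worst-case interleaving as above
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                CompletableFuture.runAsync(GenericPoolStarvation::subAction, pool)
                        .whenComplete((ignored, error) -> done.countDown());
                // Returning here frees this thread for the queued sub-action.
            });
        }
        return awaitDone(done, pool);
    }

    public static void main(String[] args) {
        System.out.println("blocking, 1 recovery: " + runBlocking(1));
        System.out.println("blocking, 2 recoveries: " + runBlocking(2));
        System.out.println("non-blocking, 5 recoveries: " + runNonBlocking(5));
    }
}
```

With POOL_SIZE = 2, the blocking variant finishes with one recovery but times out once two recoveries occupy both threads, while the callback variant finishes regardless of how many recoveries are started.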