[CI] Rolling upgrade test failure - failed to obtain in-memory shard lock #48395
Comments
Pinging @elastic/es-security (:Security/Security)
Pinging @elastic/es-distributed (:Distributed/Distributed)
I just observed a failed intake build with similar symptoms (a 'failed to obtain in-memory shard lock' error, along with a build-up of cluster state tasks). Build scan: https://gradle-enterprise.elastic.co/s/f4vu4xgxiwkcs/.
It looks like something is DDoSing the master with persistent task updates.
I've opened #48483 to get more details on the tasks that are accumulating.
Looking closer at this, it seems to be a recurrence of #39982.
I think the problem is that the master is trying to relocate the "upgraded_scroll" shard back to the node on which it was previously allocated, but the shard cannot be allocated there now because the shard lock is still held by an in-progress scroll. Because the master keeps retrying (indefinitely, since max_retries does not apply to relocations), it blocks any lower-priority task from completing, which leads to the rolling upgrade tests failing (see #48395). Closes #48395
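To make the starvation argument above concrete, here is a toy model in Python. It is only a sketch and not Elasticsearch code: the function `run_queue`, the task names, and the priority values are all hypothetical, and the always-failing relocation merely stands in for the held shard lock. It shows that a higher-priority task that is resubmitted on every failure keeps a lower-priority task from ever running, whereas a retry cap lets the queue drain.

```python
import heapq
from itertools import count


def run_queue(max_retries, steps=20):
    """Toy priority queue; returns the names of tasks that completed.

    max_retries=None means the failing task is retried without limit,
    which mimics relocations not being subject to max_retries.
    """
    seq = count()  # tie-breaker so equal priorities run in FIFO order
    queue = []
    # Lower number = higher priority (hypothetical values).
    heapq.heappush(queue, (0, next(seq), "relocate-shard", 0))
    heapq.heappush(queue, (1, next(seq), "persistent-task-update", 0))

    completed = []
    for _step in range(steps):
        if not queue:
            break
        priority, _order, name, attempts = heapq.heappop(queue)
        if name == "relocate-shard":
            # The relocation always fails in this model (shard lock held),
            # so it is resubmitted unless the retry cap has been reached.
            if max_retries is None or attempts + 1 < max_retries:
                heapq.heappush(queue, (priority, next(seq), name, attempts + 1))
            continue
        completed.append(name)
    return completed


print(run_queue(max_retries=None))  # unlimited retries
print(run_queue(max_retries=5))     # capped retries
```

With unlimited retries the lower-priority task never completes within the simulated steps; with a cap it runs once the relocation has given up.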
All tests in TokenBackwardsCompatibilityIT failed when upgrading from 7.4.1 to 7.x latest, with 'there are still tasks running' failures at cleanup time. Digging into the logs, it seems that the root cause is that one of the upgraded master nodes failed to start due to a 'failed to obtain in-memory shard lock' error.
Build scan is here: https://gradle-enterprise.elastic.co/s/qycq6tmil2t3w/console-log?task=:x-pack:qa:rolling-upgrade:v7.4.1%23upgradedClusterTest#L2698
This has happened before, a few days ago, also in TokenBackwardsCompatibilityIT, upgrading from 6.8.4: https://gradle-enterprise.elastic.co/s/46trrv4mlxcle