[CI] Test Failure in CloneSnapshotIT.testBackToBackClonesForIndexNotInCluster #64115
Labels
:Distributed Coordination/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
>test-failure
Triaged test failures from CI
Comments
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Oct 25, 2020
We must not remove the snapshot from the initializing set in the `timeout` getter. This was a plain oversight and went unnoticed. In rare circumstances it can cause a valid snapshot clone to be removed from the cluster state (e.g. when a node joins the cluster concurrently or a routing change happens, as in the linked test failure). Closes elastic#64115
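The bug class described in the commit message, a getter with a side effect, can be illustrated with a minimal, hypothetical Java sketch. The names `initializingClones`, `UpdateTask`, `timeoutMillis()`, `buggyTask` and `fixedTask` are made up for illustration and are not the actual Elasticsearch `SnapshotsService` code; the only assumption is that the task framework may query the timeout at times (and as often as) it chooses, so anything the getter does beyond returning a value happens outside the task author's control.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class TimeoutGetterSketch {

    // Hypothetical stand-in for the set of snapshot clones that are still initializing.
    static final Set<String> initializingClones = Collections.synchronizedSet(new HashSet<>());

    // Hypothetical task abstraction (assumption): a framework may call timeoutMillis()
    // at any time and any number of times, e.g. when it submits or batches the task.
    interface UpdateTask {
        long timeoutMillis();
    }

    // Buggy: merely asking for the timeout drops the clone from the tracking set.
    static UpdateTask buggyTask(String snapshot) {
        return () -> {
            initializingClones.remove(snapshot);
            return 30_000L;
        };
    }

    // Fixed: the getter is a pure read; cleanup belongs in explicit success/failure handling.
    static UpdateTask fixedTask(String snapshot) {
        return () -> 30_000L;
    }

    public static void main(String[] args) {
        initializingClones.add("clone-1");
        buggyTask("clone-1").timeoutMillis();
        System.out.println("buggy still tracked: " + initializingClones.contains("clone-1"));

        initializingClones.add("clone-2");
        fixedTask("clone-2").timeoutMillis();
        System.out.println("fixed still tracked: " + initializingClones.contains("clone-2"));
    }
}
```

Running `main` prints `buggy still tracked: false` followed by `fixed still tracked: true`. Keeping the getter side-effect free means the tracking set only changes through deliberate state transitions, regardless of how often or when the timeout is read.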
Tracked it down in #64116
original-brownbear added a commit that referenced this issue on Oct 26, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Oct 26, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Oct 26, 2020
original-brownbear added a commit that referenced this issue on Oct 26, 2020
original-brownbear added a commit that referenced this issue on Oct 26, 2020
The failure happened again in 7.x, so I am reopening. Same symptom: the suite timed out waiting for threads to finish:
Thanks for reopening, Jim. This was caused by a missing backport (my fault). It's merged now, so I'm closing this.
This failed exactly once in 7.x, but I can't really explain why or how (https://gradle-enterprise.elastic.co/s/jeugiua6ddqlc). It failed without any exception, simply by never making progress on one of the clone operations in that test, which led to a timeout:
I'll try to reason about this a little more and add some logging to see if I can track it down.