[CI] Failure in org.elasticsearch.cluster.routing.AllocationIdIT.testFailedRecoveryOnAllocateStalePrimaryRequiresAnotherAllocateStalePrimary #66893

Closed
original-brownbear opened this issue Dec 30, 2020 · 1 comment · Fixed by #67179
Labels:
- :Distributed Coordination/Allocation - All issues relating to the decision making around placing a shard (both master logic & on the nodes)
- Team:Distributed (Obsolete) - Meta label for distributed team (obsolete); replaced by Distributed Indexing/Coordination
- >test-failure - Triaged test failures from CI

Comments

original-brownbear (Member) commented:

This just failed on 7.x here: https://gradle-enterprise.elastic.co/s/rhyopb5hm3gu2/tests/:server:internalClusterTest/org.elasticsearch.cluster.routing.AllocationIdIT/testFailedRecoveryOnAllocateStalePrimaryRequiresAnotherAllocateStalePrimary

  2> REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.cluster.routing.AllocationIdIT.testFailedRecoveryOnAllocateStalePrimaryRequiresAnotherAllocateStalePrimary" -Dtests.seed=491F1EE43B42DEC9 -Dtests.security.manager=true -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dtests.locale=he-IL -Dtests.timezone=Asia/Damascus -Druntime.java=8
  2> java.lang.AssertionError: timed out waiting for yellow state
        at __randomizedtesting.SeedInfo.seed([491F1EE43B42DEC9:C099E915F71965EB]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:953)
        at org.elasticsearch.test.ESIntegTestCase.ensureYellow(ESIntegTestCase.java:913)

Does not reproduce locally.

original-brownbear added the >test-failure and :Distributed Coordination/Allocation labels on Dec 30, 2020
elasticmachine added the Team:Distributed (Obsolete) label on Dec 30, 2020
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

dnhatn self-assigned this on Jan 5, 2021
dnhatn added a commit that referenced this issue on Jan 13, 2021:

This test failed on WindowsFS: we failed to remove the corrupted file
while it was still open (held for a short window by the ListShardStore
action), and the pending deletes were cleared when we restarted that
node, so the file was never removed.

This commit fixes the issue by shutting the node down before removing
the corrupted file, so that nothing can still be accessing the file.

Closes #66893
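To make the race concrete, here is a minimal, self-contained Java sketch (hypothetical model code, not Elasticsearch internals) of the semantics described in the commit message: on Windows a file that is still open cannot be deleted, so the delete is deferred to a pending-deletes list, and that list does not survive a node restart. The `Node`, `delete`, `shutDown`, and `restart` names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the race: an open file (e.g. held briefly by the
// ListShardStore action) cannot be deleted on Windows, so the delete is
// queued as "pending" -- and pending deletes are cleared on restart.
class Node {
    final Set<String> filesOnDisk = new HashSet<>();
    final Set<String> openFiles = new HashSet<>();
    final List<String> pendingDeletes = new ArrayList<>();

    boolean delete(String file) {
        if (openFiles.contains(file)) {
            // Windows semantics: cannot delete a file that is open;
            // defer the delete instead.
            pendingDeletes.add(file);
            return false;
        }
        return filesOnDisk.remove(file);
    }

    void shutDown() {
        openFiles.clear();          // shutting down releases all file handles
    }

    void restart() {
        pendingDeletes.clear();     // pending deletes do not survive a restart
        openFiles.clear();
    }
}

class StaleDeleteDemo {
    public static void main(String[] args) {
        // Buggy order: delete while the file is still open, then restart.
        Node buggy = new Node();
        buggy.filesOnDisk.add("corrupted");
        buggy.openFiles.add("corrupted");
        buggy.delete("corrupted");  // deferred to pending deletes
        buggy.restart();            // pending deletes cleared -> file leaks
        System.out.println("buggy: corrupted file still on disk = "
                + buggy.filesOnDisk.contains("corrupted"));

        // Fixed order (as in the commit): shut the node down first.
        Node fixed = new Node();
        fixed.filesOnDisk.add("corrupted");
        fixed.openFiles.add("corrupted");
        fixed.shutDown();           // releases the handle
        fixed.delete("corrupted");  // delete now succeeds
        System.out.println("fixed: corrupted file still on disk = "
                + fixed.filesOnDisk.contains("corrupted"));
    }
}
```

In the buggy ordering the corrupted file survives the restart (`true`); with the shutdown-first ordering it is removed (`false`), which is the reordering the fix applies to the test.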
dnhatn added four more commits that referenced this issue on Jan 13, 2021, each with the same commit message.