ClusterDisruptionIT#testAckedIndexing failures #41068

Closed
cbuescher opened this issue Apr 10, 2019 · 7 comments
Assignees
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI

Comments

@cbuescher
Member

On master: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+artifactory/329/console

Could not reproduce locally:

./gradlew :server:integTest --tests "org.elasticsearch.discovery.ClusterDisruptionIT.testAckedIndexing" \
  -Dtests.seed=E05E829B32C2E662 \
  -Dtests.security.manager=true \
  -Dtests.locale=en-SH \
  -Dtests.timezone=Europe/Oslo \
  -Dcompiler.java=12 \
  -Druntime.java=11
java.lang.AssertionError: AcknowledgedResponse failed - not acked
Expected: <true>
     but: was <false>
	at __randomizedtesting.SeedInfo.seed([E05E829B32C2E662:6A9F36686EBF0029]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:112)
	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:116)
	at org.elasticsearch.test.TestCluster.wipeIndices(TestCluster.java:142)
	at org.elasticsearch.test.TestCluster.wipe(TestCluster.java:79)
	at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:569)
	at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2170)
	at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
@cbuescher cbuescher added >test-failure Triaged test failures from CI :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Apr 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@ywelsch
Contributor

ywelsch commented May 7, 2019

The issue here is that disruptions are not always fully healed when the test ends, which leads to failures in clean-up actions. @DaveCTurner thoughts on how to ensure this?
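One generic way to guard clean-up against a disruption that has not fully healed is to poll a "cluster is healthy again" condition with a bounded timeout before wiping indices. The sketch below is a minimal, self-contained illustration of that idea only. `HealAwait` and `awaitCondition` are hypothetical names, not part of the Elasticsearch test framework, and the condition passed in stands in for whatever health check the test would really use.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not an Elasticsearch API: polls a condition until it
// holds or a timeout elapses, so that clean-up only starts on a healed cluster.
final class HealAwait {
    private HealAwait() {}

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * elapses (or the thread is interrupted) before it does.
     */
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could call something like `awaitCondition(() -> clusterLooksHealed(), 30_000, 100)` at the end of the test body and fail with a clear message when it returns false, where `clusterLooksHealed` is a placeholder for the actual check.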

@matriv
Contributor

matriv commented May 7, 2019

Another failure encountered here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+periodic/486/console

Reproduce with:

./gradlew :server:integTest --tests "org.elasticsearch.discovery.ClusterDisruptionIT.testAckedIndexing" \
  -Dtests.seed=D0453D059CF4F8AE \
  -Dtests.security.manager=true \
  -Dtests.locale=hr-HR \
  -Dtests.timezone=Pacific/Wake \
  -Dcompiler.java=12 \
  -Druntime.java=8

Couldn't reproduce locally.

The message is different though:

org.elasticsearch.discovery.ClusterDisruptionIT > testAckedIndexing FAILED
    java.lang.AssertionError: [test][2], node[kWppasG3RSKIRbBheOdOVA], [R], s[STARTED], a[id=SxfxSm_6Q3KWu6VDYMA0bQ] global checkpoint syncs mismatch
    Expected: <3L>
         but: was <4L>

Here are the complete logs, in case they are lost from CI: https://gist.github.com/matriv/ee422e416aac35910194a7319a595ed6

@ywelsch
Contributor

ywelsch commented May 22, 2019

@matriv I've figured out your failure. The issue was that the global checkpoint sync interval was not correctly reduced for testAckedIndexing. This is fixed in d49d9b5 /cc: @dnhatn
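To illustrate how a base-class default can be silently ignored, here is a toy model using plain maps instead of the real Elasticsearch `Settings` classes. It is only a sketch of the general failure mode, not the code changed in d49d9b5. The class and method names are hypothetical, the key `index.global_checkpoint_sync.interval` is my recollection of the setting name and should be checked against the source, and the value `1s` is an arbitrary example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (assumption: not the actual test code) of a test that builds its
// index settings from scratch and thereby drops a default from its base class.
final class SyncIntervalSketch {
    // Assumed setting key; verify against the Elasticsearch source.
    static final String SYNC_INTERVAL_KEY = "index.global_checkpoint_sync.interval";

    private SyncIntervalSketch() {}

    /** Stand-in for the defaults a base test class wants on every index. */
    static Map<String, String> baseIndexSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(SYNC_INTERVAL_KEY, "1s"); // example value only
        return settings;
    }

    /** Problematic variant: starts from an empty map, so the base default is lost. */
    static Map<String, String> settingsIgnoringBase(int shards) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("index.number_of_shards", Integer.toString(shards));
        return settings;
    }

    /** Corrected variant: starts from the base defaults, then adds test-specific values. */
    static Map<String, String> settingsExtendingBase(int shards) {
        Map<String, String> settings = new LinkedHashMap<>(baseIndexSettings());
        settings.put("index.number_of_shards", Integer.toString(shards));
        return settings;
    }
}
```

In the first variant the index would fall back to the product default for the sync interval, which is the kind of mismatch that can make a timing-sensitive assertion fail only occasionally.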

ywelsch added a commit that referenced this issue May 22, 2019
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures

Relates #41068, #38931
@dnhatn
Member

dnhatn commented May 22, 2019

Thanks @ywelsch.

@ywelsch ywelsch self-assigned this May 22, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures

Relates elastic#41068, elastic#38931
henningandersen pushed a commit that referenced this issue Jun 20, 2019
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures

Relates #41068, #38931
@henningandersen
Contributor

Backported this to 7.2 as well, since these test failures still show up on that branch and the fix was clearly intended to go there too.
Failure:
https://scans.gradle.com/s/2r5a6flov3bfu/tests/kyv2y2z3r4v7m-f6hbue3yhtsqi

@ywelsch
Contributor

ywelsch commented Sep 18, 2019

Closing this, as the issue mixes several unrelated test failures. With the most recent one fixed and the earlier one not having shown up recently, I prefer to close this issue and have a new one opened if we see another failure.
