Skip to content

CoordinatorTests testAckListenerReceivesNacksIfPublicationTimesOut failure #44073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
davidkyle opened this issue Jul 8, 2019 · 4 comments · Fixed by #44077
Closed

CoordinatorTests testAckListenerReceivesNacksIfPublicationTimesOut failure #44073

davidkyle opened this issue Jul 8, 2019 · 4 comments · Fixed by #44077
Assignees
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI

Comments

@davidkyle
Copy link
Member

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+part1/95/console
https://gradle.com/s/i5matu3ca3yl6

java.lang.AssertionError: leaders
Expected: not an empty collection
     but: was <[]>
	at __randomizedtesting.SeedInfo.seed([1D931938A0E635D1:97D103FE47BECADC]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.elasticsearch.cluster.coordination.AbstractCoordinatorTestCase$Cluster.getAnyLeader(AbstractCoordinatorTestCase.java:605)
	at org.elasticsearch.cluster.coordination.AbstractCoordinatorTestCase$Cluster.stabilise(AbstractCoordinatorTestCase.java:475)
	at org.elasticsearch.cluster.coordination.AbstractCoordinatorTestCase$Cluster.stabilise(AbstractCoordinatorTestCase.java:463)
	at org.elasticsearch.cluster.coordination.CoordinatorTests.testAckListenerReceivesNacksIfPublicationTimesOut(CoordinatorTests.java:622)

This actually reproduces on 7.x

./gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testAckListenerReceivesNacksIfPublicationTimesOut" \
  -Dtests.seed=1D931938A0E635D1 \
  -Dtests.security.manager=true \
  -Dtests.locale=fr-LU \
  -Dtests.timezone=America/Mazatlan \
  -Dcompiler.java=12 \
  -Druntime.java=8
@davidkyle davidkyle added >test-failure Triaged test failures from CI :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jul 8, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner DaveCTurner self-assigned this Jul 8, 2019
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jul 8, 2019
Since elastic#42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.

It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in elastic#42636.

Fixes elastic#44073
@DaveCTurner
Copy link
Contributor

"actually reproduces" lol. It's almost like we tried to make these tests deterministic 😁

I opened #44077 which should fix this.

@davidkyle
Copy link
Member Author

Similar failure on master

  2> java.lang.AssertionError: leaders
    Expected: not an empty collection
         but: was <[]>
        at __randomizedtesting.SeedInfo.seed([C6A4B2315799684B:DCC78B49D0215FE1]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.cluster.coordination.AbstractCoordinatorTestCase$Cluster.getAnyLeader(AbstractCoordinatorTestCase.java:605)
        at org.elasticsearch.cluster.coordination.AbstractCoordinatorTestCase$Cluster.stabilise(AbstractCoordinatorTestCase.java:475)
        at org.elasticsearch.cluster.coordination.AbstractCoordinatorTestCase$Cluster.stabilise(AbstractCoordinatorTestCase.java:463)
        at org.elasticsearch.cluster.coordination.CoordinatorTests.testCanShrinkFromFiveNodesToThree(CoordinatorTests.java:274)

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/542/console
https://scans.gradle.com/s/rupwqezqp6kd6

And another win for reproducibility!

/gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testCanShrinkFromFiveNodesToThree" \
  -Dtests.seed=C6A4B2315799684B \
  -Dtests.security.manager=true \
  -Dtests.locale=fr-GF \
  -Dtests.timezone=Pacific/Auckland \
  -Dcompiler.java=12 \
  -Druntime.java=11

@DaveCTurner
Copy link
Contributor

Yep #44077 fixes that one too. The too-short timeout is systemic rather than a problem with any particular test so if you're muting things you'll be best off muting all suites that derive from AbstractCoordinatorTestCase.

DaveCTurner added a commit that referenced this issue Jul 9, 2019
Since #42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.

It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in #42636.

Fixes #44073
DaveCTurner added a commit that referenced this issue Jul 9, 2019
Since #42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.

It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in #42636.

Fixes #44073
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants