[CI] FullClusterRestartIT.testRecovery fails #51640

cbuescher · 2020-01-29T17:23:52Z

Log	https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+default-distro/718/console
https://gradle-enterprise.elastic.co/s/fqcywooezw23y

There are several old issues like #46712 that look a bit like this, but all are closed, so I'm opening this one for the team to check if this is of interest and/or related.

Could not reproduce locally with

./gradlew ':qa:full-cluster-restart:v8.0.0#upgradedClusterTest' --tests "org.elasticsearch.upgrades.FullClusterRestartIT.testRecovery" \
  -Dtests.seed=162CC3078CFA3EA9 \
  -Dtests.security.manager=true \
  -Dtests.locale=en-PH \
  -Dtests.timezone=Pacific/Funafuti \
  -Dtests.distribution=default \
  -Dcompiler.java=13

Failure

java.lang.AssertionError: mismatch while checking for translog recovery
testrecovery 0 existing_store done 0
testrecovery 0 peer           done 0
 expected:<true> but was:<false>
	at __randomizedtesting.SeedInfo.seed([162CC3078CFA3EA9:D7DCBAABA1AAF40E]:0)
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.elasticsearch.upgrades.FullClusterRestartIT.testRecovery(FullClusterRestartIT.java:737)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:834)


elasticmachine · 2020-01-29T17:23:53Z

Pinging @elastic/es-distributed (:Distributed/Recovery)

testRecovery relies on the fact that shards are not flushed on inactive. Our CI was recently too slow: it took more than 20 minutes to complete the full cluster restart suite. This slowness caused some shards of testRecovery to be flushed on inactive. This commit increases the inactive time to 1h to reduce this noise. Closes #51640
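
To illustrate how the failure output fits this explanation, here is a minimal sketch that reads the two recovery rows from the assertion message. It is not the test's actual code. It assumes the trailing column of each row is the number of translog operations replayed during recovery, which the issue does not state explicitly; the function name `restored_from_translog` is hypothetical.

```python
# Rows copied from the assertion message in this issue.
RECOVERY_ROWS = """\
testrecovery 0 existing_store done 0
testrecovery 0 peer           done 0
"""


def restored_from_translog(cat_recovery_output: str) -> bool:
    """Return True if any row reports a non-zero trailing count.

    Assumption: the last column is the number of translog operations
    replayed during recovery (not confirmed by the issue text).
    """
    for line in cat_recovery_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if int(fields[-1]) > 0:
            return True
    return False


# Both rows end in 0, so no translog replay is detected.
print(restored_from_translog(RECOVERY_ROWS))  # prints False
```

Under that assumption, a shard that was flushed while inactive has nothing left to replay after the restart, so every row ends in 0 and a check that expects translog recovery fails with `expected:<true> but was:<false>`.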

cbuescher added the >test-failure (Triaged test failures from CI) and :Distributed Indexing/Recovery (Anything around constructing a new shard, either from a local or a remote source) labels Jan 29, 2020

dnhatn self-assigned this Jan 29, 2020

dnhatn mentioned this issue Jan 29, 2020

Increase shard inactive time to 1h in upgrade tests #51651

Merged

dnhatn closed this as completed in #51651 Jan 30, 2020
