SnapshotIT testCreateSnapshot failure in 6.8 CI #53509

Closed
jaymode opened this issue Mar 12, 2020 · 2 comments · Fixed by #54195
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments

@jaymode
Member

jaymode commented Mar 12, 2020

The SnapshotIT#testCreateSnapshot test failed in 6.8 CI due to a missing snapshot. I was unable to reproduce the issue locally.

Reproduce line:

./gradlew ':client:rest-high-level:integTestRunner' \
  -Dtests.seed=7C237C053DB8334 \
  -Dtests.class=org.elasticsearch.client.SnapshotIT \
  -Dtests.method="testCreateSnapshot" \
  -Dtests.security.manager=true \
  -Dtests.locale=en-KY \
  -Dtests.timezone=Etc/GMT-11 \
  -Dcompiler.java=12 \
  -Druntime.java=12

ElasticsearchStatusException[Elasticsearch exception [type=snapshot_missing_exception, reason=[test_repository:test_snapshot] is missing]]
	at __randomizedtesting.SeedInfo.seed([7C237C053DB8334:F9A4A2736568712D]:0)
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2053)
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2030)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1777)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696)
	at org.elasticsearch.client.SnapshotClient.delete(SnapshotClient.java:296)
	at org.elasticsearch.client.ESRestHighLevelClientTestCase.execute(ESRestHighLevelClientTestCase.java:88)
	at org.elasticsearch.client.ESRestHighLevelClientTestCase.execute(ESRestHighLevelClientTestCase.java:79)
	at org.elasticsearch.client.SnapshotIT.testCreateSnapshot(SnapshotIT.java:149)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:835)
	Suppressed: org.elasticsearch.client.ResponseException: method [DELETE], host [http://[::1]:40681], URI [/_snapshot/test_repository/test_snapshot?master_timeout=30s], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"snapshot_missing_exception","reason":"[test_repository:test_snapshot] is missing"}],"type":"snapshot_missing_exception","reason":"[test_repository:test_snapshot] is missing"},"status":404}
		at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233)
		at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764)

Build scan: https://gradle-enterprise.elastic.co/s/u3csmaokg7iak

@jaymode jaymode added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Mar 12, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@DaveCTurner DaveCTurner self-assigned this Mar 18, 2020
@original-brownbear original-brownbear self-assigned this Mar 25, 2020
@original-brownbear
Member

This is the result of a fairly unlikely race in our delete snapshot logic. The delete does two things before it runs the actual delete:

  • Check the repository data for the snapshot name that is to be deleted
  • If that fails, check the cluster state for an in-progress snapshot with the given name

If the snapshot is still in progress during the first check but finishes before the second check, we get a 404: the first check misses it because it is not yet in the repository data, and the second misses it because it is no longer in progress. There is no way of really fixing this in 6.x (a fix for 7.x + master is incoming as part of the concurrent repo operations work, though). I'll find a way to make the test resilient to this for now.
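
For readers who want to see the window concretely, here is a minimal, deterministic sketch of the two-step check described above. It is not Elasticsearch code: `DeleteRaceSketch`, `repositoryData`, `inProgress`, `finalizeSnapshot` and `resolveForDelete` are hypothetical names, and the `betweenChecks` hook exists only to force the unlucky interleaving that would otherwise depend on timing.

```java
import java.util.HashSet;
import java.util.Set;

public class DeleteRaceSketch {
    // Hypothetical stand-ins for the two places a snapshot name can be found.
    static final Set<String> repositoryData = new HashSet<>(); // finished snapshots
    static final Set<String> inProgress = new HashSet<>();     // running snapshots (cluster state)

    // Finalization moves the snapshot from "in progress" to the repository data.
    static void finalizeSnapshot(String name) {
        repositoryData.add(name);
        inProgress.remove(name);
    }

    // Models the two checks done before a delete. betweenChecks runs after the
    // first check has missed, which is exactly where the race window sits.
    static boolean resolveForDelete(String name, Runnable betweenChecks) {
        if (repositoryData.contains(name)) {
            return true; // check 1: found in the repository data
        }
        betweenChecks.run();
        return inProgress.contains(name); // check 2: found as in-progress
    }

    public static void main(String[] args) {
        inProgress.add("test_snapshot");
        // The snapshot finishes between the two checks, so neither check sees it.
        boolean found = resolveForDelete("test_snapshot", () -> finalizeSnapshot("test_snapshot"));
        System.out.println("found=" + found); // prints found=false (the 404 case)
        // A second attempt succeeds because the repository data now lists it.
        System.out.println("found=" + resolveForDelete("test_snapshot", () -> {})); // prints found=true
    }
}
```

The second call in `main` shows why simply retrying the delete is enough to get past the race: once finalization has completed, the first check succeeds.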

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Mar 25, 2020
Retry here to work around the possible race between snapshot finalization
and deletion.

Closes elastic#53509
original-brownbear added a commit that referenced this issue Mar 26, 2020
Retry here to work around the possible race between snapshot finalization
and deletion.

Closes #53509
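
The commit message only says that the test now retries; the actual change is in the linked commits. As a rough illustration of that kind of workaround, and not the code that was merged, a bounded retry around the delete call could look like the sketch below. `RetrySketch`, `retryOnMissing` and the local `SnapshotMissingException` are hypothetical names introduced here, with the exception standing in for the `snapshot_missing_exception` 404 from the report.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Hypothetical stand-in for the 404 snapshot_missing_exception seen in CI.
    static class SnapshotMissingException extends RuntimeException {
        SnapshotMissingException(String message) {
            super(message);
        }
    }

    // Runs the action, retrying only when the snapshot is reported missing.
    // Gives up and rethrows the last failure after maxAttempts attempts.
    static <T> T retryOnMissing(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SnapshotMissingException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SnapshotMissingException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give finalization time to complete
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated delete: misses twice (race window), then is acknowledged.
        boolean acknowledged = retryOnMissing(() -> {
            if (++calls[0] < 3) {
                throw new SnapshotMissingException("[test_repository:test_snapshot] is missing");
            }
            return true;
        }, 5, 10L);
        System.out.println("acknowledged=" + acknowledged + " after " + calls[0] + " attempts");
    }
}
```

Retrying only on the "missing" failure keeps other errors visible, and the attempt cap means a snapshot that really does not exist still fails the test instead of hanging it.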
4 participants