Rerun test task when test jdk crashed with System exit #71881

breskeby · 2021-04-19T20:09:51Z

Related to #52610, this PR reruns all tests of a test task if the test JVM has crashed because of a system exit. We also log the tests that may have caused the System.exit, based on which tests were active at the time of the exit.

elasticmachine · 2021-04-19T20:09:54Z

Pinging @elastic/es-delivery (Team:Delivery)

mark-vieira · 2021-04-19T22:35:20Z

...ain/java/org/elasticsearch/gradle/internal/test/rerun/executer/RerunTestResultProcessor.java

+        return activeDescriptorsById.size() == 1;
+    }
+
+    public RoundResult getResult() {


This doesn't seem to be used?

mark-vieira · 2021-04-19T22:35:43Z

...ain/java/org/elasticsearch/gradle/internal/test/rerun/executer/RerunTestResultProcessor.java

+        return new RoundResult(currentRoundFailedTests, previousRoundFailedTests, lastRun());
+    }
+
+    public void reset(boolean lastRetry) {


Looks like the lastRetry argument isn't used.

mark-vieira · 2021-04-19T22:37:57Z

...ain/java/org/elasticsearch/gradle/internal/test/rerun/executer/RerunTestResultProcessor.java

+
+    private final Map<Object, TestDescriptorInternal> activeDescriptorsById = new HashMap<>();
+
+    private TestNames currentRoundFailedTests = new TestNames();


Since getResult() isn't used, I don't think either of these are either.

mark-vieira · 2021-04-19T22:40:39Z

...c/src/main/java/org/elasticsearch/gradle/internal/test/rerun/executer/RerunTestExecuter.java

+                break;
+            } catch (ExecException e) {
+                if (retryCount == maxRetries) {
+                    throw new GradleException("Max retries hit", e);


So if we hit the retry limit we never call storeActiveDescriptors(). I think that would mean the crash reporter would report the wrong failures, should there have been any in the last round.

Also, what's the benefit of the indirection of having a finalizing task action report the failures rather than just doing so right here? Is this code executed in the worker JVM vs the daemon or something?

changed this to report the trace for each run and simplified things here.
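To illustrate the point raised above, here is a minimal sketch of a rerun loop that records the crash trace for every round before it checks the retry limit, so the final round is not lost. It is not the PR's `RerunTestExecuter`: `RerunSketch`, `JvmExitException`, `Round` and `runWithReruns` are hypothetical names, and a plain exception stands in for Gradle's `ExecException`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the PR's RerunTestExecuter. */
final class RerunSketch {

    /** Stand-in for a test JVM that died via System.exit; carries the tests active at that time. */
    static final class JvmExitException extends RuntimeException {
        final List<String> activeTests;

        JvmExitException(List<String> activeTests) {
            super("test jvm exited unexpectedly");
            this.activeTests = activeTests;
        }
    }

    /** One round of test execution; throws JvmExitException if the JVM crashed. */
    interface Round {
        void run(int attempt);
    }

    /**
     * Runs rounds until one completes or maxRetries reruns are used up.
     * Returns the crash traces of all rounds that crashed before the successful one.
     */
    static List<List<String>> runWithReruns(Round round, int maxRetries) {
        List<List<String>> crashTraces = new ArrayList<>();
        for (int retryCount = 0; ; retryCount++) {
            try {
                round.run(retryCount);
                return crashTraces;
            } catch (JvmExitException e) {
                // Record first, so the trace of the last round is kept as well.
                crashTraces.add(e.activeTests);
                if (retryCount == maxRetries) {
                    throw new IllegalStateException("Max retries hit; crash traces: " + crashTraces, e);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Crashes on the first attempt only, then completes normally.
        List<List<String>> traces = runWithReruns(attempt -> {
            if (attempt == 0) {
                throw new JvmExitException(List.of("SomeTests.testExit"));
            }
        }, 1);
        System.out.println(traces);
    }
}
```

Because the trace is stored inside the `catch` block, there is no separate finalizing step that could observe stale state from an earlier round.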

mark-vieira · 2021-04-19T22:44:46Z

...c/src/main/java/org/elasticsearch/gradle/internal/test/rerun/executer/RerunTestExecuter.java

+        public void report() {
+            if(failPaths.size() > 0) {
+                String report = "================\n" +
+                        "Test JDK System exit trace:\n" +


I think we should include in this message something to the effect of "test jvm exited unexpectedly" or would Gradle still report such a thing?

I tweaked the reporting a bit to include this. I don't think there's much value in showing the whole stack trace that Gradle reports for the JVM exit, as it doesn't really help to track this down.

mark-vieira · 2021-04-19T22:55:37Z

Also, what does this look like in build scans if we have partial test execution and then retry? Do tests get reported as flaky as they do with the Gradle test retry plugin?

breskeby · 2021-04-20T12:34:39Z

@mark-vieira sorry, this PR wasn't meant to be reviewed yet; it was supposed to still be a draft. Please review again. I cleaned it up and also did some further rework.

breskeby · 2021-04-20T13:19:14Z

Also, what does this look like in build scans if we have partial test execution and then retry? Do tests get reported as flaky as they do with the Gradle test retry plugin?

It depends on the result of the tests.

If all tests pass in both the first and the second run (not counting the test that caused the system exit), the tests are shown in the build scan as executed only once (see the example build scan I added to the infra test of this PR: https://gradle-enterprise.elastic.co/s/wsczh3rwdcvq4/tests)
If a test fails in the first execution and then succeeds in the second (again not counting the test that caused the system exit), it is marked as flaky (see https://gradle-enterprise.elastic.co/s/6vli7zag4ra7u/tests)
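The two cases above can be modelled roughly as follows. This is only a sketch of the behaviour described in this comment, not of how build scans are implemented: `RoundOutcomeSketch`, `Status` and `classify` are hypothetical names, and each test is reduced to a list of pass/fail results, one per completed execution.

```java
import java.util.List;

/** Hypothetical model of how a test that ran in several rounds ends up being reported. */
final class RoundOutcomeSketch {

    enum Status { PASSED, FLAKY, FAILED }

    /** passedPerExecution holds one entry per completed execution of the same test; true means passed. */
    static Status classify(List<Boolean> passedPerExecution) {
        boolean anyPass = passedPerExecution.contains(true);
        boolean anyFail = passedPerExecution.contains(false);
        if (anyPass && anyFail) {
            return Status.FLAKY; // failed in one round, passed in another
        }
        return anyFail ? Status.FAILED : Status.PASSED;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(true, true)));   // passed in both rounds
        System.out.println(classify(List.of(false, true)));  // failed first, passed on rerun
    }
}
```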

mark-vieira

Sweet, LGTM.

mark-vieira · 2021-04-20T21:59:29Z

...ain/java/org/elasticsearch/gradle/internal/test/rerun/executer/RerunTestResultProcessor.java

+
+    private final Map<Object, TestDescriptorInternal> activeDescriptorsById = new HashMap<>();
+
+    private Object rootTestDescriptorId;


What's the "root" descriptor? It's the one for the task, right? Can we clarify this with a comment?

exactly. added a clarifying comment.
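For readers unfamiliar with the descriptor bookkeeping, a minimal sketch of the idea follows. It is not the PR's `RerunTestResultProcessor`: the field names mirror the diff, but `ActiveTestTracker` and its methods are hypothetical, plain strings stand in for `TestDescriptorInternal`, and treating the first started descriptor as the root is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of tracking which tests are active when the test JVM exits. */
final class ActiveTestTracker {

    // Descriptors that have started but not yet completed, keyed by id.
    private final Map<Object, String> activeDescriptorsById = new LinkedHashMap<>();

    // Assumption: the first descriptor to start is the root, i.e. the one for the test task itself.
    private Object rootTestDescriptorId;

    void started(Object id, String name) {
        if (rootTestDescriptorId == null) {
            rootTestDescriptorId = id;
        }
        activeDescriptorsById.put(id, name);
    }

    void completed(Object id) {
        activeDescriptorsById.remove(id);
    }

    /** Names of everything still running except the root, for example at the moment the JVM exits. */
    List<String> activeExcludingRoot() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Object, String> entry : activeDescriptorsById.entrySet()) {
            if (!entry.getKey().equals(rootTestDescriptorId)) {
                names.add(entry.getValue());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ActiveTestTracker tracker = new ActiveTestTracker();
        tracker.started(1, "Gradle Test Run :test"); // root descriptor for the task
        tracker.started(2, "SomeTests");
        tracker.started(3, "SomeTests.testA");
        tracker.completed(3);
        tracker.started(4, "SomeTests.testExit");
        // Whatever is still active here is a candidate for having caused the exit.
        System.out.println(tracker.activeExcludingRoot());
    }
}
```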

Related to #52610, this PR reruns all tests of a test task if the test JVM has crashed because of a system exit. We also log the tests that may have caused the System.exit, based on which tests were active at the time of the exit. In addition, we modified the build scan logic to track unexpected test JVM exits with the tag `unexpected-test-jvm-exit`.

breskeby added >enhancement :Delivery/Build Build or test infrastructure labels Apr 19, 2021

breskeby self-assigned this Apr 19, 2021

elasticmachine added the Team:Delivery Meta label for Delivery team label Apr 19, 2021

breskeby added the v8.0.0 label Apr 19, 2021

mark-vieira self-requested a review April 19, 2021 20:27

mark-vieira reviewed Apr 19, 2021

breskeby added 4 commits April 20, 2021 13:19

Initial work on rerun tests that failed with jvm exit

a38dc83

Add more test coverage and cleanup

f959c43

Some more cleanup

47e30ba

Rework and tweak reporting

d5bd493

- report jvm trace each round
- add test coverage for max reruns

breskeby force-pushed the retry-tests-when-jvm-crashed branch from d29060a to d5bd493 Compare April 20, 2021 11:23

breskeby requested a review from mark-vieira April 20, 2021 12:33

Apply review feedback on tweaking reporting

678b39d

breskeby requested a review from pugnascotia April 20, 2021 18:44

mark-vieira approved these changes Apr 20, 2021

breskeby added 3 commits April 21, 2021 09:13

Add clarifying comment about root test discriptor

f83421f

Minor formatting fix

eb8c1ca

Track test jvm crashes with tag

90556a3

breskeby merged commit a1cd67f into elastic:master Apr 21, 2021

breskeby deleted the retry-tests-when-jvm-crashed branch April 21, 2021 10:22

breskeby mentioned this pull request Apr 21, 2021

Rerun test task when test jdk crashed with System exit (7.x backport) #72003

Merged

mark-vieira mentioned this pull request Apr 21, 2021

[CI] Builds failing due to Gradle test executor crash #52610

Closed

jakelandis removed the v8.0.0 label Jul 26, 2021

jakelandis added the v8.0.0-alpha1 label Jul 26, 2021

