Rerun test task when test jdk crashed with System exit #71881

Merged
8 commits merged on Apr 21, 2021

@breskeby (Contributor) commented Apr 19, 2021

Related to #52610, this PR introduces a rerun of all tests in a test task if the test JVM crashed because of a System.exit. We also log the tests that potentially caused the System.exit, based on which tests were active at the time of the exit.
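A minimal sketch of the "which tests were active" bookkeeping described above, in plain Java with no Gradle dependency. The class and method names (`ActiveTestTracker`, `started`, `completed`, `suspects`) are hypothetical, and plain string ids stand in for Gradle's test descriptors; this is not the PR's code. Any test that started but never completed when the JVM died is reported as a suspect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker: records which tests are running so that, if the test
// JVM exits unexpectedly, the ones still active can be reported as suspects.
class ActiveTestTracker {
    // Insertion-ordered so suspects are listed in the order they started.
    private final Map<String, String> activeById = new LinkedHashMap<>();

    void started(String id, String displayName) {
        activeById.put(id, displayName);
    }

    void completed(String id) {
        activeById.remove(id);
    }

    // Tests that started but never completed: candidates for the System.exit.
    List<String> suspects() {
        return new ArrayList<>(activeById.values());
    }
}
```

Gradle reports nested descriptors (task, class, method), so a real tracker would usually see several entries active at once; this sketch treats them all alike.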

@breskeby breskeby added >enhancement :Delivery/Build Build or test infrastructure labels Apr 19, 2021
@breskeby breskeby self-assigned this Apr 19, 2021
@elasticmachine elasticmachine added the Team:Delivery Meta label for Delivery team label Apr 19, 2021
@elasticmachine (Collaborator)

Pinging @elastic/es-delivery (Team:Delivery)

@mark-vieira mark-vieira self-requested a review April 19, 2021 20:27
return activeDescriptorsById.size() == 1;
}

public RoundResult getResult() {
Contributor:

This doesn't seem to be used?

Contributor Author:

removed

return new RoundResult(currentRoundFailedTests, previousRoundFailedTests, lastRun());
}

public void reset(boolean lastRetry) {
Contributor:

Looks like the lastRetry argument isn't used.

Contributor Author:

cleaned up


private final Map<Object, TestDescriptorInternal> activeDescriptorsById = new HashMap<>();

private TestNames currentRoundFailedTests = new TestNames();
Contributor:

Since getResult() isn't used, I don't think either of these is needed either.

Contributor Author:

removed

break;
} catch (ExecException e) {
if (retryCount == maxRetries) {
throw new GradleException("Max retries hit", e);
Contributor:

So if we hit the retry limit, we never call storeActiveDescriptors(). I think that would mean the crash reporter reports the wrong failures, should there have been any in the last round.

Contributor:

Also, what's the benefit of the indirection of having a finalizing task action report the failures rather than just doing so right here? Is this code executed in the worker JVM vs. the daemon or something?

Contributor Author:

changed this to report the trace for each run and simplified things here.
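The reworked flow from this thread (report the trace for each run instead of in a finalizing action) can be sketched as a retry loop. This is a hypothetical, dependency-free version: `JvmExitException` stands in for Gradle's `ExecException`, `IllegalStateException` for `GradleException`, and the three callbacks are assumptions rather than the PR's API. Reporting inside the `catch` means the last round's suspects are still reported when the retry limit is hit, which was the reviewer's concern.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for Gradle's ExecException: the test JVM exited unexpectedly.
class JvmExitException extends RuntimeException {
    JvmExitException(String message) {
        super(message);
    }
}

// Hypothetical runner: reruns the whole test task after each unexpected JVM exit.
class RetryingTestRunner {
    private final int maxRetries;

    RetryingTestRunner(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Returns the number of attempts made; throws once maxRetries reruns have also crashed.
    int run(Runnable testRun, Supplier<List<String>> activeTests, Consumer<List<String>> reporter) {
        for (int retryCount = 0; ; retryCount++) {
            try {
                testRun.run();
                return retryCount + 1;
            } catch (JvmExitException e) {
                // Report after every crashed run, including the last one,
                // so the final round's suspects are never lost.
                reporter.accept(activeTests.get());
                if (retryCount == maxRetries) {
                    throw new IllegalStateException("Max retries hit", e);
                }
            }
        }
    }
}
```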

public void report() {
if(failPaths.size() > 0) {
String report = "================\n" +
"Test JDK System exit trace:\n" +
Contributor:

I think we should include something to the effect of "test jvm exited unexpectedly" in this message, or would Gradle still report such a thing?

Contributor Author:

I tweaked the reporting a bit to include this. I don't think there's much value in showing the whole stack trace from Gradle reporting the JVM exit, as it doesn't really help trace the problem down.
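A sketch of how such a report might be assembled from the collected test paths. The `================` banner and the `Test JDK System exit trace:` header come from the diff; the "exited unexpectedly" line, the path format, and the name `SystemExitReport.format` are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical report builder for the tests active when the test JVM exited.
class SystemExitReport {
    private static final String RULE = "================\n";

    // Returns an empty Optional when no tests were active, so nothing is logged.
    static Optional<String> format(List<String> failPaths) {
        if (failPaths.isEmpty()) {
            return Optional.empty();
        }
        StringBuilder report = new StringBuilder(RULE)
            .append("Test JDK System exit trace:\n")
            // Assumed wording: states the exit explicitly, as requested in review.
            .append("Test JVM exited unexpectedly. Tests active at the time of the exit:\n");
        for (String path : failPaths) {
            report.append("  ").append(path).append('\n');
        }
        report.append(RULE);
        return Optional.of(report.toString());
    }
}
```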

@mark-vieira (Contributor)

Also, what does this look like in build scans if we have partial test execution and then retry? Do tests get reported as flaky as they do with the Gradle test retry plugin?

@breskeby breskeby force-pushed the retry-tests-when-jvm-crashed branch from d29060a to d5bd493 April 20, 2021 11:23
@breskeby breskeby requested a review from mark-vieira April 20, 2021 12:33
@breskeby (Contributor Author)

@mark-vieira sorry, this PR wasn't meant to be reviewed yet; it was still meant to be a draft. Please review again. I cleaned it up and also did some further rework.

@breskeby (Contributor Author)

> Also, what does this look like in build scans if we have partial test execution and then retry? Do tests get reported as flaky as they do with the Gradle test retry plugin?

It depends on the results of the tests.

If all tests pass in both the first and the second run (ignoring the test that caused the System.exit), each test is shown in the build scan as executed only once (see the example build scan from the infra test I added in this PR: https://gradle-enterprise.elastic.co/s/wsczh3rwdcvq4/tests).
If a test fails in the first run, for example, and then succeeds in the second (again ignoring the test that caused the System.exit), it is marked as flaky (see https://gradle-enterprise.elastic.co/s/6vli7zag4ra7u/tests).
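The two cases above can be modelled as a small classification over each test's results across runs. This is a hypothetical sketch of the observed build scan behavior, not Gradle Enterprise's actual logic; the names are illustrative, and the FAILED case (failing in every run) is an assumption, since the comment only describes the other two.

```java
import java.util.List;

// Hypothetical outcome shown in the build scan for one test.
enum ScanOutcome { PASSED, FLAKY, FAILED }

class ScanOutcomes {
    // results: one entry per run in which the test completed (true = passed).
    // The caller is assumed to have excluded the test that triggered System.exit.
    static ScanOutcome classify(List<Boolean> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("test never completed");
        }
        boolean anyPassed = results.contains(Boolean.TRUE);
        boolean anyFailed = results.contains(Boolean.FALSE);
        if (anyPassed && anyFailed) {
            return ScanOutcome.FLAKY; // failed in one run, passed in another
        }
        // Passed every run (shown once, per the comment) or, by assumption, failed every run.
        return anyFailed ? ScanOutcome.FAILED : ScanOutcome.PASSED;
    }
}
```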

@breskeby breskeby requested a review from pugnascotia April 20, 2021 18:44
@mark-vieira mark-vieira (Contributor) left a comment

Sweet, LGTM.


private final Map<Object, TestDescriptorInternal> activeDescriptorsById = new HashMap<>();

private Object rootTestDescriptorId;
Contributor:

What's the "root" descriptor? It's the one for the task, right? Can we clarify this with a comment?

Contributor Author:

Exactly. Added a clarifying comment.
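The diff fragments in this PR include `activeDescriptorsById`, `rootTestDescriptorId`, and a check `activeDescriptorsById.size() == 1`. A minimal sketch of how they might fit together, assuming the root (task-level) descriptor is the first one started without a parent, and that a size of 1 means only that root is still open. `DescriptorTracker` and its methods are hypothetical, and strings stand in for `TestDescriptorInternal`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker that remembers the root descriptor of the test task.
class DescriptorTracker {
    private final Map<Object, String> activeDescriptorsById = new HashMap<>();

    // Id of the root descriptor, i.e. the one for the test task itself
    // (assumed: the first descriptor started without a parent).
    private Object rootTestDescriptorId;

    void started(Object id, Object parentId, String name) {
        if (parentId == null && rootTestDescriptorId == null) {
            rootTestDescriptorId = id;
        }
        activeDescriptorsById.put(id, name);
    }

    void completed(Object id) {
        activeDescriptorsById.remove(id);
    }

    // Only the task-level root is still open, so no individual test is running.
    boolean onlyRootActive() {
        return activeDescriptorsById.size() == 1
            && activeDescriptorsById.containsKey(rootTestDescriptorId);
    }
}
```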

@breskeby breskeby merged commit a1cd67f into elastic:master Apr 21, 2021
@breskeby breskeby deleted the retry-tests-when-jvm-crashed branch April 21, 2021 10:22
breskeby added a commit to breskeby/elasticsearch that referenced this pull request Apr 21, 2021
Related to elastic#52610, this PR introduces a rerun of all tests in a test task if the test JVM
has crashed because of a System.exit. It also logs the tests that potentially caused
the System.exit, based on which tests were active at the time of the exit.

We also modified the build scan logic to tag unexpected test JVM exits
with `unexpected-test-jvm-exit`.
breskeby added a commit that referenced this pull request Apr 22, 2021
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Labels
:Delivery/Build Build or test infrastructure >enhancement Team:Delivery Meta label for Delivery team v8.0.0-alpha1
4 participants