[ML] Avoid 5s wait in AbstractNativeProcessTests #74916

Merged
merged 2 commits into from
Jul 5, 2021

Conversation

davidkyle
Member

@davidkyle davidkyle commented Jul 5, 2021

Running the unit tests, I noticed that AbstractNativeProcessTests often takes a long time to complete. Individual methods either complete very quickly or take a little over 5 seconds, which suggests something is timing out.

Indeed, the 5-second wait is in the close() method.

When the process is used in a try-with-resources block, close() is called before mockNativeProcessLoggingStreamEnds.countDown(), because resources are closed before the finally block runs. The call that waits for the log tail future to finish therefore always times out: it is waiting for the mockNativeProcessLoggingStreamEnds latch to count down.
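
To illustrate the ordering, here is a minimal, self-contained sketch. It is not the real test code: `CloseOrderDemo` and `LatchWaitingResource` are hypothetical stand-ins, and a 200 ms wait replaces the 5-second one. It only shows that Java closes try-with-resources resources before the `finally` block runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CloseOrderDemo {

    /** Hypothetical stand-in for the process under test: close() waits on a latch. */
    static final class LatchWaitingResource implements AutoCloseable {
        private final CountDownLatch streamEnds;
        private final List<String> events;

        LatchWaitingResource(CountDownLatch streamEnds, List<String> events) {
            this.streamEnds = streamEnds;
            this.events = events;
        }

        @Override
        public void close() throws InterruptedException {
            // Assumption: 200 ms stands in for the 5 s wait described above.
            boolean released = streamEnds.await(200, TimeUnit.MILLISECONDS);
            events.add(released ? "close: latch released" : "close: timed out");
        }
    }

    /** try-with-resources plus finally: close() runs first, so its wait times out. */
    static List<String> closeBeforeCountDown() throws InterruptedException {
        List<String> events = new ArrayList<>();
        CountDownLatch streamEnds = new CountDownLatch(1);
        try (LatchWaitingResource resource = new LatchWaitingResource(streamEnds, events)) {
            events.add("body");
        } finally {
            // Too late: the resource has already been closed at this point.
            streamEnds.countDown();
            events.add("countDown");
        }
        return events;
    }
}
```

Calling `CloseOrderDemo.closeBeforeCountDown()` returns the events in the order `body`, `close: timed out`, `countDown`.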

Closes #37339

@davidkyle davidkyle added >test Issues or PRs that are addressing/adding tests :ml Machine learning v8.0.0 auto-backport Automatically create backport pull requests when merged v7.15.0 labels Jul 5, 2021
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jul 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@davidkyle davidkyle changed the title [ML] Speed up the AbstractNativeProcessTests [ML] Avoid 5s wait in AbstractNativeProcessTests Jul 5, 2021
Contributor

@przemekwitek przemekwitek left a comment

LGTM

Comment on lines 89 to 93
        try (AbstractNativeProcess process = new TestNativeProcess()) {
            process.start(executorService);
        } finally {
            mockNativeProcessLoggingStreamEnds.countDown();
            // Not detecting a crash is confirmed in terminateExecutorService()
        }
Contributor

I think this could be made even better by nesting the finally so that we count down before close whether the test passes or fails.

        try (AbstractNativeProcess process = new TestNativeProcess()) {
            try {
                process.start(executorService);
            } finally {
                mockNativeProcessLoggingStreamEnds.countDown();
                // Not detecting a crash is confirmed in terminateExecutorService()
            }
        }

(And obviously the same pattern for all the other places too.)

Or to avoid nested try blocks it could be done like we're already doing it in testStart_DoNotDetectCrashWhenProcessIsBeingKilled:

        AbstractNativeProcess process = new TestNativeProcess();
        try {
            process.start(executorService);
        } finally {
            mockNativeProcessLoggingStreamEnds.countDown();
            // Not detecting a crash is confirmed in terminateExecutorService()
            process.close();
        }
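
As a rough check of the nested-try suggestion, the sketch below uses hypothetical names (`CountDownBeforeCloseDemo`, `LatchWaitingResource`, a 200 ms wait in place of 5 s) and is not taken from the actual test class. It records the order of events when the body succeeds and when it throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CountDownBeforeCloseDemo {

    /** Hypothetical resource: start() may fail, close() waits on the latch. */
    static final class LatchWaitingResource implements AutoCloseable {
        private final CountDownLatch streamEnds;
        private final List<String> events;

        LatchWaitingResource(CountDownLatch streamEnds, List<String> events) {
            this.streamEnds = streamEnds;
            this.events = events;
        }

        void start(boolean fail) {
            events.add("start");
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }

        @Override
        public void close() throws InterruptedException {
            // Assumption: 200 ms stands in for the 5 s wait.
            boolean released = streamEnds.await(200, TimeUnit.MILLISECONDS);
            events.add(released ? "close: latch released" : "close: timed out");
        }
    }

    /** Nested try: the inner finally counts down before the resource is closed. */
    static List<String> nestedTry(boolean fail) throws InterruptedException {
        List<String> events = new ArrayList<>();
        CountDownLatch streamEnds = new CountDownLatch(1);
        try (LatchWaitingResource resource = new LatchWaitingResource(streamEnds, events)) {
            try {
                resource.start(fail);
            } finally {
                streamEnds.countDown();
                events.add("countDown");
            }
        } catch (IllegalStateException e) {
            // Runs after close(); caught only so the demo can return the event list.
            events.add("caught: " + e.getMessage());
        }
        return events;
    }
}
```

In both `nestedTry(false)` and `nestedTry(true)` the `countDown` event precedes `close: latch released`, so close() does not wait for the timeout whether the body passes or throws.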

Member Author

Yes, if the test fails or throws before mockNativeProcessLoggingStreamEnds.countDown(), the 5s wait will be hit again. I optimised for the common case, and the fix was pleasingly simple, so I didn't go any further.
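
A small sketch of that trade-off, assuming the count down sits in the try body itself (all names here are hypothetical, and 200 ms stands in for the 5 s wait): the common case avoids the timeout, but a throw before the count down brings it back.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CommonCaseFixDemo {

    /** Hypothetical resource whose close() waits briefly on a latch and records the result. */
    static final class LatchWaitingResource implements AutoCloseable {
        private final CountDownLatch streamEnds;
        boolean closeTimedOut;

        LatchWaitingResource(CountDownLatch streamEnds) {
            this.streamEnds = streamEnds;
        }

        @Override
        public void close() throws InterruptedException {
            // Assumption: 200 ms stands in for the 5 s wait.
            closeTimedOut = !streamEnds.await(200, TimeUnit.MILLISECONDS);
        }
    }

    /** Counts down inside the try body, so a throw before that line skips the count down. */
    static boolean closeTimesOut(boolean failBeforeCountDown) throws InterruptedException {
        CountDownLatch streamEnds = new CountDownLatch(1);
        LatchWaitingResource observed = new LatchWaitingResource(streamEnds);
        try (LatchWaitingResource resource = observed) {
            if (failBeforeCountDown) {
                throw new IllegalStateException("simulated test failure");
            }
            streamEnds.countDown();
        } catch (IllegalStateException e) {
            // Swallowed so the demo can report what close() observed.
        }
        return observed.closeTimedOut;
    }
}
```

Here `closeTimesOut(false)` returns `false` and `closeTimesOut(true)` returns `true`.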

Contributor

If you can't be bothered then let me push a change while the PR is open, as at least that will save the effort of two PRs and two backports.

Member Author

I pushed the nested try.

Contributor

Thanks ❤️

Contributor

@droberts195 droberts195 left a comment

LGTM

@davidkyle davidkyle merged commit 7b5b9d0 into elastic:master Jul 5, 2021
@davidkyle davidkyle deleted the speed-up-test branch July 5, 2021 15:04
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 5, 2021
Calling close() before counting down the latch was causing a 5 second wait and timeout in the tests
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
7.x

droberts195 pushed a commit that referenced this pull request Jul 7, 2021
Calling close() before counting down the latch was causing a 5 second wait and timeout in the tests

Co-authored-by: David Kyle <[email protected]>
Labels
auto-backport Automatically create backport pull requests when merged :ml Machine learning Team:ML Meta label for the ML team >test Issues or PRs that are addressing/adding tests v7.15.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

ML QA tasks are slow, take up 20 minutes of build time
6 participants