-
Notifications
You must be signed in to change notification settings - Fork 65
test: BatcherImplTest.testThrottlingBlocking to check actual wait time #2285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Investigating issue googleapis#1931.
@@ -93,6 +93,14 @@ | |||
</excludes> | |||
</configuration> | |||
</plugin> | |||
<plugin> | |||
<!-- Troubleshooting a flaky test in https://github.com/googleapis/sdk-platform-java/issues/1931 --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep this just in case the problem occurs again and we have the logging. This surefire (unit test) configuration does not affect the released libraries to users.
gax-java/gax/src/test/java/com/google/api/gax/batching/BatcherImplTest.java
Show resolved
Hide resolved
} catch (InterruptedException e) { | ||
} | ||
}); | ||
|
||
try { | ||
logger.info("Calling future.get(10 ms)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too familiar with Batcher or this test, but what is the significance of 10ms here? If a TimeoutException is expected, why can't we lower the future.get() block time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the log (in the pull request description), the problem is that the TimeoutException isn't always thrown. Lowering the 10 ms to smaller number won't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I read the original comment might not throw TimeoutException in a timely manner.
which made me think that this was it wasn't thrown fast enough because .get(...)
mau return in sub 10ms.
For my understanding, do you know what causes this: thread calling "future.get" was not being run in a timely manner
. Is this just potentially a scheduling issue on a machine that has a single CPU/ few CPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just potentially a scheduling issue on a machine that has a single CPU/ few CPUs?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, from my understanding it would be either case:
- future.get() takes more than 10ms to return -> TimeoutException
- future.get() returns a result -> Check that the actual waitTime was at least 10ms
I think that should suffice due to the variance in scheduling. CC: @blakeli0 Do you know of another option to handle this flakiness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future.get() returns a result within 10ms
No, I didn't observe that. What I observed in #1931 was "future.get() took more than 10ms to return normally, without TimeoutException".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, updated the comment above. The docs for Future.get()
seem to suggest that it will throw the exception once 10ms passes, but I'm guessing that is just best effort for limited CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct.
gax-java/gax/src/test/java/com/google/api/gax/batching/BatcherImplTest.java
Outdated
Show resolved
Hide resolved
Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot. |
[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed!
|
[java_showcase_integration_tests] Kudos, SonarCloud Quality Gate passed!
|
#2285) * test: additional logs in BatcherImplTest * Replaced the assertion with actual wait time count
This is the follow-up of a pull request #2285, where we wanted to catch any flaky test failure with logging, just in case the test failed again. https://github.com/googleapis/sdk-platform-java/pull/2285/files#r1416300479 It turned out that the fix worked well. We don't observe the test failures any more. Therefore turning off the excessive logging.
This is the follow-up of a pull request #2285, where we wanted to catch any flaky test failure with logging, just in case the test failed again. https://github.com/googleapis/sdk-platform-java/pull/2285/files#r1416300479 It turned out that the fix worked well. We don't observe the test failures any more. Therefore turning off the excessive logging.
Fixes #1931 . BatcherImplTest.testThrottlingBlocking to check actual wait time, because I observe that
Future.wait
does not always throw TimeoutException in a timely manner. With the additional logging, we observed the following failure:(See the
future.get(10 ms)
returned after 57 ms.)It's probably because the main thread does not always get the CPU resource. This pull request fixes the flaky test failure by checking the actual wait time in the main thread. If it's taking more than 10 milliseconds, then it means the method
future.get(10 ms)
does not return within 10 milliseconds (Good).