-
Notifications
You must be signed in to change notification settings - Fork 6k
Conversation
Oh, one other problem with these timeouts - if they trigger in an unexpected way, the shell gets collected on the wrong thread, and the test process crashes with no backtrace. |
#if FLUTTER_RELEASE | ||
// Our batch time is 1000ms. Hopefully the 800ms limit is relaxed enough to | ||
// make it not too flaky. | ||
ASSERT_TRUE(elapsed >= fml::TimeDelta::FromMilliseconds(800)); | ||
#else | ||
// Our batch time is 100ms. Hopefully the 500ms limit is relaxed enough to | ||
// make it not too flaky. | ||
ASSERT_TRUE(elapsed <= fml::TimeDelta::FromMilliseconds(500)); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you could get what you want by just removing this assert right? That would eliminate all timeout except the one that's for the whole test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just realizing that's the whole point of this test. It doesn't really test much without it =T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting this LGTM, I'd double check with the author of it just to see if they can think of another way to assert what they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Author is @liyuqian
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the more I think about this, this either needs to be a benchmark and run as one, or it needs to mock out the clock to assert how many ticks can pass before events are emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo pinging that test's author.
shell/common/shell_unittests.cc
Outdated
@@ -1430,7 +1379,7 @@ TEST_F(ShellTest, WaitForFirstFrameTimeout) { | |||
|
|||
RunEngine(shell.get(), std::move(configuration)); | |||
fml::Status result = | |||
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(10)); | |||
shell->WaitForFirstFrame(fml::TimeDelta::Zero()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the assert below right? It seems to be expecting a timeout indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We're saying we want to wait zero ms now instead of waiting 10. Theoretically something could actually produce a frame in 10ms. Even this isn't bullet proof but it's better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure that the intention is to time out as soon as possible, or to wait enough time? I wonder why 10 was chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very likely because that was short enough on the developer's machine to trigger the timeout. The problem is, on a fast enough machine or with enough improvements to the code under test, 10ms might not be long enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOW, this test is currently trying to wait 10ms of monotonic time and seeing if it can or can't do work in that time. But that's susceptible to flakes.
@@ -1762,12 +1762,23 @@ fml::Status Shell::WaitForFirstFrame(fml::TimeDelta timeout) { | |||
"because it is responsible for generating the frame."); | |||
} | |||
|
|||
// Check for overflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaaclarke - mind taking another look at this?
I had to change this because otherwise we're susceptible to overflows when adding timeout
to std::chrono::steady_clock::now()
.
wait_for
does that addition behind the scenes and I guess treats the overflow as UB. I'm using wait_until to avoid the steady clock advancing while we do other calcuations and overflowing even though we checked.
This is covered by tests now that the timeouts use max/zero values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is an actual bug for us today - if someone were to pass a very large value into here, they'd get a timeout response due to overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like the API docs in the header file here https://github.com/flutter/engine/blob/master/shell/common/shell.h#L301 could use some more details or a warning about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think an easier fix would just be to say auto duration = min(max_duration, desired_duration);
and use wait_for. Since that time is so long, no one will ever be able to tell the difference in their lifetime. That removes a branch and is much easier to verify it is correct assuming what was there before was correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Though technically isn't std::min a branch? :) Anyway I agree the code is more readable/maintainable this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. We can't do that. wait_for
is still going to add to stead_clock::now()
, which has incremented since checked for overflow, and now might overflow again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this:
auto now = std::chrono::steady_clock::now();
auto max_duration = std::chrono::steady_clock::time_point::max() - now;
auto desired_duration = std::chrono::milliseconds(timeout.ToMilliseconds());
std::unique_lock<std::mutex> lock(waiting_for_first_frame_mutex_);
bool success = waiting_for_first_frame_condition_.wait_until(
lock, now + std::min(desired_duration, max_duration),
[&waiting_for_first_frame = waiting_for_first_frame_] {
return !waiting_for_first_frame.load();
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me too, as long as there isn't that branch I think it's more straightforward.
@@ -1762,12 +1762,23 @@ fml::Status Shell::WaitForFirstFrame(fml::TimeDelta timeout) { | |||
"because it is responsible for generating the frame."); | |||
} | |||
|
|||
// Check for overflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like the API docs in the header file here https://github.com/flutter/engine/blob/master/shell/common/shell.h#L301 could use some more details or a warning about this.
@@ -1410,8 +1357,7 @@ TEST_F(ShellTest, WaitForFirstFrameZeroSizeFrame) { | |||
|
|||
RunEngine(shell.get(), std::move(configuration)); | |||
PumpOneFrame(shell.get(), {1.0, 0.0, 0.0}); | |||
fml::Status result = | |||
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1000)); | |||
fml::Status result = shell->WaitForFirstFrame(fml::TimeDelta::Zero()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be fml::TimeDelta::Max()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make us wait forever and time out the test. I'm not sure how much value this test adds over the other test(s) that make sure we timeout if we don't wait long enough.
The problem is, even if we keep it at 1s, there's no good way of knowing whether we timed out because it fails to produce a frame by design, or we timed out because we were running on hardware slow enough that 1s isn't long enough to produce a frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iskakaushik is in the blame for this, might have ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I missed the ASSERT below this line saying that the wait is expected to timeout. So, is this testing that PumpOneFrame
doesn't do anything for a zero size frame, and that's why the wait will time out? Is there another way for PumpOneFrame
to communicate that it isn't going to do anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this test wants to assert that WaitForFirstFrame
will time out if you pump a zero sized frame.
I think a better place to test this would be e.g. in the animator unittests, making sure that OnAnimatorDidDraw
isn't called under this condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(that's where the shell would flip the wait condition)
* aac3658 Roll Skia from 14efdd3d50db to 3010f3d79193 (4 revisions) (flutter/engine#25857) * f45f52f Move path part files to libraries (flutter/engine#25842) * d96669f Roll Skia from 3010f3d79193 to 5276ba274b38 (1 revision) (flutter/engine#25859) * 43994ab Use "blur_sigma" instead of "blur_radius" in Shadow. (flutter/engine#25760) * 74ac010 Roll Dart SDK from 1e3e5efcd47e to 6397e8b91103 (18 revisions) (flutter/engine#25861) * d462cfd Remove flake inducing timeouts (flutter/engine#25847) * 258f63a Roll Skia from 5276ba274b38 to 31fddc376993 (14 revisions) (flutter/engine#25862) * 1bf01a1 fuchsia: Reliably forward View insets (flutter/engine#25381) * 2740062 Roll Skia from 31fddc376993 to 097263bb5089 (1 revision) (flutter/engine#25865) * 4aae6b3 Roll Dart SDK from 6397e8b91103 to ee8eb0a65efa (1 revision) (flutter/engine#25869) * d1f31c7 Roll Skia from 097263bb5089 to 5dfb3f40684b (1 revision) (flutter/engine#25870) * 428ef10 deflake (flutter/engine#25864) * 01b9db8 Roll Dart SDK from ee8eb0a65efa to 8c109a734bdc (1 revision) (flutter/engine#25872) * e623c09 Roll Dart SDK from 8c109a734bdc to b98a1eec5eb5 (1 revision) (flutter/engine#25873) * 7ce6226 Roll Dart SDK from b98a1eec5eb5 to 6ecae204598d (1 revision) (flutter/engine#25874) * 554c24c Roll Skia from 5dfb3f40684b to 5c95bcb48b9b (1 revision) (flutter/engine#25876) * 03bd2e3 Roll Dart SDK from 6ecae204598d to 3cc6cdab8eaf (2 revisions) (flutter/engine#25877) * 53fdd89 Roll Skia from 5c95bcb48b9b to c779d432f336 (1 revision) (flutter/engine#25880) * 98ee65b Roll Skia from c779d432f336 to ff8b52df55ff (2 revisions) (flutter/engine#25881) * d84caa7 Roll Dart SDK from 3cc6cdab8eaf to b8f4018535fa (2 revisions) (flutter/engine#25885) * 156760e Roll Skia from ff8b52df55ff to ec79349bad50 (1 revision) (flutter/engine#25886) * 23cd8e5 Revert the Dart SDK to 1e3e5efcd47edeb7ae5a69e146c8ea0559305a98 (flutter/engine#25887) * 578449f Fix crash when both FlutterFragmentActivity and FlutterFragment are destroyed and recreated (flutter/engine#25851)
Fixes flutter/flutter#64087
Fixes flutter/flutter#66265
Per style guide (https://github.com/flutter/flutter/wiki/Style-guide-for-Flutter-repo#never-check-if-a-port-is-available-before-using-it-never-add-timeouts-and-other-race-conditions), we shouldn't have timeouts.
This changes all short timeouts that were intended to cause a timeout to zero.
It changes all long timeouts that were intended to be good enough to
fml::TimeDelta::Max()
. On a sufficiently slow machine, 1s was not enough time to get the first frame drawn, for example. The test will still get killed by the harness if it exceeds the total time budget (currently at 90s).It also deletes one test entirely. This test is measuring how long the test itself takes to execute, and asserting that those times are greater or less than a certain number depending on the runtime mode. This is inherently flaky - it should not have been allowed in to begin with, and if we really need coverage of this it should be covered in some way that fakes out the clock. I assume, however, that this is covered by integration tests downstream.