Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Remove flake inducing timeouts #25847

Merged
merged 8 commits into from
Apr 30, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions shell/common/shell.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1762,10 +1762,16 @@ fml::Status Shell::WaitForFirstFrame(fml::TimeDelta timeout) {
"because it is responsible for generating the frame.");
}

// Check for overflow.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaaclarke - mind taking another look at this?

I had to change this because otherwise we're susceptible to overflows when adding timeout to std::chrono::steady_clock::now().

wait_for does that addition behind the scenes and I guess treats the overflow as UB. I'm using wait_until to avoid the steady clock advancing while we do other calcuations and overflowing even though we checked.

This is covered by tests now that the timeouts use max/zero values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is an actual bug for us today - if someone were to pass a very large value into here, they'd get a timeout response due to overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like the API docs in the header file here https://github.com/flutter/engine/blob/master/shell/common/shell.h#L301 could use some more details or a warning about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a doc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think an easier fix would just be to say auto duration = min(max_duration, desired_duration); and use wait_for. Since that time is so long, no one will ever be able to tell the difference in their lifetime. That removes a branch and is much easier to verify it is correct assuming what was there before was correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Though technically isn't std::min a branch? :) Anyway I agree the code is more readable/maintainable this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. We can't do that. wait_for is still going to add to stead_clock::now(), which has incremented since checked for overflow, and now might overflow again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this:

  auto now = std::chrono::steady_clock::now();
  auto max_duration = std::chrono::steady_clock::time_point::max() - now;
  auto desired_duration = std::chrono::milliseconds(timeout.ToMilliseconds());

  std::unique_lock<std::mutex> lock(waiting_for_first_frame_mutex_);
  bool success = waiting_for_first_frame_condition_.wait_until(
      lock, now + std::min(desired_duration, max_duration),
      [&waiting_for_first_frame = waiting_for_first_frame_] {
        return !waiting_for_first_frame.load();
      });

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me too, as long as there isn't that branch I think it's more straightforward.

auto now = std::chrono::steady_clock::now();
auto max_duration = std::chrono::steady_clock::time_point::max() - now;
auto desired_duration = std::chrono::milliseconds(timeout.ToMilliseconds());
auto duration =
now + (desired_duration > max_duration ? max_duration : desired_duration);

std::unique_lock<std::mutex> lock(waiting_for_first_frame_mutex_);
bool success = waiting_for_first_frame_condition_.wait_for(
lock, std::chrono::milliseconds(timeout.ToMilliseconds()),
[&waiting_for_first_frame = waiting_for_first_frame_] {
bool success = waiting_for_first_frame_condition_.wait_until(
lock, duration, [&waiting_for_first_frame = waiting_for_first_frame_] {
return !waiting_for_first_frame.load();
});
if (success) {
Expand Down
13 changes: 9 additions & 4 deletions shell/common/shell.h
Original file line number Diff line number Diff line change
Expand Up @@ -298,11 +298,16 @@ class Shell final : public PlatformView::Delegate,
bool base64_encode);

//----------------------------------------------------------------------------
/// @brief Pauses the calling thread until the first frame is presented.
/// @brief Pauses the calling thread until the first frame is presented.
///
/// @return 'kOk' when the first frame has been presented before the timeout
/// successfully, 'kFailedPrecondition' if called from the GPU or UI
/// thread, 'kDeadlineExceeded' if there is a timeout.
/// @param[in] timeout The duration to wait before timing out. If this
/// duration would cause an overflow when added to
/// std::chrono::steady_clock::now(), this method will
/// wait indefinitely for the first frame.
///
/// @return 'kOk' when the first frame has been presented before the
/// timeout successfully, 'kFailedPrecondition' if called from the
/// GPU or UI thread, 'kDeadlineExceeded' if there is a timeout.
///
fml::Status WaitForFirstFrame(fml::TimeDelta timeout);

Expand Down
97 changes: 20 additions & 77 deletions shell/common/shell_unittests.cc
Original file line number Diff line number Diff line change
Expand Up @@ -672,7 +672,7 @@ TEST_F(ShellTest, ExternalEmbedderNoThreadMerger) {
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -727,7 +727,7 @@ TEST_F(ShellTest,
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -779,7 +779,7 @@ TEST_F(ShellTest,
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand All @@ -788,9 +788,9 @@ TEST_F(ShellTest,

PumpOneFrame(shell.get(), 100, 100, builder);

auto result =
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1000));
ASSERT_TRUE(result.ok());
auto result = shell->WaitForFirstFrame(fml::TimeDelta::Max());
ASSERT_TRUE(result.ok()) << "Result: " << static_cast<int>(result.code())
<< ": " << result.message();

ASSERT_TRUE(raster_thread_merger->IsEnabled());

Expand All @@ -800,7 +800,6 @@ TEST_F(ShellTest,
// Validate the platform view can be recreated and destroyed again
ValidateShell(shell.get());
ASSERT_TRUE(raster_thread_merger->IsEnabled());

DestroyShell(std::move(shell));
}

Expand Down Expand Up @@ -853,7 +852,7 @@ TEST_F(ShellTest,
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -928,7 +927,7 @@ TEST_F(ShellTest,
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -1001,7 +1000,7 @@ TEST_F(ShellTest,
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -1049,7 +1048,7 @@ TEST_F(ShellTest, OnPlatformViewDestroyWithoutRasterThreadMerger) {
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -1120,7 +1119,7 @@ TEST_F(ShellTest,
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -1263,57 +1262,6 @@ TEST(SettingsTest, FrameTimingSetsAndGetsProperly) {
}
}

#if FLUTTER_RELEASE
TEST_F(ShellTest, ReportTimingsIsCalledLaterInReleaseMode) {
#else
TEST_F(ShellTest, ReportTimingsIsCalledSoonerInNonReleaseMode) {
#endif
fml::TimePoint start = fml::TimePoint::Now();
auto settings = CreateSettingsForFixture();
std::unique_ptr<Shell> shell = CreateShell(settings);

// Create the surface needed by rasterizer
PlatformViewNotifyCreated(shell.get());

auto configuration = RunConfiguration::InferFromSettings(settings);
ASSERT_TRUE(configuration.IsValid());
configuration.SetEntrypoint("reportTimingsMain");

// Wait for 2 reports: the first one is the immediate callback of the first
// frame; the second one will exercise the batching logic.
fml::CountDownLatch reportLatch(2);
std::vector<int64_t> timestamps;
auto nativeTimingCallback = [&reportLatch,
&timestamps](Dart_NativeArguments args) {
Dart_Handle exception = nullptr;
timestamps = tonic::DartConverter<std::vector<int64_t>>::FromArguments(
args, 0, exception);
reportLatch.CountDown();
};
AddNativeCallback("NativeReportTimingsCallback",
CREATE_NATIVE_ENTRY(nativeTimingCallback));
RunEngine(shell.get(), std::move(configuration));

PumpOneFrame(shell.get());
PumpOneFrame(shell.get());

reportLatch.Wait();
DestroyShell(std::move(shell));

fml::TimePoint finish = fml::TimePoint::Now();
fml::TimeDelta elapsed = finish - start;

#if FLUTTER_RELEASE
// Our batch time is 1000ms. Hopefully the 800ms limit is relaxed enough to
// make it not too flaky.
ASSERT_TRUE(elapsed >= fml::TimeDelta::FromMilliseconds(800));
#else
// Our batch time is 100ms. Hopefully the 500ms limit is relaxed enough to
// make it not too flaky.
ASSERT_TRUE(elapsed <= fml::TimeDelta::FromMilliseconds(500));
#endif
Comment on lines -1306 to -1314
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you could get what you want by just removing this assert right? That would eliminate all timeout except the one that's for the whole test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just realizing that's the whole point of this test. It doesn't really test much without it =T

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting this LGTM, I'd double check with the author of it just to see if they can think of another way to assert what they want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Author is @liyuqian

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the more I think about this, this either needs to be a benchmark and run as one, or it needs to mock out the clock to assert how many ticks can pass before events are emitted.

}

TEST_F(ShellTest, ReportTimingsIsCalledImmediatelyAfterTheFirstFrame) {
auto settings = CreateSettingsForFixture();
std::unique_ptr<Shell> shell = CreateShell(settings);
Expand Down Expand Up @@ -1391,8 +1339,7 @@ TEST_F(ShellTest, WaitForFirstFrame) {

RunEngine(shell.get(), std::move(configuration));
PumpOneFrame(shell.get());
fml::Status result =
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1000));
fml::Status result = shell->WaitForFirstFrame(fml::TimeDelta::Max());
ASSERT_TRUE(result.ok());

DestroyShell(std::move(shell));
Expand All @@ -1410,8 +1357,7 @@ TEST_F(ShellTest, WaitForFirstFrameZeroSizeFrame) {

RunEngine(shell.get(), std::move(configuration));
PumpOneFrame(shell.get(), {1.0, 0.0, 0.0});
fml::Status result =
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1000));
fml::Status result = shell->WaitForFirstFrame(fml::TimeDelta::Zero());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be fml::TimeDelta::Max() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make us wait forever and time out the test. I'm not sure how much value this test adds over the other test(s) that make sure we timeout if we don't wait long enough.

The problem is, even if we keep it at 1s, there's no good way of knowing whether we timed out because it fails to produce a frame by design, or we timed out because we were running on hardware slow enough that 1s isn't long enough to produce a frame.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iskakaushik is in the blame for this, might have ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I missed the ASSERT below this line saying that the wait is expected to timeout. So, is this testing that PumpOneFrame doesn't do anything for a zero size frame, and that's why the wait will time out? Is there another way for PumpOneFrame to communicate that it isn't going to do anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this test wants to assert that WaitForFirstFrame will time out if you pump a zero sized frame.

I think a better place to test this would be e.g. in the animator unittests, making sure that OnAnimatorDidDraw isn't called under this condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(that's where the shell would flip the wait condition)

ASSERT_FALSE(result.ok());
ASSERT_EQ(result.code(), fml::StatusCode::kDeadlineExceeded);

Expand All @@ -1429,8 +1375,7 @@ TEST_F(ShellTest, WaitForFirstFrameTimeout) {
configuration.SetEntrypoint("emptyMain");

RunEngine(shell.get(), std::move(configuration));
fml::Status result =
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(10));
fml::Status result = shell->WaitForFirstFrame(fml::TimeDelta::Zero());
ASSERT_FALSE(result.ok());
ASSERT_EQ(result.code(), fml::StatusCode::kDeadlineExceeded);

Expand All @@ -1449,11 +1394,10 @@ TEST_F(ShellTest, WaitForFirstFrameMultiple) {

RunEngine(shell.get(), std::move(configuration));
PumpOneFrame(shell.get());
fml::Status result =
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1000));
fml::Status result = shell->WaitForFirstFrame(fml::TimeDelta::Max());
ASSERT_TRUE(result.ok());
for (int i = 0; i < 100; ++i) {
result = shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1));
result = shell->WaitForFirstFrame(fml::TimeDelta::Zero());
ASSERT_TRUE(result.ok());
}

Expand All @@ -1480,13 +1424,12 @@ TEST_F(ShellTest, WaitForFirstFrameInlined) {
PumpOneFrame(shell.get());
fml::AutoResetWaitableEvent event;
task_runner->PostTask([&shell, &event] {
fml::Status result =
shell->WaitForFirstFrame(fml::TimeDelta::FromMilliseconds(1000));
fml::Status result = shell->WaitForFirstFrame(fml::TimeDelta::Max());
ASSERT_FALSE(result.ok());
ASSERT_EQ(result.code(), fml::StatusCode::kFailedPrecondition);
event.Signal();
});
ASSERT_FALSE(event.WaitWithTimeout(fml::TimeDelta::FromMilliseconds(1000)));
ASSERT_FALSE(event.WaitWithTimeout(fml::TimeDelta::Max()));

DestroyShell(std::move(shell), std::move(task_runners));
}
Expand Down Expand Up @@ -1815,7 +1758,7 @@ TEST_F(ShellTest, Screenshot) {
SkPaint(SkColor4f::FromColor(SK_ColorRED)));
auto sk_picture = recorder.finishRecordingAsPicture();
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
this->GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
this->GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(10, 10),
flutter::SkiaGPUObject<SkPicture>({sk_picture, queue}), false, false);
Expand Down Expand Up @@ -2105,7 +2048,7 @@ TEST_F(ShellTest, OnServiceProtocolEstimateRasterCacheMemoryWorks) {
// 1. Construct a picture and a picture layer to be raster cached.
sk_sp<SkPicture> picture = MakeSizedPicture(10, 10);
fml::RefPtr<SkiaUnrefQueue> queue = fml::MakeRefCounted<SkiaUnrefQueue>(
GetCurrentTaskRunner(), fml::TimeDelta::FromSeconds(0));
GetCurrentTaskRunner(), fml::TimeDelta::Zero());
auto picture_layer = std::make_shared<PictureLayer>(
SkPoint::Make(0, 0),
flutter::SkiaGPUObject<SkPicture>({MakeSizedPicture(100, 100), queue}),
Expand Down