[Impeller] add additional setup method that caches more pipelines, warms internal shader code #50521
Conversation
@@ -604,4 +606,70 @@ void ContentContext::FlushCommandBuffers() const {
  GetContext()->GetCommandQueue()->Submit(buffers);
}

void ContentContext::AdditionalSetup() const {
I realize this method is private, but the name makes me die a little bit.
I like the first line of this method instead:
void ContentContext::InitializeCommonlyUsedShadersIfNeeded()
Done
  GetClipPipeline(options);
}

std::string vendor = GetContext()->DescribeGpuModel();
Do we imagine more changes like this in the future?
- Does it make sense to have this in a backend-specific context thingy?
- Should there be a TODO with what we'd do if more of these cases are found later?
- No, because the backend context doesn't have access to any of the shader caches; that is all in ContentContext. I could move the texture/render pass creation stuff to context_vk, but then we'd have different setup methods in different places, and I kind of think it's better to keep them together. I could see either case.
- I dunno, that feels speculative. If we find more weird cases we'll figure it out then?
Can we leave this just as a comment and run the logic unconditionally?
It'll be difficult to keep track of where we have different vendor-based driver hacks; if we're going to warm up like this, why not just warm it up everywhere?
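(A minimal sketch of that suggestion, for illustration only: the vendor information survives as a comment while the logic runs everywhere. Only DescribeGpuModel comes from the diff above; the two helpers are hypothetical stand-ins for the warm-up steps in this PR.)

void ContentContext::InitializeCommonlyUsedShadersIfNeeded() const {
  // NOTE: This warm-up was motivated by ARM Mali driver behavior, but it is
  // harmless on other vendors, so it runs unconditionally instead of
  // branching on GetContext()->DescribeGpuModel().
  WarmUpTextureUpload();  // hypothetical helper
  WarmUpRenderPass();     // hypothetical helper
}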
    const RenderTarget& render_target)
    : RenderPass(context, render_target), delegate_(std::move(delegate)) {}

// |RenderPass|
void RecordingRenderPass::SetPipeline(
    const std::shared_ptr<Pipeline<PipelineDescriptor>>& pipeline) {
  pending_.pipeline = pipeline;
  delegate_->SetPipeline(pipeline);
  if (delegate_) {
Why is this all conditional now?
This is a test helper; by making it conditional I can make delegating to a real command buffer optional.
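(A self-contained sketch of that optional-delegate pattern; the names below are illustrative, not the actual Impeller types.)

#include <memory>
#include <vector>

struct Delegate {
  virtual ~Delegate() = default;
  virtual void SetPipeline(int pipeline) = 0;
};

class RecordingPass {
 public:
  // A null delegate means "record only"; a real delegate is forwarded to.
  explicit RecordingPass(std::unique_ptr<Delegate> delegate = nullptr)
      : delegate_(std::move(delegate)) {}

  void SetPipeline(int pipeline) {
    recorded_.push_back(pipeline);  // Always record for later inspection.
    if (delegate_) {
      delegate_->SetPipeline(pipeline);  // Forward only when delegating.
    }
  }

 private:
  std::unique_ptr<Delegate> delegate_;
  std::vector<int> recorded_;
};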
Requesting changes for the speculation around PSO variant construction, which seems unrelated. The rest are nits or suggestions.
I concede we need a way to reason about things we don't understand about driver performance. And I don't think we have done nearly enough thinking about how to handle such a need.
We should design this in a way that allows us to visualize a reasonable path towards getting rid of these hacks (henceforth referred to as voodoo) once our understanding of driver behavior evolves, or we apply fixes in other parts of the system. Put another way, if the person adding this voodoo were to go away, could someone else:
- Reason about why it was added.
- Check if it is actually effective.
- Devise systemic ways to make that setup unnecessary.
- Back it out.
As such, I believe the bar to add additional voodoo should be incredibly high.
As mentioned in the comments inline, the variant construction fails 1 and 2.
The texture upload and render pass construction voodoo passes 1 but currently probably fails 2. I’d like the need for this voodoo to be measurable. This could be as simple as all the voodoo being performed in a function that is decorated with a trace event so our tools can measure it later for effectiveness.
We can call the mechanism something like VoodooManager (or Shaman) that performs invocations of the abstract base class IVoodoo, along with some affordance for measuring async voodoo. Make it sound absolutely ridiculous because it is doing something we actually don't fully understand.
Each bit of voodoo can then have linked documentation and traces so someone can understand what is happening and back it out. Or not, depending on prioritization.
If we concede that we need a mechanism for performing this voodoo, we should also make it reside closer to the subsystems that need said voodoo. The render pass construction and texture upload voodoo should ideally just be in the vulkan backend in //impeller/renderer. Why does the entity framework care about this? Perhaps the context has a shaman that the different sub-systems can talk to. That way, only Vulkan can perform that voodoo and non-entity framework users can benefit too. I don’t think sub-systems above //impeller/renderer need to know how to read the device model.
I apologize if the additional process for adding voodoo seems tedious. Especially when there are pressures to ship things yesterday. But I believe a little bit of care in adding stuff here will go a long way in building a maintainable system. For instance, I didn’t understand the need for PSO construction, but I trust there is a good reason you added it. Even if I don’t know it. In the future I can totally see someone else cycling over all variants in ContentContext just to be safe. And then we run into similar problems.
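(To make the measurability suggestion above concrete, here is a self-contained sketch of a trace-event-style RAII scope; real code would use the engine's tracing macros rather than this printf stand-in.)

#include <chrono>
#include <cstdio>
#include <string>

class TraceScope {
 public:
  explicit TraceScope(std::string name)
      : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
  ~TraceScope() {
    // Emit the duration when the scope ends so tooling can aggregate it.
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                        std::chrono::steady_clock::now() - start_)
                        .count();
    std::printf("trace: %s took %lld us\n", name_.c_str(),
                static_cast<long long>(us));
  }

 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

void DoOneTimeWarmUp() {
  TraceScope trace("ContentContext warm-up");  // Measures everything below.
  // ... texture upload / render pass / pipeline warm-up would run here ...
}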
@@ -604,4 +606,68 @@ void ContentContext::FlushCommandBuffers() const {
  GetContext()->GetCommandQueue()->Submit(buffers);
}

void ContentContext::InitializeCommonlyUsedShadersIfNeeded() const {
For accuracy, CommonlyUsedPipelinesIfNeeded, since the shaders are already in their libraries.
I suppose even that isn't entirely accurate, since the render pass is being initialized. lol, just DoOneTimeVoodoo?
  return;
}

// On ARM devices, the initial usage of vkCmdCopyBufferToImage has been
We aren't completely in the dark about the reasons why this is happening. And we haven't fully convinced ourselves this isn't user error yet.
    .color_attachment_pixel_format =
        context_->GetCapabilities()->GetDefaultColorFormat()};

for (const auto mode : {BlendMode::kSource, BlendMode::kSourceOver}) {
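(For context, a sketch of how the loop above plausibly drives variant creation; the blend_mode field and the texture_pipelines_ map are assumptions, since only the loop header and options initializer appear in this hunk.)

ContentContextOptions options{
    .color_attachment_pixel_format =
        context_->GetCapabilities()->GetDefaultColorFormat()};

// Each distinct ContentContextOptions value maps to a separate PSO, so
// pre-creating the kSource/kSourceOver variants avoids a first-use stall.
for (const auto mode : {BlendMode::kSource, BlendMode::kSourceOver}) {
  options.blend_mode = mode;                    // assumed field name
  CreateIfNeeded(texture_pipelines_, options);  // assumed pipeline map
}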
From the proceedings of the last go/impeller-weekly, my understanding was that PSO variant construction time did not show up in traces yet. What gives for the need for this?
The two traces posted are also unrelated to PSO variant construction and the linked bug makes no mention of PSO variants being slow to initialize. This bit seems unrelated to the linked issue or any problem we've discussed in the past.
I disagree with how you have framed the problem. This is not magic/voodoo/hacks; instead, this is the clearly observable behavior of an extremely common driver running on a specific platform that we care to support, meaning we must ensure that it is tested and performant. We are not targeting hypothetical or theoretical Vulkan devices.
The path for measuring changes that improve performance has been and will always be benchmarks, and the mechanism by which we understand what the driver is doing is through CPU traces, which I've posted above. The process for how we take those has been documented by myself for both Android and iOS (see https://github.com/flutter/engine/blob/main/impeller/docs/android_cpu_profile.md and https://github.com/flutter/engine/blob/main/impeller/docs/ios_cpu_profile.md). We have numerous benchmarks in the framework repo that all report first frame times. I suspect that the first frame times reported in benchmarks will be improved by these changes, but I can certainly confirm that locally.
The process you are proposing here is not reasonable. Our code should be tested, yes. Our code should be benchmarked, yes. But we must take on the cost of maintaining good performance and behavior on the devices we support. I do not understand how your proposed process would help that. Instead, your proposal seems to me as if it would prevent us from ever adapting to the driver/platform, thus ensuring we have mediocre performance.
@jonahwilliams @dnfield and I had a chat about this in person. The takeaways were:
Did I miss anything?
Nope, that was accurate. I believe I have all of the appropriate tests moved over to capturing the first frame events, so I will look at this one now.
First frame times are captured in several benchmarks, including:
- Gallery
- Complex Layout
  CreateIfNeeded(clip_pipelines_, options);
}

if (GetContext()->GetBackendType() == Context::BackendType::kVulkan) {
@chinmaygarde where did we land on making this Vulkan-only? IIRC we agreed not to do vendor checks since this code should be harmless.
I removed the Vulkan-only check since it makes testing more of a pain.
@jonahwilliams did we ever get measurements on how much this helps? With the way they are implemented now, the ContentContext fires off tons of tasks asynchronously, then immediately synchronizes on a subset of them. I'm looking at this again because of flutter/flutter#145843 and what Jia Hao showed us the other day.
It does not synchronize on them; CreateIfNeeded does not call WaitAndGet. See the two drops in first frame time around here; the spike before was when I fixed the measurement.
That's not what I'm seeing in a debugger:
That is calling WaitAndGet on the pipeline prototype, not the newly created pipeline. It's possible that, since those aren't done yet, we've added additional synchronization here we don't need, and instead we could thread it a bit more?
We could also change this code to not depend on the prototype.
@jonahwilliams oh does "RenderPipelineT" mean "pipeline prototype"? I think we should probably rename that and/or add a docstring. Would "PipelinePrototype" be more appropriate?
No, that is just the pipeline type. I think you could make this whole thing non-blocking by just shoving it into a worker thread. That should be mostly safe. That said, if you are interested in working on making startup faster, this is not where I would squeeze. Do you wanna GVC for a bit?
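(A self-contained sketch of the worker-thread idea, assuming the warm-up is safe to run off-thread as stated above; std::async stands in for the engine's worker task runners.)

#include <future>

// Stand-in for the blocking pipeline-creation calls.
void WarmUpPipelines() { /* CreateIfNeeded(...) calls would go here */ }

// Kick the warm-up onto a worker so it never blocks first-frame encoding;
// callers may keep the future to observe completion or simply drop it.
std::future<void> WarmUpPipelinesAsync() {
  return std::async(std::launch::async, [] { WarmUpPipelines(); });
}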
Part of flutter/flutter#138236
Fixes a number of issues related to startup performance by adding an "InitializeCommonlyUsedShadersIfNeeded" method to ContentContext.
On the first frame a Flutter application renders, the backend will populate the glyph atlas for the first time. On the Vulkan backend, this executes vkCmdCopyBufferToImage. The first time this runs, ARM Mali drivers will generate an additional shader program. This patch creates a 1x1 texture and sets its contents to force this code to compile eagerly.
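(A hedged sketch of that warm-up; TextureDescriptor, CreateTexture, and SetContents are written from memory of Impeller's renderer API and should be treated as approximate.)

TextureDescriptor desc;
desc.storage_mode = StorageMode::kHostVisible;
desc.format = PixelFormat::kR8G8B8A8UNormInt;
desc.size = {1, 1};

auto texture = GetContext()->GetResourceAllocator()->CreateTexture(desc);
// Uploading any contents forces the driver down the vkCmdCopyBufferToImage
// path, so its internal shader is compiled before the first glyph upload.
const uint8_t pixel[4] = {0, 0, 0, 0};
texture->SetContents(pixel, sizeof(pixel));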
The first time a render pass is constructed, a shader is compiled. This does not seem to depend on the properties of the render pass, so we just create a trivial one and submit it.
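(Likewise, a hedged sketch of the render pass warm-up; CreateOffscreen's exact signature is an assumption, while CreateCommandBuffer, CreateRenderPass, EncodeCommands, and the CommandQueue Submit appear elsewhere in this PR or in Impeller's renderer API.)

auto cmd_buffer = GetContext()->CreateCommandBuffer();
// The pass's properties don't appear to matter, so a 1x1 offscreen target
// is enough to trigger the driver's one-time shader compilation.
auto render_target = RenderTarget::CreateOffscreen(*GetContext(), {1, 1});
auto render_pass = cmd_buffer->CreateRenderPass(render_target);
render_pass->EncodeCommands();
GetContext()->GetCommandQueue()->Submit({cmd_buffer});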
Finally, there are a few missing shader variants. Let's just go ahead and populate that cache a bit more.