wgpu should cache pipelines #7716

Open

jimblandy opened this issue May 22, 2025 · 8 comments · May be fixed by #7729
Labels: area: performance · backend: dx12 · backend: vulkan · type: enhancement

Comments

@jimblandy
Member

wgpu is too slow when render or compute pipelines are created repeatedly.

At present, each call to wgpu_core::global::Global::device_create_compute_pipeline results in a separate call to wgpu_hal::Device::create_compute_pipeline. Render pipelines are similar. However, the WebGPU specification says (§2.2.4 User Agent State):

... It is expected that user agents will have compilation caches for the result of expensive compilation like GPUShaderModule, GPURenderPipeline and GPUComputePipeline.

This means that applications are within their rights to assume that calling createRenderPipeline in every animation frame, with the same parameters, should be cheap.
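
For concreteness, a minimal sketch of that pattern in Rust against the wgpu crate (the shader module, entry point, and labels here are hypothetical, and the descriptor fields track recent wgpu releases):

```rust
// Called once per animation frame with the same inputs. Per the spec text
// above, an application may assume the repeated call is cheap because the
// implementation caches the compilation result.
fn per_frame_pipeline(
    device: &wgpu::Device,
    module: &wgpu::ShaderModule, // same shader module every frame
) -> wgpu::ComputePipeline {
    device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: Some("per-frame compute pipeline"),
        layout: None, // derive the layout from the shader
        module,
        entry_point: Some("main"),
        compilation_options: Default::default(),
        cache: None,
    })
}
```

Today wgpu_core forwards each such call straight to wgpu_hal, so the backend redoes the compilation every time.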

This Firefox profile shows that the Marching Cubes demo ends up pegging the Canvas Renderer thread running DXC. The profile shows other problems too, like each requestAnimationFrame call taking 270ms, but the time spent in DXC is what limits the requestAnimationFrame rate to only around 1fps.

@jimblandy
Member Author

It seems like using wgpu_hal's existing pipeline cache feature would actually fix this for Vulkan. Even if you just create an empty pipeline cache, Vulkan says:

Pipeline cache objects allow the result of pipeline construction to be reused between pipelines and between runs of an application. Reuse between pipelines is achieved by passing the same pipeline cache object when creating multiple related pipelines. Reuse across runs of an application is achieved by retrieving pipeline cache contents in one run of an application, saving the contents, and using them to preinitialize a pipeline cache on a subsequent run.

The dx12 backend supplies only a dummy implementation. I don't know if Direct3D has anything that behaves the way Vulkan's pipeline cache objects do. cc @cwfitzgerald @magcius
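
For the Vulkan path, wgpu already exposes a thin wrapper over that hal feature to users behind wgpu::Features::PIPELINE_CACHE. A minimal sketch of the mechanism, written against the user-facing API (labels and persistence handling are illustrative, and the feature must be requested at device creation):

```rust
// Sketch only: create an initially empty pipeline cache and attach it to every
// pipeline, letting the Vulkan driver reuse compilation work between pipelines
// and across runs. Requires wgpu::Features::PIPELINE_CACHE.
fn build_with_cache(
    device: &wgpu::Device,
    module: &wgpu::ShaderModule,
) -> (wgpu::ComputePipeline, Option<Vec<u8>>) {
    let cache = unsafe {
        device.create_pipeline_cache(&wgpu::PipelineCacheDescriptor {
            label: Some("device-wide pipeline cache"),
            data: None,      // or bytes saved from a previous run
            fallback: true,  // fall back to an empty cache if `data` is stale
        })
    };

    let pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: Some("cached compute pipeline"),
        layout: None,
        module,
        entry_point: Some("main"),
        compilation_options: Default::default(),
        cache: Some(&cache), // shared between related pipelines
    });

    // The cache contents can be written to disk and fed back via `data` next run.
    (pipeline, cache.get_data())
}
```

Presumably wgpu_core could do the equivalent internally with a device-owned cache.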

@cwfitzgerald
Member

There is ID3D12PipelineLibrary, but from what I've heard it focuses on serializing pipelines between runs rather than deduplicating them within a run. D3D12 does have implicit pipeline caches, though our biggest cost on d3d12 is compiling HLSL -> DXIL/DXBC, which is a user-space operation, so we'd need to cache those.

@magcius
Collaborator

magcius commented May 23, 2025 via email

@hakolao
Contributor

hakolao commented May 23, 2025

I gotta ask: what is the use case for pipeline caching? I create mine once (or recreate them on shader reload). In other words, what is this caching for if one shouldn't keep recreating pipelines every frame anyway?

Vulkan says: The big advantage of a pipeline cache is that the pipeline state can be saved to a file to be used between runs of an application

Is this something different? (Please explain as if I know nothing.)

@magcius
Collaborator

magcius commented May 23, 2025 via email

@teoxoy
Member

teoxoy commented May 23, 2025

One of the Unity demos (https://vfx-demo.cds.unity3d.com) also stutters occasionally because DxcCompiler::Compile is too slow (profile: https://share.firefox.dev/4kx1wq2). Looking at its API calls (trace.zip):

  • 13 calls to CreateComputePipeline
  • 163 calls to CreateRenderPipeline
  • 39 calls to CreateShaderModule
    • entry points in those modules:
      • 13 @compute
      • 12 @fragment
      • 14 @vertex

though our biggest cost on d3d12 is compiling HLSL -> DXIL/DXBC which is a user space operation, so we'd need to cache those.

I think some cache keyed off the shader module (and codegen-relevant state) would prevent a lot of duplicate work here.

I also think we can try caching just the bytecode (to avoid calling into FXC/DXC); that will probably get us far enough. Caching whole pipelines is more involved, since we'd also need to cache everything else that makes up a pipeline.
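
A rough sketch of what such a bytecode cache could look like (everything here is hypothetical and not wgpu's actual internals): key the compiled blob on the shader module plus whatever state affects codegen, and only call into DXC/FXC on a miss.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical cache key: everything that can change the generated DXIL/DXBC.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ShaderKey {
    module_hash: u64,    // hash of the shader module's source / naga IR
    entry_point: String, // e.g. "main"
    codegen_state: u64,  // hash of pipeline-layout and backend options that affect codegen
}

#[derive(Default)]
struct BytecodeCache {
    blobs: Mutex<HashMap<ShaderKey, Vec<u8>>>,
}

impl BytecodeCache {
    /// Return a cached blob, invoking `compile` (standing in for the FXC/DXC
    /// call) only on a cache miss. For simplicity the sketch holds the lock
    /// while compiling.
    fn get_or_compile(&self, key: ShaderKey, compile: impl FnOnce() -> Vec<u8>) -> Vec<u8> {
        self.blobs
            .lock()
            .unwrap()
            .entry(key)
            .or_insert_with(compile)
            .clone()
    }
}
```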

@teoxoy
Member

teoxoy commented May 23, 2025

Regarding the Marching Cubes demo, this is what it's calling every frame: https://github.com/tcoppex/webgpu-marchingcubes/blob/e464ccf192dcd9ded794dae5593a0e4cbedf487a/js/utils.js#L296

@jimblandy
Member Author

P1 for Firefox because it blocks Marching Cubes.

@cwfitzgerald added the type: enhancement, area: performance, backend: dx12, and backend: vulkan labels on May 28, 2025
Projects
Status: Todo

5 participants