
Enable ONNX export of GPU-loaded SVD/SVD-XT UNet models #6562


Open

wants to merge 2 commits into base: main

Conversation


@rajeevsrao rajeevsrao commented Jan 13, 2024

What does this PR do?

- Unpack num_frames scalar if created as a (CPU) tensor in the forward path
- Avoids mixed use of CPU and CUDA tensors, which is unsupported by torch.nn ops

File "/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_spatio_temporal_condition.py", line 422, in forward
    emb = emb.repeat_interleave(num_frames, dim=0)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
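
A minimal standalone repro of this device mismatch, outside the export path (a sketch; requires a CUDA device, and the shapes are illustrative):

import torch

emb = torch.randn(2, 320, device="cuda")  # the time embedding lives on the GPU
num_frames = torch.tensor(14)             # 0-dim CPU tensor, as produced during tracing
# Raises: Expected all tensors to be on the same device ... cuda:0 and cpu!
emb.repeat_interleave(num_frames, dim=0)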

@rajeevsrao (Author)

@patrickvonplaten @sayakpaul please review.

@sayakpaul (Member)

It should reside in optimum. Cc: @echarlaix

@echarlaix (Contributor)

Hi @rajeevsrao, could you share the script you used for the export?

@echarlaix (Contributor)

> It should reside in optimum. Cc: @echarlaix

You mean patching the model in optimum? Depending on the modifications needed, it could make sense to have it in diffusers instead.

@rajeevsrao (Author) commented Jan 22, 2024

> It should reside in optimum. Cc: @echarlaix

> You mean patching the model in optimum? Depending on the modifications needed, it could make sense to have it in diffusers instead.

ONNX is a popular interchange format. @sayakpaul I think that diffusers should also support exporting models into ONNX, especially given that this is an easy/harmless fix.

@rajeevsrao (Author)

> Hi @rajeevsrao, could you share the script you used for the export?

Here is the ONNX export script for reference:

from diffusers.models import UNetSpatioTemporalConditionModel
import torch

model_name = "svd"
if model_name == "svd-xt":
    pipeline = 'stabilityai/stable-video-diffusion-img2vid-xt'
    num_frames = 25
else:
    pipeline = 'stabilityai/stable-video-diffusion-img2vid'
    num_frames = 14

device = 'cuda'
dtype = torch.float16
model = UNetSpatioTemporalConditionModel.from_pretrained(pipeline,
    subfolder="unet",
    use_safetensors=True,
    variant='fp16',
    torch_dtype=dtype).to(device)

batch_size = 2
out_channels = 4
cross_attention_dim = 1024
latent_height = 576 // 8
latent_width = 1024 // 8

input_names = ['sample', 'timestep', 'encoder_hidden_states', 'added_time_ids']
inputs = (
    torch.randn(batch_size, num_frames, 2*out_channels, latent_height, latent_width, dtype=dtype, device=device),
    torch.tensor([1.], dtype=torch.float32, device=device),
    torch.randn(batch_size, 1, cross_attention_dim, dtype=dtype, device=device),
    torch.randn(batch_size, 3, dtype=dtype, device=device),
)
output_names = ['latent']
dynamic_axes = {
    'sample': {0: '2B', 1: 'num_frames', 3: 'H', 4: 'W'},
    'encoder_hidden_states': {0: '2B'},
    'added_time_ids': {0: '2B'}
}

with torch.inference_mode(), torch.autocast(device):
    torch.onnx.export(model,
        inputs,
        model_name+"_unet.onnx",
        export_params=True,
        opset_version=18,
        do_constant_folding=True,
        input_names=input_names,
        output_names=output_names,
        dynamic_axes=dynamic_axes,
    )



* Unpack num_frames scalar if created as a (CPU) tensor in forward path
  Avoids mixed use of CPU and CUDA tensors which is unsupported by torch.nn ops

Signed-off-by: Rajeev Rao <[email protected]>
@rajeevsrao (Author)

@sayakpaul @echarlaix please suggest next steps. Thanks.

@@ -397,6 +397,8 @@ def forward(

# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
batch_size, num_frames = sample.shape[:2]
if torch.is_tensor(num_frames):
    num_frames = num_frames.item()
Contributor

How can num_frames ever be a tensor? Can you give an example?

Contributor

@patrickvonplaten num_frames is created as a CPU tensor during the tracing step of the ONNX export. I have also provided a script to reproduce this behavior in the comment below.
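
For illustration, a minimal sketch (independent of diffusers) showing that shape unpacking yields Python ints eagerly but 0-dim CPU tensors under tracing, which is what torch.onnx.export uses:

import torch

class Probe(torch.nn.Module):
    def forward(self, x):
        batch_size, num_frames = x.shape[:2]
        # Eager: <class 'int'>; traced: <class 'torch.Tensor'> (on CPU)
        print(type(num_frames))
        return x * 1

m = Probe()
m(torch.randn(2, 14, 8))                   # prints <class 'int'>
torch.jit.trace(m, torch.randn(2, 14, 8))  # prints <class 'torch.Tensor'>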

github-actions bot

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Feb 19, 2024
@sayakpaul (Member)

@rajeevsrao do you still plan to work on this?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asfiyab-nvidia (Contributor)

Reviving this PR.

The main issue observed is that the type of num_frames changes depending on how the forward pass is executed.

Elaborating using the two cases below. To investigate further, print type(num_frames) where it is unpacked in unet_spatio_temporal_condition.py (see the traceback in the description).

Case 1: Inference

During inference, the type of num_frames is <class 'int'>. Inference script used:

import torch
from diffusers.utils import load_image
from diffusers import StableVideoDiffusionPipeline
pipe = StableVideoDiffusionPipeline.from_pretrained('stabilityai/stable-video-diffusion-img2vid', torch_dtype=torch.float16, variant="fp16").to("cuda")
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))
frames = pipe(image, decode_chunk_size=8).frames[0]

Case 2: ONNX export

As the error specifically lies with the UNet, I'm exporting just the unet model using the script below. While tracing for the ONNX export, num_frames is created as a <class 'torch.Tensor'> on the CPU; running the export on GPU then results in the error from the description.

import torch
from diffusers.models import UNetSpatioTemporalConditionModel

dtype = torch.float16
device = 'cuda'
model = UNetSpatioTemporalConditionModel.from_pretrained('stabilityai/stable-video-diffusion-img2vid',
    subfolder="unet",
    use_safetensors=True,
    variant='fp16',
    torch_dtype=dtype).to(device)

batch_size = 2
num_frames = 14
out_channels = 4
cross_attention_dim = 1024
latent_height = 576 // 8
latent_width = 1024 // 8

inputs = (
    torch.randn(batch_size, num_frames, 2*out_channels, latent_height, latent_width, dtype=dtype, device=device),
    torch.tensor([1.], dtype=torch.float32, device=device),
    torch.randn(batch_size, 1, cross_attention_dim, dtype=dtype, device=device),
    torch.randn(batch_size, 3, dtype=dtype, device=device),
)

with torch.inference_mode(), torch.autocast(device):
    torch.onnx.export(model,
        inputs,
        "svd/svd_unet.onnx",
        export_params=True,
        opset_version=18,
        do_constant_folding=True
    )

The PR aims to correct this inconsistency in the type of num_frames between inference and tracing.

@sayakpaul (Member)

Cc: @yiyixuxu. I am okay with the changes here since ONNX is very popular. LMK. @DN6, you too.

@sayakpaul (Member)

@asfiyab-nvidia would you suggest anything being done differently in this PR?

@asfiyab-nvidia (Contributor)

> @asfiyab-nvidia would you suggest anything being done differently in this PR?

The main goal is to align the types. An alternative to the change suggested in the PR is to unconditionally cast the variable to a torch tensor:

num_frames = torch.tensor(num_frames).to(sample.device)

However, since num_frames is used as a scalar elsewhere, I'd vote for the recommendation in the PR: cast it to a scalar if it is found to be a tensor.
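
To make the two options concrete, a sketch of both (names follow the forward() diff above; the torch.as_tensor call is an illustrative variant of the torch.tensor(...).to(...) line):

# Option A: always promote to a tensor on the sample's device
num_frames = torch.as_tensor(num_frames, device=sample.device)

# Option B (the PR): collapse a traced tensor back to a Python scalar
if torch.is_tensor(num_frames):
    num_frames = num_frames.item()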

@yiyixuxu (Collaborator) commented Mar 1, 2024

cc @echarlaix here again

I'm fine with the change if we agree it's the best way to support ONNX export.

@github-actions github-actions bot removed the stale Issues that haven't received updates label Mar 2, 2024
@asfiyab-nvidia (Contributor)

Hi, following up on this PR.

@sayakpaul (Member)

@echarlaix a gentle ping.

@yiyixuxu yiyixuxu added the ONNX label Mar 9, 2024
@@ -397,6 +397,8 @@ def forward(

# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
batch_size, num_frames = sample.shape[:2]
if torch.is_tensor(num_frames):
num_frames = num_frames.item()
Contributor

This will result in num_frames being set as a constant in the ONNX graph. Is this expected (I see that it's defined in the unet config), or are there cases where this value might vary, @yiyixuxu @sayakpaul? If so, we should move the tensor to the expected device instead.

@sayakpaul (Member) commented Mar 11, 2024

No, num_frames can change, as it's a parameter of the pipeline call.

Contributor

@echarlaix you're right: with the suggested change, num_frames is traced as a constant, and that is undesirable. Would the change below be more suitable?

        if torch.is_tensor(num_frames):
            num_frames = num_frames.to(sample.device)

Contributor

@echarlaix pinging to follow up on this. I ran the ONNX export of the UNet model using the above fix. The export runs successfully; however, the model fails ONNXRuntime inference with the error below:

ort_s = ort.InferenceSession(model_path)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from svd/svd_unet.onnx failed:Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(float) and tensor(float16) in node (/up_blocks.3/attentions.2/time_mixer/Where).

I'd appreciate your input on how we can move this PR along. At the moment, the ONNX export for the spatio-temporal UNet is broken. There are two ways to enable the export:

  1. Make num_frames a scalar. This results in num_frames being an ONNX constant that cannot change during inference.
  2. Move num_frames to the sample's device during ONNX export. The export succeeds, but ONNXRuntime inference fails.

Option 1 sets num_frames to a constant but passes ONNXRuntime inference; I'm leaning towards it, as it results in immediate usability of the exported model. Happy to hear if you have alternative suggestions.
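
For reference, a minimal ONNXRuntime smoke test along these lines (a sketch; the input/output names and shapes follow the export script earlier in this thread, and the file path is illustrative):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("svd/svd_unet.onnx", providers=["CPUExecutionProvider"])
feeds = {
    "sample": np.random.randn(2, 14, 8, 72, 128).astype(np.float16),
    "timestep": np.array([1.0], dtype=np.float32),
    "encoder_hidden_states": np.random.randn(2, 1, 1024).astype(np.float16),
    "added_time_ids": np.random.randn(2, 3).astype(np.float16),
}
latent = sess.run(["latent"], feeds)[0]
print(latent.shape)  # expect (2, 14, 4, 72, 128)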

@echarlaix (Contributor) commented Mar 27, 2024

> @echarlaix pinging to follow up on this. I ran the ONNX export of the UNet model using the above fix. The export runs successfully; however, the model fails ONNXRuntime inference with the error below

Did you try casting num_frames to a different data type? Also, were you able to locate where in the graph this issue is coming from?

Contributor

@echarlaix num_frames can only be an integer type, as it is the repeats input to repeat_interleave. int64 and int32 are the only cast options, and neither resolves the issue.
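
A quick check of that dtype constraint (a sketch, on CPU):

import torch

emb = torch.randn(2, 320)
reps = torch.tensor(14, dtype=torch.int64)       # integer repeats are accepted
print(emb.repeat_interleave(reps, dim=0).shape)  # torch.Size([28, 320])
# Float repeats, e.g. torch.tensor(14.0), are rejected by repeat_interleave.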

Contributor

@echarlaix following up on this.

@asfiyab-nvidia (Contributor)

Hi @echarlaix @sayakpaul, requesting updates based on the latest comments. Thanks.


github-actions bot commented May 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label May 3, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label May 3, 2024
github-actions bot

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
@LeoZDong commented Mar 7, 2025

Could someone review this? Thanks!

@a-r-r-o-w a-r-r-o-w requested a review from sayakpaul March 7, 2025 19:17
@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Mar 8, 2025
@sayakpaul sayakpaul requested a review from echarlaix March 8, 2025 03:02
@sayakpaul (Member)

I think this will need to be reviewed by someone from the Optimum team. Cc: @echarlaix again.
