
Add enable_vae_tiling to AllegroPipeline, fix example #10212


Merged: 1 commit into huggingface:main on Dec 16, 2024

Conversation

@hlky (Contributor) commented on Dec 13, 2024

What does this PR do?

Allegro's VAE does not support encoding or decoding without tiling, and the pipeline is missing enable_vae_tiling and the related helper methods. Calling the VAE without tiling enabled currently hits:

```python
raise NotImplementedError("Encoding without tiling has not been implemented yet.")
```
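For reference, a minimal sketch of what the added helpers look like, assuming this PR mirrors the pattern used by other diffusers video pipelines (e.g. CogVideoXPipeline), where the pipeline-level methods are thin wrappers that forward to the VAE:

```python
from diffusers import DiffusionPipeline


class AllegroPipeline(DiffusionPipeline):
    ...

    def enable_vae_tiling(self):
        # Decode the latents tile by tile and blend the overlaps; this keeps
        # peak memory low and is the only path Allegro's VAE implements.
        self.vae.enable_tiling()

    def disable_vae_tiling(self):
        # Revert to single-pass decoding (which Allegro's VAE does not
        # currently implement, hence the NotImplementedError above).
        self.vae.disable_tiling()
```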

Note that the default num_inference_steps=100 is very slow (~1 h on an A40, and only slightly faster on an A6000 Ada), so the example could have that value lowered as well.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul @yiyixuxu @DN6

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w (Member) left a comment


Thanks!

Regarding the number of inference steps being high: inference with 50 steps produces noisy videos, and the default in the official repo is 100 as well. For faster inference, we could explore the different caching techniques soon.

@hlky (Contributor, Author) commented on Dec 13, 2024

@a-r-r-o-w Is lowering num_frames also known to produce noisy videos? I'm getting noisy video with num_inference_steps=100 and num_frames=22. I'm trying to test whether outputs are changed by #10156, so I wanted it to be faster. Edit: this is on main.

```python
import torch

from diffusers import AllegroPipeline, AutoencoderKLAllegro
from diffusers.utils import export_to_video

# The VAE is loaded in fp32; the rest of the pipeline runs in bf16.
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16).to("cuda")
# Required: Allegro's VAE raises NotImplementedError when tiling is disabled.
pipe.vae.enable_tiling()

prompt = (
    "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, "
    "the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this "
    "location might be a popular spot for docking fishing boats."
)
video = pipe(
    prompt,
    num_inference_steps=100, num_frames=22, guidance_scale=7.5, max_sequence_length=512,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(video, "AllegroPipeline.mp4", fps=15)
```
[attached output video: AllegroPipeline.mp4]

@a-r-r-o-w (Member) commented
I believe that is expected. My memory is a bit hazy since it's been a while since we integrated it, but IIRC Allegro only works at 88 frames and specific resolutions (?).

I think it is okay not to run the full test if it's time-consuming. What you could do instead is save the outputs of all intermediate transformer blocks (and the decoded output) for, say, 2 inference steps, and compare them against your sincos/RoPE changes by looking at the absmax of the differences between the intermediates from both runs. This should be a sufficient check imo.
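A minimal sketch of that comparison using PyTorch forward hooks, assuming the Allegro transformer exposes its blocks as pipe.transformer.transformer_blocks (the usual attribute name for DiT-style models in diffusers) and that pipe is set up as in the example above; run it once on main, save the tensors, then repeat on the branch and compare:

```python
from collections import defaultdict

import torch

captured = defaultdict(list)

def make_hook(name):
    # Record each block's output per forward call (one entry per inference step).
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        captured[name].append(out.detach().float().cpu())
    return hook

handles = [
    block.register_forward_hook(make_hook(f"block_{i}"))
    for i, block in enumerate(pipe.transformer.transformer_blocks)
]

# ... run a short generation here, e.g. pipe(prompt, num_inference_steps=2, ...) ...

for handle in handles:
    handle.remove()

torch.save(dict(captured), "intermediates.pt")  # save once per branch

# Then diff the two saved files, e.g. on the branch under test:
# ref = torch.load("intermediates_main.pt")
# for name, tensors in captured.items():
#     for step, (a, b) in enumerate(zip(tensors, ref[name])):
#         print(name, step, (a - b).abs().max().item())
```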

@yiyixuxu merged commit 7186bb4 into huggingface:main on Dec 16, 2024. 12 checks passed.