
Commit 904e3a4

Merge branch 'main' into original-lora-hunyuan-video

2 parents: 893b9c0 + 8f2253c

File tree: 89 files changed (+1869 −424 lines)


Diff for: .github/workflows/nightly_tests.yml (+2)

@@ -359,6 +359,8 @@ jobs:
         test_location: "bnb"
       - backend: "gguf"
         test_location: "gguf"
+      - backend: "torchao"
+        test_location: "torchao"
     runs-on:
       group: aws-g6e-xlarge-plus
     container:

Diff for: docs/source/en/_toctree.yml (+1 −1)

@@ -48,7 +48,7 @@
   - local: using-diffusers/inpaint
     title: Inpainting
   - local: using-diffusers/text-img2vid
-    title: Text or image-to-video
+    title: Video generation
   - local: using-diffusers/depth2img
     title: Depth-to-image
   title: Generative tasks

Diff for: docs/source/en/api/models/allegro_transformer3d.md (+1 −1)

@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AllegroTransformer3DModel

-vae = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+transformer = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
 ```

 ## AllegroTransformer3DModel

Diff for: docs/source/en/api/models/cogvideox_transformer3d.md (+1 −1)

@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import CogVideoXTransformer3DModel

-vae = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
 ```

 ## CogVideoXTransformer3DModel

Diff for: docs/source/en/api/models/cogview3plus_transformer2d.md (+1 −1)

@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import CogView3PlusTransformer2DModel

-vae = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+transformer = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
 ```

 ## CogView3PlusTransformer2DModel

Diff for: docs/source/en/api/models/mochi_transformer3d.md (+1 −1)

@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import MochiTransformer3DModel

-vae = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
 ```

 ## MochiTransformer3DModel

Diff for: docs/source/en/api/pipelines/allegro.md (+46 −1)

@@ -19,10 +19,55 @@ The abstract from the paper is:

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AllegroPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "rhymes-ai/Allegro",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AllegroTransformer3DModel.from_pretrained(
+    "rhymes-ai/Allegro",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = AllegroPipeline.from_pretrained(
+    "rhymes-ai/Allegro",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = (
+    "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, "
+    "the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this "
+    "location might be a popular spot for docking fishing boats."
+)
+video = pipeline(prompt, guidance_scale=7.5, max_sequence_length=512).frames[0]
+export_to_video(video, "harbor.mp4", fps=15)
+```
+
 ## AllegroPipeline

 [[autodoc]] AllegroPipeline
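
To verify the memory savings from the 8-bit example above, peak VRAM can be measured with standard PyTorch calls. A minimal sketch, not part of this diff, assuming a CUDA device:

```py
# Minimal sketch: report peak GPU memory after generation to compare the
# quantized pipeline against a full-precision baseline. Standard PyTorch APIs only.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the quantized pipeline as in the example above ...
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```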

Diff for: docs/source/en/api/pipelines/animatediff.md (+1 −1)

@@ -803,7 +803,7 @@ FreeInit is not really free - the improved quality comes at the cost of extra co

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/attend_and_excite.md (+1 −1)

@@ -22,7 +22,7 @@ You can find additional information about Attend-and-Excite on the [project page

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/audioldm.md (+1 −1)

@@ -37,7 +37,7 @@ During inference:

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/audioldm2.md (+1 −1)

@@ -60,7 +60,7 @@ The following example demonstrates how to construct good music and speech genera

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/aura_flow.md (+41 −1)

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # AuraFlow

-AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3.md) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
+AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.

 It was developed by the Fal team and more details about it can be found in [this blog post](https://blog.fal.ai/auraflow/).

@@ -22,6 +22,46 @@ AuraFlow can be quite expensive to run on consumer hardware devices. However, yo

 </Tip>

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "fal/AuraFlow",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AuraFlowTransformer2DModel.from_pretrained(
+    "fal/AuraFlow",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = AuraFlowPipeline.from_pretrained(
+    "fal/AuraFlow",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipeline(prompt).images[0]
+image.save("auraflow.png")
+```
+
 ## AuraFlowPipeline

 [[autodoc]] AuraFlowPipeline
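
The same pattern extends to 4-bit loading when memory is tighter. A minimal sketch, not part of this diff, using the `load_in_4bit` options of diffusers' `BitsAndBytesConfig`; the NF4 settings shown are one common choice, not a recommendation from the docs:

```py
# Minimal sketch: a 4-bit NF4 variant of the AuraFlow transformer loading above.
# bnb_4bit_quant_type and bnb_4bit_compute_dtype are standard bitsandbytes options.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel

quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
transformer_4bit = AuraFlowTransformer2DModel.from_pretrained(
    "fal/AuraFlow",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
```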

Diff for: docs/source/en/api/pipelines/blip_diffusion.md (+1 −1)

@@ -25,7 +25,7 @@ The original codebase can be found at [salesforce/LAVIS](https://github.com/sale

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/cogvideox.md (+39 −6)

@@ -23,7 +23,7 @@ The abstract from the paper is:

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

@@ -112,13 +112,46 @@ CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds o
 - With enabling cpu offloading and tiling, memory usage is `11 GB`
 - `pipe.vae.enable_slicing()`

-### Quantized inference
+## Quantization

-[torchao](https://github.com/pytorch/ao) and [optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be used to quantize the text encoder, transformer and VAE modules to lower the memory requirements. This makes it possible to run the model on a free-tier T4 Colab or lower VRAM GPUs!
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.

-It is also worth noting that torchao quantization is fully compatible with [torch.compile](/optimization/torch2.0#torchcompile), which allows for much faster inference speed. Additionally, models can be serialized and stored in a quantized datatype to save disk space with torchao. Find examples and benchmarks in the gists below.
-- [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
-- [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, CogVideoXTransformer3DModel, CogVideoXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "THUDM/CogVideoX-2b",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = CogVideoXTransformer3DModel.from_pretrained(
+    "THUDM/CogVideoX-2b",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = CogVideoXPipeline.from_pretrained(
+    "THUDM/CogVideoX-2b",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
+export_to_video(video, "ship.mp4", fps=8)
+```

 ## CogVideoXPipeline
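
Since this commit also adds a torchao backend to the nightly quantization tests, a torchao variant of the example above may be of interest. A minimal sketch, not part of this diff, assuming diffusers exposes `TorchAoConfig` and that `"int8wo"` is an accepted quant type string in your installed versions:

```py
# Minimal sketch: int8 weight-only torchao quantization of the CogVideoX
# transformer. TorchAoConfig and the "int8wo" name are assumptions about the
# installed diffusers/torchao versions; adjust as needed.
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel, TorchAoConfig

quant_config = TorchAoConfig("int8wo")
transformer_int8 = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-2b",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    transformer=transformer_int8,
    torch_dtype=torch.float16,
).to("cuda")
```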

Diff for: docs/source/en/api/pipelines/cogview3.md (+1 −1)

@@ -23,7 +23,7 @@ The abstract from the paper is:

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/controlnet.md (+1 −1)

@@ -26,7 +26,7 @@ The original codebase can be found at [lllyasviel/ControlNet](https://github.com

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/controlnet_flux.md (+1 −1)

@@ -42,7 +42,7 @@ XLabs ControlNets are also supported, which was contributed by the [XLabs team](

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/controlnet_hunyuandit.md (+1 −1)

@@ -26,7 +26,7 @@ This code is implemented by Tencent Hunyuan Team. You can find pre-trained check

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

Diff for: docs/source/en/api/pipelines/controlnet_sd3.md (+1 −1)

@@ -36,7 +36,7 @@ This controlnet code is mainly implemented by [The InstantX Team](https://huggin

 <Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>
