
Commit 0a05e01

Merge branch 'main' into fix-unloading-expanded-flux
2 parents: 4b370d2 + fdcbbdf

33 files changed: +1,051 −225 lines changed

docs/source/en/_toctree.yml

+1-1
@@ -48,7 +48,7 @@
   - local: using-diffusers/inpaint
     title: Inpainting
   - local: using-diffusers/text-img2vid
-    title: Text or image-to-video
+    title: Video generation
   - local: using-diffusers/depth2img
     title: Depth-to-image
   title: Generative tasks

docs/source/en/api/pipelines/allegro.md

+46-1
@@ -19,10 +19,55 @@ The abstract from the paper is:
 
 <Tip>
 
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 
 </Tip>
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AllegroPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "rhymes-ai/Allegro",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AllegroTransformer3DModel.from_pretrained(
+    "rhymes-ai/Allegro",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = AllegroPipeline.from_pretrained(
+    "rhymes-ai/Allegro",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = (
+    "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, "
+    "the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this "
+    "location might be a popular spot for docking fishing boats."
+)
+video = pipeline(prompt, guidance_scale=7.5, max_sequence_length=512).frames[0]
+export_to_video(video, "harbor.mp4", fps=15)
+```
+
 ## AllegroPipeline
 
 [[autodoc]] AllegroPipeline

docs/source/en/api/pipelines/aura_flow.md

+41-1
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # AuraFlow
 
-AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3.md) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
+AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
 
 It was developed by the Fal team and more details about it can be found in [this blog post](https://blog.fal.ai/auraflow/).
 
@@ -22,6 +22,46 @@ AuraFlow can be quite expensive to run on consumer hardware devices. However, yo
 
 </Tip>
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "fal/AuraFlow",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AuraFlowTransformer2DModel.from_pretrained(
+    "fal/AuraFlow",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = AuraFlowPipeline.from_pretrained(
+    "fal/AuraFlow",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipeline(prompt).images[0]
+image.save("auraflow.png")
+```
+
 ## AuraFlowPipeline
 
 [[autodoc]] AuraFlowPipeline

docs/source/en/api/pipelines/cogvideox.md

+39-6
@@ -23,7 +23,7 @@ The abstract from the paper is:
 
 <Tip>
 
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 
 </Tip>
 
@@ -112,13 +112,46 @@ CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds o
 - With enabling cpu offloading and tiling, memory usage is `11 GB`
 - `pipe.vae.enable_slicing()`
 
-### Quantized inference
+## Quantization
 
-[torchao](https://github.com/pytorch/ao) and [optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be used to quantize the text encoder, transformer and VAE modules to lower the memory requirements. This makes it possible to run the model on a free-tier T4 Colab or lower VRAM GPUs!
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-It is also worth noting that torchao quantization is fully compatible with [torch.compile](/optimization/torch2.0#torchcompile), which allows for much faster inference speed. Additionally, models can be serialized and stored in a quantized datatype to save disk space with torchao. Find examples and benchmarks in the gists below.
-- [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
-- [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, CogVideoXTransformer3DModel, CogVideoXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "THUDM/CogVideoX-2b",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = CogVideoXTransformer3DModel.from_pretrained(
+    "THUDM/CogVideoX-2b",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = CogVideoXPipeline.from_pretrained(
+    "THUDM/CogVideoX-2b",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
+export_to_video(video, "ship.mp4", fps=8)
+```
 
 ## CogVideoXPipeline
 

docs/source/en/api/pipelines/cogview3.md

+1-1
@@ -23,7 +23,7 @@ The abstract from the paper is:
 
 <Tip>
 
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 
 </Tip>
 

docs/source/en/api/pipelines/flux.md

+40
@@ -338,6 +338,46 @@ out = pipe(
 out.save("image.png")
 ```
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`FluxPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="text_encoder_2",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = FluxTransformer2DModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = FluxPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipeline(prompt, guidance_scale=3.5, height=768, width=1360, num_inference_steps=50).images[0]
+image.save("flux.png")
+```
+
 ## Single File Loading for the `FluxTransformer2DModel`
 
 The `FluxTransformer2DModel` supports loading checkpoints in the original format shipped by Black Forest Labs. This is also useful when trying to load finetunes or quantized versions of the models that have been published by the community.

docs/source/en/api/pipelines/hunyuan_video.md

+31
@@ -32,6 +32,37 @@ Recommendations for inference:
 - For smaller resolution videos, try lower values of `shift` (between `2.0` to `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution images, try higher values (between `7.0` and `12.0`). The default value is `7.0` for HunyuanVideo.
 - For more information about supported resolutions and other details, please refer to the original repository [here](https://github.com/Tencent/HunyuanVideo/).
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
+from diffusers.utils import export_to_video
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
+    "tencent/HunyuanVideo",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = HunyuanVideoPipeline.from_pretrained(
+    "tencent/HunyuanVideo",
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A cat walks on the grass, realistic style."
+video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
+export_to_video(video, "cat.mp4", fps=15)
+```
+
 ## HunyuanVideoPipeline
 
 [[autodoc]] HunyuanVideoPipeline

docs/source/en/api/pipelines/latte.md

+42-1
@@ -28,7 +28,7 @@ This pipeline was contributed by [maxin-cn](https://github.com/maxin-cn). The or
 
 <Tip>
 
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 
 </Tip>
 
@@ -70,6 +70,47 @@ Without torch.compile(): Average inference time: 16.246 seconds.
 With torch.compile(): Average inference time: 14.573 seconds.
 ```
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LattePipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LatteTransformer3DModel, LattePipeline
+from diffusers.utils import export_to_gif
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "maxin-cn/Latte-1",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = LatteTransformer3DModel.from_pretrained(
+    "maxin-cn/Latte-1",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = LattePipeline.from_pretrained(
+    "maxin-cn/Latte-1",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A small cactus with a happy face in the Sahara desert."
+video = pipeline(prompt).frames[0]
+export_to_gif(video, "latte.gif")
+```
+
 ## LattePipeline
 
 [[autodoc]] LattePipeline

docs/source/en/api/pipelines/ltx_video.md

+42-1
@@ -18,7 +18,7 @@
 
 <Tip>
 
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 
 </Tip>
 
@@ -139,6 +139,47 @@ export_to_video(video, "output.mp4", fps=24)
 
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LTXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LTXVideoTransformer3DModel, LTXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "Lightricks/LTX-Video",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = LTXVideoTransformer3DModel.from_pretrained(
+    "Lightricks/LTX-Video",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = LTXPipeline.from_pretrained(
+    "Lightricks/LTX-Video",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, num_frames=161, num_inference_steps=50).frames[0]
+export_to_video(video, "ship.mp4", fps=24)
+```
+
 ## LTXPipeline
 
 [[autodoc]] LTXPipeline
