docs/source/en/api/pipelines/allegro.md (+46 −1)
@@ -19,10 +19,55 @@ The abstract from the paper is:
<Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AllegroPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
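The diff preview ends after the imports. For reference, below is a minimal, self-contained sketch of how a bitsandbytes 8-bit example for [`AllegroPipeline`] typically continues; the `rhymes-ai/Allegro` checkpoint id, prompt, and generation settings are assumptions and not part of this diff.

```py
# Hedged sketch (not from the diff): quantize the T5 text encoder and the
# transformer to 8-bit with bitsandbytes, then assemble the pipeline.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

# transformers-side config quantizes the text encoder
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "rhymes-ai/Allegro",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# diffusers-side config quantizes the transformer
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AllegroTransformer3DModel.from_pretrained(
    "rhymes-ai/Allegro",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = AllegroPipeline.from_pretrained(
    "rhymes-ai/Allegro",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A seaside harbor at sunset with gentle waves, photorealistic."  # example prompt
video = pipeline(prompt=prompt, guidance_scale=7.5, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=15)
```

Quantizing the text encoder and transformer covers most of the parameter count; the VAE is typically left in full or half precision.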
docs/source/en/api/pipelines/aura_flow.md (+41 −1)
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AuraFlow

-AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3.md) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
+AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.

It was developed by the Fal team and more details about it can be found in [this blog post](https://blog.fal.ai/auraflow/).
@@ -22,6 +22,46 @@ AuraFlow can be quite expensive to run on consumer hardware devices. However, yo
</Tip>

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on image quality depending on the model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
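The preview again stops at the imports. A minimal sketch of transformer-only 8-bit quantization for [`AuraFlowPipeline`] is shown below; the `fal/AuraFlow` checkpoint id and the prompt are assumptions, not part of this diff.

```py
# Hedged sketch (not from the diff): quantize only the AuraFlow transformer to
# 8-bit with bitsandbytes and build the rest of the pipeline around it.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AuraFlowTransformer2DModel.from_pretrained(
    "fal/AuraFlow",  # assumed checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "a watercolor painting of a lighthouse on a cliff at dawn"  # example prompt
image = pipeline(prompt).images[0]
image.save("auraflow_8bit.png")
```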
docs/source/en/api/pipelines/cogvideox.md (+39 −6)
@@ -23,7 +23,7 @@ The abstract from the paper is:
<Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
@@ -112,13 +112,46 @@ CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds o
- With enabling cpu offloading and tiling, memory usage is `11 GB`
- `pipe.vae.enable_slicing()`

-### Quantized inference
+## Quantization

-[torchao](https://github.com/pytorch/ao) and [optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be used to quantize the text encoder, transformer and VAE modules to lower the memory requirements. This makes it possible to run the model on a free-tier T4 Colab or lower VRAM GPUs!
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.

-It is also worth noting that torchao quantization is fully compatible with [torch.compile](/optimization/torch2.0#torchcompile), which allows for much faster inference speed. Additionally, models can be serialized and stored in a quantized datatype to save disk space with torchao. Find examples and benchmarks in the gists below.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, CogVideoXTransformer3DModel, CogVideoXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

…

+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
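The middle of the example (quantization configs and pipeline assembly) is collapsed in the preview. A self-contained sketch of the likely full flow is below, assuming the `THUDM/CogVideoX-5b` checkpoint and a shortened prompt; the call arguments match the lines shown in the diff.

```py
# Hedged sketch (not from the diff): quantize the T5 text encoder and the
# CogVideoX transformer to 8-bit, then run text-to-video inference.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, CogVideoXTransformer3DModel, CogVideoXPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "THUDM/CogVideoX-5b",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A detailed wooden toy ship glides over a plush, blue carpet that mimics ocean waves."  # shortened example prompt
video = pipeline(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "ship.mp4", fps=8)
```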
docs/source/en/api/pipelines/cogview3.md (+1 −1)
@@ -23,7 +23,7 @@ The abstract from the paper is:
<Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
docs/source/en/api/pipelines/flux.md (+40 −0)
@@ -338,6 +338,46 @@ out = pipe(
out.save("image.png")
```

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on image quality depending on the model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`FluxPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
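The preview is truncated after the imports. Below is a minimal sketch that quantizes Flux's T5 encoder (`text_encoder_2`) and transformer with bitsandbytes; the `black-forest-labs/FLUX.1-dev` checkpoint id, prompt, and settings are assumptions, not part of this diff.

```py
# Hedged sketch (not from the diff): 8-bit quantization of the two largest
# Flux components, the T5 text encoder and the transformer.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_2_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed checkpoint id
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "a tiny astronaut hatching from an egg on the moon"  # example prompt
image = pipeline(prompt, guidance_scale=3.5, num_inference_steps=50).images[0]
image.save("flux_8bit.png")
```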
## Single File Loading for the `FluxTransformer2DModel`

The `FluxTransformer2DModel` supports loading checkpoints in the original format shipped by Black Forest Labs. This is also useful when trying to load finetunes or quantized versions of the models that have been published by the community.
docs/source/en/api/pipelines/hunyuan_video.md (+31 −0)
@@ -32,6 +32,37 @@ Recommendations for inference:
- For smaller resolution videos, try lower values of `shift` (between `2.0` and `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution images, try higher values (between `7.0` and `12.0`). The default value is `7.0` for HunyuanVideo.
- For more information about supported resolutions and other details, please refer to the original repository [here](https://github.com/Tencent/HunyuanVideo/).

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
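The preview is cut off after the diffusers import. A minimal sketch that quantizes only the transformer is shown below; the `hunyuanvideo-community/HunyuanVideo` checkpoint id, prompt, and frame count are assumptions. The Llama-based text encoder could be quantized the same way through the transformers-side `BitsAndBytesConfig`.

```py
# Hedged sketch (not from the diff): 8-bit quantization of the HunyuanVideo
# transformer, with the rest of the pipeline loaded in half precision.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",  # assumed checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A cat walks on the grass, realistic style."  # example prompt
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=15)
```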
docs/source/en/api/pipelines/latte.md (+42 −1)
@@ -28,7 +28,7 @@ This pipeline was contributed by [maxin-cn](https://github.com/maxin-cn). The or
<Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
@@ -70,6 +70,47 @@ Without torch.compile(): Average inference time: 16.246 seconds.
With torch.compile(): Average inference time: 14.573 seconds.
```

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LattePipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LatteTransformer3DModel, LattePipeline
+from diffusers.utils import export_to_gif
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
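The preview stops at the imports. A minimal sketch of the likely continuation is below, assuming the `maxin-cn/Latte-1` checkpoint and an example prompt; neither is part of this diff.

```py
# Hedged sketch (not from the diff): quantize Latte's T5 text encoder and
# transformer to 8-bit, then export the result as a GIF.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LatteTransformer3DModel, LattePipeline
from diffusers.utils import export_to_gif
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "maxin-cn/Latte-1",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = LatteTransformer3DModel.from_pretrained(
    "maxin-cn/Latte-1",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = LattePipeline.from_pretrained(
    "maxin-cn/Latte-1",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A small cactus with a happy face in the Sahara desert."  # example prompt
video = pipeline(prompt).frames[0]
export_to_gif(video, "latte.gif")
```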
docs/source/en/api/pipelines/ltx_video.md (+42 −1)
@@ -18,7 +18,7 @@
<Tip>

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

…

Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.

+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LTXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LTXVideoTransformer3DModel, LTXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

…

+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, num_frames=161, num_inference_steps=50).frames[0]
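The quantization configs and pipeline assembly are collapsed in the preview. A self-contained sketch of the likely full example follows, assuming the `Lightricks/LTX-Video` checkpoint and a shortened prompt; the `num_frames=161` and step count come from the diff above.

```py
# Hedged sketch (not from the diff): quantize the LTX-Video T5 text encoder
# and transformer to 8-bit, then generate and export a video.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LTXVideoTransformer3DModel, LTXPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "Lightricks/LTX-Video",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = LTXVideoTransformer3DModel.from_pretrained(
    "Lightricks/LTX-Video",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A wooden toy ship glides over a plush blue carpet that mimics ocean waves."  # shortened example prompt
video = pipeline(prompt=prompt, num_frames=161, num_inference_steps=50).frames[0]
export_to_video(video, "ship.mp4", fps=24)
```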