[Flax] added memory efficient attention for U-net #2231

Closed

Conversation

@MuhHanif (Contributor) commented Feb 3, 2023

Memory-efficient attention implementation for Stable Diffusion Flax, enabling higher batch counts / resolutions during training or fine-tuning.
Enable it by passing use_memory_efficient=True to from_pretrained() when loading the U-Net model.
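
For example, a minimal usage sketch (the checkpoint id below is just an illustration, and the flag name follows this PR so it may still change during review):

```python
# Illustrative sketch only: the checkpoint id is an example, and
# use_memory_efficient is the flag name proposed in this PR (it may be
# renamed, e.g. to use_memory_efficient_attention, during review).
from diffusers import FlaxUNet2DConditionModel

unet, unet_params = FlaxUNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # example checkpoint with Flax weights
    subfolder="unet",
    use_memory_efficient=True,
)
```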

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@MuhHanif (Contributor, Author) commented Feb 3, 2023

Could I get a little help with the code quality checks?

@patrickvonplaten (Contributor) commented

That's very cool! @pcuenca do you maybe have some time to look into it? :-)

@patrickvonplaten (Contributor) commented

@pcuenca do you think you maybe have some time to review this PR? :-)

@pcuenca (Member) left a comment

Sorry for the delay. I have been testing this PR (for inference) on a TPU v3-8, and found that speed is a bit slower but batch sizes can be larger. The use of memory-efficient attention is not as beneficial as in the PyTorch world, because jit and pmap already do a great job by default. In fact, I believe that batch sizes could arguably be made larger without memory-efficient attention by partitioning model and data differently. Still, this could be interesting for some use cases.

These are my results:

| Total Batch Size | Compile (Diffusers main) | Inference (Diffusers main) | Compile (this PR) | Inference (this PR) |
|---|---|---|---|---|
| 8 | 1m 25s | 2.47 | 1m 39s | 3.61 |
| 32 | 1m 55s | 7.09 | 2m 07s | 10.7 |
| 64 | 2m 13s | 13.8 | 2m 21s | 20.2 |
| 160 | 3m 31s | 34.4 | 3m 34s | 34.5 |
| 192 | OOM | OOM | 4m 13s | 59.2 |
| 256 | OOM | OOM | 4m 53s | 78.0 |

@MuhHanif Are these results consistent with what you have observed?

@pcuenca (Member) left a comment

Thanks a lot! As stated before, this could be useful for some use cases. The code works and it looks fine to me, I just have a few suggestions:

  • Rename some symbols for consistency with previous naming conventions in the codebase.
  • Would it be possible to enable memory efficient attention after the pipeline has been loaded? We could use something like unet.enable_jax_memory_efficient_attention().
  • Would it be possible to add a couple of simple tests?

I'm curious about the way you are planning to use this feature. Have you measured any training figures by any chance? If so, it could be interesting to add some comments about this feature in the training docs (I can help with that).

Please let us know if these comments make sense to you. Once again, sorry for being late to review.

(Also, please note that we recently updated our quality/style tooling, let me know if you need help making style checks pass).

@@ -31,13 +32,16 @@ class FlaxAttentionBlock(nn.Module):
Dropout rate
dtype (:obj:`jnp.dtype`, *optional*, defaults to jnp.float32):
Parameters `dtype`
use_memory_efficient (`bool`, *optional*, defaults to `False`):
@pcuenca (Member) commented:

Suggested change
use_memory_efficient (`bool`, *optional*, defaults to `False`):
use_memory_efficient_attention (`bool`, *optional*, defaults to `False`):

@pcuenca (Member) commented:

Can we rename it to use_memory_efficient_attention everywhere?

"""
query_dim: int
heads: int = 8
dim_head: int = 64
dropout: float = 0.0
dtype: jnp.dtype = jnp.float32
use_memory_efficient: bool = False
@pcuenca (Member) commented:

Suggested change
use_memory_efficient: bool = False
use_memory_efficient_attention: bool = False

key_chunk_size = min(key_chunk_size, num_kv)
query = query / jnp.sqrt(k_features)

@functools.partial(jax.checkpoint, prevent_cse=False)
@pcuenca (Member) commented:

Interesting. I'd be curious to know what's the impact for training. Why do we need to disable prevent_cse?

@MuhHanif (Contributor, Author) commented:

> Interesting. I'd be curious to know what's the impact for training. Why do we need to disable prevent_cse?

I'm not sure; I just took the code snippet from Self-attention Does Not Need O(n²) Memory and took inspiration from AminRezaei0x443. I could test it with prevent_cse enabled and see what happens.
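
For context, here is a condensed sketch of the chunked attention recipe from that paper (simplified, with placeholder names; it skips query chunking and assumes the key length is a multiple of the chunk size, so it is not the exact code in this PR):

```python
# Condensed sketch of chunked ("memory-efficient") attention from
# "Self-attention Does Not Need O(n^2) Memory" (Rabe & Staats).
# Placeholder names; assumes num_kv % key_chunk_size == 0.
import functools

import jax
import jax.numpy as jnp


def chunked_attention(query, key, value, key_chunk_size=4096):
    # query: (q, heads, d), key/value: (kv, heads, d)
    num_kv, num_heads, k_features = key.shape
    v_features = value.shape[-1]
    key_chunk_size = min(key_chunk_size, num_kv)
    query = query / jnp.sqrt(k_features)

    # Rematerialize each chunk summary in the backward pass instead of
    # storing the full attention matrix; prevent_cse=False keeps XLA from
    # merging the recomputation back together under jit.
    @functools.partial(jax.checkpoint, prevent_cse=False)
    def summarize_chunk(query, key, value):
        attn_weights = jnp.einsum("qhd,khd->qhk", query, key)
        max_score = jnp.max(attn_weights, axis=-1, keepdims=True)
        max_score = jax.lax.stop_gradient(max_score)
        exp_weights = jnp.exp(attn_weights - max_score)
        exp_values = jnp.einsum("vhf,qhv->qhf", value, exp_weights)
        return exp_values, exp_weights.sum(axis=-1), max_score[..., 0]

    def chunk_scanner(chunk_idx):
        key_chunk = jax.lax.dynamic_slice(
            key, (chunk_idx, 0, 0), (key_chunk_size, num_heads, k_features)
        )
        value_chunk = jax.lax.dynamic_slice(
            value, (chunk_idx, 0, 0), (key_chunk_size, num_heads, v_features)
        )
        return summarize_chunk(query, key_chunk, value_chunk)

    chunk_values, chunk_weights, chunk_max = jax.lax.map(
        chunk_scanner, jnp.arange(0, num_kv, key_chunk_size)
    )

    # Combine per-chunk partial softmax sums into the exact softmax output.
    global_max = jnp.max(chunk_max, axis=0, keepdims=True)
    max_diffs = jnp.exp(chunk_max - global_max)
    chunk_values *= jnp.expand_dims(max_diffs, axis=-1)
    chunk_weights *= max_diffs

    all_values = chunk_values.sum(axis=0)
    all_weights = jnp.expand_dims(chunk_weights, -1).sum(axis=0)
    return all_values / all_weights
```

Each key/value chunk gets its own partial softmax (tracking a running max for numerical stability), and jax.checkpoint recomputes those chunk summaries in the backward pass instead of keeping the full query x key attention matrix alive.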


return all_values / all_weights

def memory_efficient_attention(
@pcuenca (Member) commented:

Maybe rename to jax_memory_efficient_attention for symmetry with xformers_memory_efficient_attention

@MuhHanif (Contributor, Author) commented Mar 4, 2023

> Sorry for the delay. I have been testing this PR (for inference) on a TPU v3-8, and found that speed is a bit slower but batch sizes can be larger. The use of memory-efficient attention is not as beneficial as in the PyTorch world, because jit and pmap already do a great job by default. In fact, I believe that batch sizes could arguably be made larger without memory-efficient attention by partitioning model and data differently. Still, this could be interesting for some use cases.
>
> These are my results:
>
> | Total Batch Size | Compile (Diffusers main) | Inference (Diffusers main) | Compile (this PR) | Inference (this PR) |
> |---|---|---|---|---|
> | 8 | 1m 25s | 2.47 | 1m 39s | 3.61 |
> | 32 | 1m 55s | 7.09 | 2m 07s | 10.7 |
> | 64 | 2m 13s | 13.8 | 2m 21s | 20.2 |
> | 160 | 3m 31s | 34.4 | 3m 34s | 34.5 |
> | 192 | OOM | OOM | 4m 13s | 59.2 |
> | 256 | OOM | OOM | 4m 53s | 78.0 |
>
> @MuhHanif Are these results consistent with what you have observed?

Hi @pcuenca! Sorry for the late reply.

Yes, it's roughly 10-20% slower when using memory-efficient attention during training (I didn't profile it thoroughly, though). The slowdown is expected, because I chunk the self-attention query matrix to be as small as the innermost (center-most) layer of the U-Net. But it makes training with a different resolution per batch (NAI-style resolution bucketing, e.g. 512x512, 512x704, etc.) possible without OOM.
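
A rough back-of-the-envelope illustration of the attention-score memory this avoids (the latent sizes, head count, and dtype below are assumed typical Stable Diffusion 1.x values, not numbers measured in this PR):

```python
# Rough illustration only; head count, dtype, and latent sizes are assumed
# typical Stable Diffusion 1.x values, not measurements from this PR.
tokens_outer = 64 * 64   # self-attention tokens at the largest U-Net level (512x512 image -> 64x64 latent)
tokens_inner = 8 * 8     # tokens at the innermost (mid) block
heads, bytes_per_elem = 8, 2  # assumed head count and bf16 storage

full_scores = tokens_outer * tokens_outer * heads * bytes_per_elem     # 256 MiB of scores per sample
chunked_scores = tokens_inner * tokens_outer * heads * bytes_per_elem  # 4 MiB per query chunk
print(f"{full_scores / 2**20:.0f} MiB vs {chunked_scores / 2**20:.0f} MiB")
```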

@MuhHanif (Contributor, Author) commented Mar 4, 2023

> * Rename some symbols for consistency with previous naming conventions in the codebase.

Okay.

> * Would it be possible to enable memory efficient attention after the pipeline has been loaded? We could use something like `unet.enable_jax_memory_efficient_attention()`.

It should be possible with `unet.use_memory_efficient = True`.

> * Would it be possible to add a couple of simple tests?

What kind of tests should I add?
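
One possible shape for such a test, sketched against the chunked_attention example above rather than the actual function in this PR (a real test would target the PR's function and follow the repo's test conventions): a numerical-equivalence check against plain softmax attention.

```python
# Sketch of a possible equivalence test: chunked attention should match plain
# softmax attention numerically. Uses the chunked_attention sketch from above;
# a real test would call the function added in this PR instead.
import jax
import jax.numpy as jnp


def reference_attention(query, key, value):
    scores = jnp.einsum("qhd,khd->qhk", query / jnp.sqrt(query.shape[-1]), key)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("qhk,khf->qhf", weights, value)


def test_chunked_matches_reference():
    rng = jax.random.PRNGKey(0)
    kq, kk, kv = jax.random.split(rng, 3)
    query = jax.random.normal(kq, (16, 8, 64))
    key = jax.random.normal(kk, (128, 8, 64))
    value = jax.random.normal(kv, (128, 8, 64))

    out_ref = reference_attention(query, key, value)
    out_chunked = chunked_attention(query, key, value, key_chunk_size=32)

    assert jnp.allclose(out_ref, out_chunked, atol=1e-4)
```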

@github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

The github-actions bot added the stale label (Issues that haven't received updates) on Mar 28, 2023.
@patrickvonplaten (Contributor) commented

Gentle ping here @pcuenca
