T5Attention support for cross-attention #2654
Conversation
Fix use of AttnProcessor2_0 for cross attention with mask
The documentation is not available anymore as the PR was closed or merged.
Can you give a bit more background on what issue is fixed here? I'm not so sure about this tbh.
Of course, my bad! The issue is that the shape of the mask returned by […] With this change at least the function doesn't complain... however the outputs vs. […]
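Roughly, the mismatch being described is that masks prepared for the batched-matmul processors come out flattened over heads, while `F.scaled_dot_product_attention` wants a mask that broadcasts against `(batch, heads, q_len, kv_len)`. A minimal sketch with made-up shapes and names, not the actual diffusers code:

```python
import torch
import torch.nn.functional as F

# Made-up shapes, purely to illustrate the broadcast requirement.
batch, heads, q_len, kv_len, head_dim = 2, 8, 77, 77, 64
query = torch.randn(batch, heads, q_len, head_dim)
key = torch.randn(batch, heads, kv_len, head_dim)
value = torch.randn(batch, heads, kv_len, head_dim)

# Masks prepared for the baddbmm-based processors are typically flattened to
# (batch * heads, 1, kv_len); sdp needs something broadcastable to
# (batch, heads, q_len, kv_len). A zero float mask means "no masking".
flat_mask = torch.zeros(batch * heads, 1, kv_len)
sdp_mask = flat_mask.view(batch, heads, 1, kv_len)  # broadcasts over q_len

out = F.scaled_dot_product_attention(query, key, value, attn_mask=sdp_mask)
print(out.shape)  # torch.Size([2, 8, 77, 64])
```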
Thanks @Birch-san, I am happy to close this in view of your PR. I also need to add two extra flags for […]
Cool, this works, thanks a lot for making the changes @kashif!
@Birch-san - think we could adapt your PR after this quite easily, no?
```python
if processor is None:
    processor = AttnProcessor2_0() if hasattr(F, "scaled_dot_product_attention") else CrossAttnProcessor()
if torch.torch_version.TorchVersion(torch.__version__) >= (2, 1, 0):
```
Let's revert this, we don't need 2.1, 2.0 is enough and I think the logic before was good
Right, but then `scaled_dot_product_attention` in 2.0 has no `scale` argument, which is what I would need... but yes, I can deal with that in the pipeline?
Ah I see, ok I think it's fine if Torch 2.0 doesn't work yet for the spectrogram model. Let's maybe just advertise it with the previous PyTorch version and see if the community tries it out on PyTorch 2.0.
Ok cool, reverting... I can deal with it, or I can also check if `attn.scale == 1` and not do this... which is only for spectrogram for now?
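One way to go about the "check if it has scale argument" idea mentioned in the commit log below, sketched here with hypothetical helper names (this is not the code that actually landed): probe once for the kwarg and, on builds without it, fold a non-default scale into the query.

```python
import math
import torch
import torch.nn.functional as F

def _sdp_supports_scale() -> bool:
    # Probe once whether this torch build accepts the `scale` kwarg
    # (PyTorch 2.0 does not; it appeared in later builds).
    if not hasattr(F, "scaled_dot_product_attention"):
        return False
    probe = torch.zeros(1, 1, 1, 8)
    try:
        F.scaled_dot_product_attention(probe, probe, probe, scale=1.0)
        return True
    except TypeError:
        return False

SDP_HAS_SCALE = _sdp_supports_scale()

def sdp_with_scale(query, key, value, attn_mask=None, scale=None):
    if scale is None or SDP_HAS_SCALE:
        kwargs = {"scale": scale} if SDP_HAS_SCALE else {}
        return F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, **kwargs)
    # Older builds always apply 1/sqrt(head_dim), so fold the requested scale
    # into the query to cancel the built-in factor.
    query = query * scale * math.sqrt(query.shape[-1])
    return F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)
```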
```diff
@@ -497,7 +511,7 @@ def __call__(self, attn: CrossAttention, hidden_states, encoder_hidden_states=No
     # the output of sdp = (batch, num_heads, seq_len, head_dim)
     hidden_states = F.scaled_dot_product_attention(
-        query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
+        query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False, scale=attn.scale
```
Is this backwards compatible?
Yes, since if `scale=None` the default scale is used, i.e. 1/sqrt(D), but it only works on the 2.1 nightly.
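As a quick sanity check of that claim (on a torch build that already has the kwarg), omitting `scale`, passing `scale=None`, and passing `1/sqrt(head_dim)` explicitly should all agree:

```python
import math
import torch
import torch.nn.functional as F

# Requires a PyTorch build whose scaled_dot_product_attention accepts `scale`
# (the 2.1 nightly at the time of this thread). Shapes are arbitrary.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

default = F.scaled_dot_product_attention(q, k, v)
explicit_none = F.scaled_dot_product_attention(q, k, v, scale=None)
explicit_val = F.scaled_dot_product_attention(q, k, v, scale=1.0 / math.sqrt(q.shape[-1]))

print(torch.allclose(default, explicit_none, atol=1e-6))
print(torch.allclose(default, explicit_val, atol=1e-6))
```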
@kashif can you also run all the slow tests for: […]
So that we can be sure that nothing is broken.
Ok sure, reverting and running slow tests... give me a few!
Ran slow tests... all failures are of this example: […]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Thanks for the PR @kashif :-)
Ok, thanks! Will add fast tests to spectrogram diffusion!
* fix AttnProcessor2_0: Fix use of AttnProcessor2_0 for cross attention with mask
* added scale_qk and out_bias flags
* fixed for xformers
* check if it has scale argument
* Update cross_attention.py
* check torch version
* fix sliced attn
* style
* set scale
* fix test
* fixed addedKV processor
* revert back AttnProcessor2_0
* if missing if
* fix inner_dim

---------

Co-authored-by: Patrick von Platen <[email protected]>
Added support for implementing T5Attention via the attention processors. Needed for #1044.
Tested on PyTorch 2.0 RC.
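For context on why the flags matter: T5-style attention skips the 1/sqrt(d) scaling of Q·K and uses no bias on its output projection, which is what the `scale_qk` and `out_bias` flags mentioned above are meant to expose. A minimal, self-contained sketch of that idea (a hypothetical module, not the diffusers `CrossAttention` API):

```python
import torch
import torch.nn as nn

class AttentionSketch(nn.Module):
    """Illustrative only: how scale_qk / out_bias map onto T5-style attention."""

    def __init__(self, query_dim, heads=8, dim_head=64, scale_qk=True, out_bias=True):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        # scale_qk=False gives scale 1.0, i.e. T5's unscaled dot-product attention.
        self.scale = dim_head**-0.5 if scale_qk else 1.0
        self.to_q = nn.Linear(query_dim, inner_dim, bias=False)
        self.to_k = nn.Linear(query_dim, inner_dim, bias=False)
        self.to_v = nn.Linear(query_dim, inner_dim, bias=False)
        # out_bias=False drops the bias on the output projection, as T5 does.
        self.to_out = nn.Linear(inner_dim, query_dim, bias=out_bias)

    def forward(self, hidden_states, encoder_hidden_states=None, attention_mask=None):
        # For simplicity this sketch assumes the encoder states share query_dim.
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states
        b, n, _ = hidden_states.shape
        q = self.to_q(hidden_states).view(b, n, self.heads, -1).transpose(1, 2)
        k = self.to_k(context).view(b, context.shape[1], self.heads, -1).transpose(1, 2)
        v = self.to_v(context).view(b, context.shape[1], self.heads, -1).transpose(1, 2)
        # Written out manually so the role of self.scale is explicit; in the PR the
        # same scale is routed into F.scaled_dot_product_attention via its scale kwarg.
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        if attention_mask is not None:
            scores = scores + attention_mask
        out = torch.matmul(scores.softmax(dim=-1), v)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

attn = AttentionSketch(query_dim=512, scale_qk=False, out_bias=False)  # T5-style settings
out = attn(torch.randn(2, 16, 512), encoder_hidden_states=torch.randn(2, 24, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```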