T5Attention support for cross-attention #2654
Let's revert this: we don't need 2.1, 2.0 is enough, and I think the logic before was good.
Right, but then `scaled_dot_product_attention` in 2.0 has no `scale` argument, which is what I would need... but yes, I can deal with that in the pipeline?
Ah I see, ok. I think it's fine if Torch 2.0 doesn't work yet for the spectrogram model. Let's maybe just advertise it with the previous PyTorch version and see if the community tries it out on PyTorch 2.0.
Ok cool! Reverting... I can deal with it, or I can also check if `attn.scale == 1` and not do this, which is only for the spectrogram model for now?
Is this backwards compatible?
Yes: if `scale=None`, the default scale is used, i.e. 1/sqrt(D), so it's backwards compatible, but the kwarg only works in the 2.1 nightly.