Fix image token mask in Gemma3 #38295
Merged
What does this PR do?
As per the title. It got fixed in #38080 in the middle of #37866. This PR corrects the additional mask. See images for correctness for both sliding and full cases.
Before #37866, however, the masks were first created with full attention and then truncated with the sliding window in the DecoderLayer, so I'm pretty sure the positions added to the mask were removed in the sliding layers (i.e. even with #38080 it was not correct). This is a similar issue to the one with the Qwens. That truncation no longer happens, and everything is correctly generated and passed along 🤗
Of course, the two squares correspond to the 2 images, each made of 5 image tokens.
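
For readers without the images, here is a minimal pure-Python sketch of the mask shape described above. It is not the code from this PR: `image_group_ids` and `build_mask` are hypothetical helpers, and the sketch assumes that image tokens attend bidirectionally within their own image, that the sliding window keeps keys with `q - kv < sliding_window`, and that two images are always separated by at least one text token.

```python
def image_group_ids(is_image):
    """Give each contiguous run of image tokens its own id; text tokens get -1.

    Assumption: consecutive images are separated by at least one text token,
    otherwise two images would be merged into a single group.
    """
    ids, current, prev = [], -1, False
    for flag in is_image:
        if flag and not prev:
            current += 1
        ids.append(current if flag else -1)
        prev = bool(flag)
    return ids


def build_mask(is_image, sliding_window=None):
    """Return mask[q][kv] == True where query q may attend to key kv."""
    groups = image_group_ids(is_image)
    n = len(is_image)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for kv in range(n):
            # Base mask: causal, optionally restricted to the sliding window.
            allowed = kv <= q
            if sliding_window is not None:
                allowed = allowed and (q - kv < sliding_window)
            # Additional mask: tokens of the same image see each other,
            # even in the future and even outside the sliding window.
            same_image = groups[q] >= 0 and groups[q] == groups[kv]
            mask[q][kv] = allowed or same_image
    return mask


# 2 text tokens, 5 image tokens, 2 text, 5 image, 2 text (16 tokens in total).
is_image = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0]
full = build_mask(is_image)
sliding = build_mask(is_image, sliding_window=3)

# Print the sliding mask; the two 5x5 blocks of "#" are the two images.
for row in sliding:
    print("".join("#" if cell else "." for cell in row))
```

The point of the sketch is that the image blocks are OR-ed in after the sliding window is applied, so they survive in the sliding layers. Truncating a full-attention mask afterwards, as happened before #37866, would cut into those blocks.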