Skip to content

[ET-VK][ez] Make squeeze insertion requirements more strict #9950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 7, 2025

Conversation

SS-JIA
Copy link
Contributor

@SS-JIA SS-JIA commented Apr 7, 2025

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #9917 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/207/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/207/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/206/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/207/orig
@diff-train-skip-merge

Copy link

pytorch-bot bot commented Apr 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9950

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 7, 2025
Base automatically changed from gh/SS-JIA/206/orig to main April 7, 2025 22:54
Pull Request resolved: #9917

## Context

Refactor the `SqueezeUnsqueezeInputs` pass to be more clear about its intention.

For Llama models, input shapes to 4 bit linear will oftentimes have the shape `[1, seq_len, dim]`; under the current implementation of the pass, the input would be squeezed to `[seq_len, dim]` even though the squeeze is not necessary.

The original intention of thispass was to squeeze inputs with shape `[batch_size, 1, dim]` to `[batch_size, dim]` before calling the 4-bit linear operator.

## Changes

To avoid inserting unnecessary squeeze/unsqueezes, be more specific about when squeeze/unsqueeze should be added.

I would like to consider refactoring this pass in the future, since the logic is currently a bit uninttuitive. Squeeze/unsqueeze is also inserted for gelu and relu, but this is to create a chain of unsqueeze/squeeze that will be eliminated by a later pass (see #8601 / D69673068). I think eventually it will be good to rewrite the pass to make shape management more explicit and self contained within the pass rather than inserting ops which are expected to be removed later on.
ghstack-source-id: 276566115
@exported-using-ghexport

Differential Revision: [D72480178](https://our.internmc.facebook.com/intern/diff/D72480178/)
@kirklandsign kirklandsign added the release notes: vulkan Changes to the Vulkan backend delegate label Apr 7, 2025
@kirklandsign kirklandsign merged commit 1bef520 into main Apr 7, 2025
3 checks passed
@kirklandsign kirklandsign deleted the gh/SS-JIA/207/orig branch April 7, 2025 22:54
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
## Context

Refactor the `SqueezeUnsqueezeInputs` pass to be more clear about its intention.

For Llama models, input shapes to 4 bit linear will oftentimes have the shape `[1, seq_len, dim]`; under the current implementation of the pass, the input would be squeezed to `[seq_len, dim]` even though the squeeze is not necessary.

The original intention of thispass was to squeeze inputs with shape `[batch_size, 1, dim]` to `[batch_size, dim]` before calling the 4-bit linear operator.

## Changes

To avoid inserting unnecessary squeeze/unsqueezes, be more specific about when squeeze/unsqueeze should be added.

I would like to consider refactoring this pass in the future, since the logic is currently a bit uninttuitive. Squeeze/unsqueeze is also inserted for gelu and relu, but this is to create a chain of unsqueeze/squeeze that will be eliminated by a later pass (see #8601 / D69673068). I think eventually it will be good to rewrite the pass to make shape management more explicit and self contained within the pass rather than inserting ops which are expected to be removed later on.

Differential Revision: [D72480178](https://our.internmc.facebook.com/intern/diff/D72480178/)
keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025
…9950)

## Context

Refactor the `SqueezeUnsqueezeInputs` pass to be more clear about its intention.

For Llama models, input shapes to 4 bit linear will oftentimes have the shape `[1, seq_len, dim]`; under the current implementation of the pass, the input would be squeezed to `[seq_len, dim]` even though the squeeze is not necessary.

The original intention of thispass was to squeeze inputs with shape `[batch_size, 1, dim]` to `[batch_size, dim]` before calling the 4-bit linear operator.

## Changes

To avoid inserting unnecessary squeeze/unsqueezes, be more specific about when squeeze/unsqueeze should be added.

I would like to consider refactoring this pass in the future, since the logic is currently a bit uninttuitive. Squeeze/unsqueeze is also inserted for gelu and relu, but this is to create a chain of unsqueeze/squeeze that will be eliminated by a later pass (see pytorch#8601 / D69673068). I think eventually it will be good to rewrite the pass to make shape management more explicit and self contained within the pass rather than inserting ops which are expected to be removed later on.

Differential Revision: [D72480178](https://our.internmc.facebook.com/intern/diff/D72480178/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: vulkan Changes to the Vulkan backend delegate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants