[Bugfix] Adjust mllama to regional compilation #15112
LGTM! Thanks for your contribution.
This PR involves a cherry-pick of vllm-project#15112 from upstream and a fix for cos_sin preparation in the embedding layers to match regional compilation. --------- Signed-off-by: Jan Kaniecki <[email protected]>
When trying to perform regional compilation with torch.compile (compiling the decoder layers separately instead of calling torch.compile on the whole model) on the mllama model with Gaudi devices, the following error occurs:
ValueError: Unknown decoder layer type <class 'torch._dynamo.eval_frame.OptimizedModule'>
Regional compilation for Gaudi devices was added in #13213.
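For illustration, regional compilation can be sketched as follows. This is a minimal, torch-free sketch: `fake_compile` is a stand-in for `torch.compile`, and the `OptimizedModule` class here only mimics the assumed wrapping behavior of `torch._dynamo.eval_frame.OptimizedModule`, it is not the actual torch implementation.

```python
class OptimizedModule:
    """Stand-in for torch._dynamo.eval_frame.OptimizedModule: it wraps the
    original module instead of subclassing it."""

    def __init__(self, orig_mod):
        self._orig_mod = orig_mod

    def __call__(self, *args, **kwargs):
        return self._orig_mod(*args, **kwargs)


def fake_compile(module):
    # Stand-in for torch.compile: returns the module wrapped in OptimizedModule.
    return OptimizedModule(module)


class DecoderLayer:
    def __call__(self, x):
        return x + 1


model_layers = [DecoderLayer() for _ in range(4)]

# Regional compilation: compile each layer separately rather than the whole model.
model_layers = [fake_compile(layer) for layer in model_layers]

# Each layer is now wrapped in the compile wrapper, but remains callable.
assert all(isinstance(layer, OptimizedModule) for layer in model_layers)
assert model_layers[0](1) == 2
```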
The cause of this issue is that the mllama code checks layer classes with isinstance, e.g.:
if isinstance(decoder_layer, MllamaCrossAttentionDecoderLayer):
torch.compile wraps a module in torch._dynamo.eval_frame.OptimizedModule after compilation, which is why the isinstance check no longer matches. To resolve this, we can distinguish the layers by their indices in self.cross_attention_layers instead, which is what the proposed changes do. We also no longer need to raise ValueError in the layer-type check, since the decoder layers cannot be of any type other than the expected ones.
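A minimal sketch of the failure and the index-based fix. The decoder-layer class names mirror the mllama code, but the wrapper class and the layer indices are illustrative stand-ins, not the actual vLLM or torch implementation.

```python
class MllamaSelfAttentionDecoderLayer:
    pass


class MllamaCrossAttentionDecoderLayer:
    pass


class OptimizedModule:
    """Stand-in for torch._dynamo.eval_frame.OptimizedModule, which wraps
    a compiled module rather than subclassing it."""

    def __init__(self, orig_mod):
        self._orig_mod = orig_mod


# Example layer layout: cross-attention layers at these indices (illustrative).
cross_attention_layers = [3, 8]
layers = [
    OptimizedModule(
        MllamaCrossAttentionDecoderLayer()
        if i in cross_attention_layers
        else MllamaSelfAttentionDecoderLayer()
    )
    for i in range(10)
]

# Broken after compilation: the wrapper is not an instance of the original class,
# so this isinstance check silently stops matching.
assert not isinstance(layers[3], MllamaCrossAttentionDecoderLayer)

# Robust: decide by layer index, which is unaffected by compilation wrapping.
cross_flags = [idx in cross_attention_layers for idx in range(len(layers))]
assert cross_flags[3] and cross_flags[8]
assert not any(cross_flags[i] for i in (0, 1, 2, 4, 5, 6, 7, 9))
```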