[V1] Scatter and gather placeholders in the model runner #15712
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Thanks as always for the great work @DarkLight1337! I'm not an expert in most of what's changed here but did notice one thing.
This looks great and even removes a lot of complex code. It should fix the immediate issue we have with llava on TPU since we get to skip the logic now in the non-Pixtral case. Even if we are stuck with dynamic creation, we can isolate it in a smaller graph with this refactor.
Does this still have many issues to resolve or do you think it could be tractable this week?
This pull request has merge conflicts that must be resolved before it can be merged.
@mgoin if you have time, can you help check the following models locally? I have verified a few models on my end but there are still many to go:
Run the example scripts (a minimal sketch of such a check follows):
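For reference, a local check could look like the minimal sketch below, using vLLM's offline inference API; the model name, prompt template, and image path are illustrative assumptions, not part of the PR's checklist.

```python
# Minimal local smoke test with vLLM's offline API. The model, prompt
# template, and image are placeholders; substitute the models under test.
from PIL import Image
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example multimodal model
image = Image.open("example.jpg")            # any local test image

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is in this image? ASSISTANT:",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```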
Looks like increasing the audio fetch timeout indeed fixes the test, so I assume it was probably just a cold-start issue? Anyway, LGTM.
This PR is an attempt to move `scatter_patch_features` and `gather_patch_feature` into the model runner (outside of the model) to avoid interfering with TPU graph compilation.

Breaking change for model developers:

- `PromptUpdateDetails.features` has been replaced with `PromptUpdateDetails.is_embed`. You can use the newly added factories `PromptUpdateDetails.select_text` and `PromptUpdateDetails.select_token_id` to generate `is_embed` based on the target text/token ID.
- `BaseProcessingInfo.get_num_image_tokens` should now return the equivalent of `PromptUpdateDetails.is_embed.sum()` instead of the number of tokens in `PromptUpdateDetails.features`.
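To make the breaking change concrete, here is a minimal sketch of how a multimodal processor might adopt the new API. The factory signature and the token IDs are assumptions inferred from the description above (loosely modeled on a Pixtral-style row/patch layout), not verbatim vLLM source.

```python
# Sketch of the new-style prompt update; the signature is assumed from the
# PR description, and the token IDs below are hypothetical.
from vllm.multimodal.processing import PromptUpdateDetails

IMAGE_TOKEN_ID = 10  # hypothetical per-patch placeholder token
IMAGE_BREAK_ID = 12  # hypothetical row-break token (not an embedding slot)

def build_image_replacement(ncols: int, nrows: int) -> PromptUpdateDetails:
    # Full replacement: rows of patch placeholders, each row terminated by
    # a break token (only the Pixtral-like case has such extra tokens).
    repl = ([IMAGE_TOKEN_ID] * ncols + [IMAGE_BREAK_ID]) * nrows

    # select_token_id generates is_embed so that only positions equal to
    # IMAGE_TOKEN_ID count as embedding slots; the model runner then
    # scatters/gathers embeddings into exactly those positions.
    return PromptUpdateDetails.select_token_id(
        repl, embed_token_id=IMAGE_TOKEN_ID)
```

With this layout, `get_num_image_tokens` should return `ncols * nrows` (the number of `True` entries in `is_embed`) rather than `len(repl)`, matching the `is_embed.sum()` rule above.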