Integrate quick allreduce and select the best allreduce implementation #18473
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
tp2
I'll add tp4, tp8, and e2e experimental data later, and clean things up.
5a6a5a8 to e272658
What's the relationship between this and #16804?
Aha, maybe we are competing. I am from AMD. We recently spent some time integrating QR into vLLM, because QR is very well suited to ROCm. Integrating QR gives the two PRs many similarities, but the PR you mentioned, #16804, seems to support only Q8 and Q4. It has no obvious boundary conditions, its quantization seems to have some problems, and it lacks experimental data. Maybe we can work together to finish the work.
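For readers unfamiliar with the Q8 and Q4 labels, here is a minimal sketch, assuming they denote 8-bit and 4-bit symmetric integer quantization of the data exchanged between ranks. It is an illustration only and is not code from either PR; the names `quantize_symmetric` and `dequantize` are hypothetical.

```python
# Illustrative only: symmetric integer quantization, assumed here to be
# what "Q8" (8-bit) and "Q4" (4-bit) refer to. Not code from either PR.

def quantize_symmetric(values, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 7 for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


data = [0.5, -1.0, 0.25, 0.75]
for bits in (8, 4):
    codes, scale = quantize_symmetric(data, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(data, restored))
    print(f"Q{bits}: codes={codes} max_error={worst:.4f}")
```

Fewer bits mean less data to move per allreduce but a coarser grid, so the worst-case rounding error grows (at most half of `scale` per element in this sketch). That trade-off is presumably why accuracy data matters when comparing the two PRs.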
08caa03 to 0989304
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Haoyang Li <[email protected]>
0989304 to 84b2ca1
Signed-off-by: Haoyang Li <[email protected]>