
Add quickreduce as alternative to custom allreduce #16804


Open: wants to merge 6 commits into main from experimental/quick_reduce

Conversation

ilmarkov
Contributor

@ilmarkov ilmarkov commented Apr 17, 2025

Add quickreduce as an alternative to custom allreduce.
The collective is enabled only on AMD. The kernels implement a quantized allreduce collective with int4/int8 symmetric and asymmetric quantization algorithms.
The kernels can be enabled via the quick_reduce_allreduce_algo config parameter. On AMD we first check whether quickreduce is profitable; otherwise we fall back to custom allreduce.
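The int8 symmetric and asymmetric schemes mentioned above can be illustrated with a minimal NumPy sketch. This is not the kernel's actual API, only an assumption of what the two quantization regimes compute (function names are hypothetical; the real kernels quantize per-block on the GPU):

```python
import numpy as np

def quantize_int8_symmetric(x: np.ndarray):
    """Symmetric int8: a single scale, zero point fixed at 0."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize_int8_symmetric(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def quantize_int8_asymmetric(x: np.ndarray):
    """Asymmetric int8: scale plus zero point covers a skewed value range."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    zero_point = np.round(-lo / scale) - 128  # maps lo to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8_asymmetric(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale
```

Symmetric quantization is cheaper to apply and fuse, while the asymmetric variant wastes no range on skewed inputs; the reconstruction error of either is bounded by half the scale, which is why the flag trades accuracy for bandwidth.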


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Apr 17, 2025
@lixixicommute

Hi @ilmarkov,
Thank you for your great work.
Is there any test data for this change, and how well does it perform?
It looks like it is still a draft at the moment; will it be refined afterward?

@ilmarkov ilmarkov force-pushed the experimental/quick_reduce branch from 5b81d85 to 96e1a3e Compare May 13, 2025 13:27

mergify bot commented May 14, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ilmarkov.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 14, 2025
@ilmarkov ilmarkov force-pushed the experimental/quick_reduce branch from 96e1a3e to d92ccc8 Compare May 19, 2025 09:42
ilmarkov added 6 commits May 20, 2025 13:41
WIP
Signed-off-by: ilmarkov <[email protected]>
Signed-off-by: ilmarkov <[email protected]>
WIP
Signed-off-by: ilmarkov <[email protected]>
Signed-off-by: ilmarkov <[email protected]>
Signed-off-by: ilmarkov <[email protected]>
@ilmarkov ilmarkov force-pushed the experimental/quick_reduce branch from d92ccc8 to 6f17424 Compare May 20, 2025 13:41
@ilmarkov ilmarkov marked this pull request as ready for review May 20, 2025 13:42
@youkaichao
Copy link
Member

youkaichao commented May 20, 2025

On AMD we first check if it is profitable to use quickreduce, otherwise we fallback to custom allreduce.

What are the cases in which custom allreduce performs better than quickreduce? It would be better if quickreduce could surpass custom allreduce in all cases; then we could use quickreduce as a drop-in replacement for custom allreduce without a new user-facing flag.

@ilmarkov
Contributor Author

@youkaichao It is slower for smaller input sizes. We could take the same approach as custom allreduce: use one-shot for small buffers and two-shot for larger ones.

@youkaichao
Member

@youkaichao It is slower for smaller input sizes. We could take the same approach as custom allreduce: use one-shot for small buffers and two-shot for larger ones.

That would be great, can you implement it? Then we can use either quickreduce or custom allreduce at the engine level, instead of dynamically switching based on the input size.

@ilmarkov
Contributor Author

Yes, we can try to implement this approach.
However, the custom allreduce setup and implementation are better suited to low-latency, small input sizes, whereas quickreduce performs well for bandwidth-bottlenecked workloads. At the moment, we choose between custom allreduce and NCCL based on input size.
Also, we will still need the new user-facing flag, as we need to provide a switch between quantization regimes that lets the user find a trade-off between accuracy and performance.
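The size-based dispatch described here can be sketched as a simple threshold check. The thresholds and backend names below are purely illustrative assumptions; the real cutoffs would come from profiling on the target AMD GPUs, not from this sketch:

```python
# Illustrative thresholds only; real values would be tuned by profiling.
ONE_SHOT_MAX_BYTES = 256 * 1024          # latency-bound: one-shot allreduce
QUICKREDUCE_MIN_BYTES = 2 * 1024 * 1024  # bandwidth-bound: quantized quickreduce

def pick_allreduce_backend(nbytes: int) -> str:
    """Choose an allreduce implementation from the message size,
    mirroring the one-shot/two-shot idea discussed above
    (backend names are hypothetical)."""
    if nbytes <= ONE_SHOT_MAX_BYTES:
        return "custom_allreduce_one_shot"
    if nbytes < QUICKREDUCE_MIN_BYTES:
        return "custom_allreduce_two_shot"
    return "quickreduce"
```

A per-call check like this keeps small, latency-sensitive reductions on the custom path while routing large, bandwidth-bound tensors to the quantized kernels.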

@youkaichao
Member

Also, we will still need the new user-facing flag, as we need to provide a switch between quantization regimes that lets the user find a trade-off between accuracy and performance.

You can use an environment variable, like VLLM_ROCM_CA_BACKEND.
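Reading such a flag could look like the sketch below. VLLM_ROCM_CA_BACKEND is only the name proposed in this comment and the allowed values are assumptions for illustration; the merged code may use a different variable and regime names:

```python
import os

# Hypothetical quantization regimes a user might select (illustrative only).
_ALLOWED = {"none", "fp16", "int8", "int4"}

def rocm_ca_backend(default: str = "none") -> str:
    """Read the proposed VLLM_ROCM_CA_BACKEND environment variable and
    validate it against the assumed set of quantization regimes."""
    value = os.environ.get("VLLM_ROCM_CA_BACKEND", default).lower()
    if value not in _ALLOWED:
        raise ValueError(
            f"VLLM_ROCM_CA_BACKEND must be one of {sorted(_ALLOWED)}, "
            f"got {value!r}")
    return value
```

Validating at startup with a clear error keeps a mistyped regime from silently falling back to an unquantized path.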


mergify bot commented May 23, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ilmarkov.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 23, 2025
3 participants