
feat: move AR fusion kernels from trtllm #1061


Closed
yyihuang wants to merge 36 commits

Conversation


@yyihuang yyihuang commented May 15, 2025

Move All-reduce fusion kernels from trtllm to flashinfer

Changes:

  • Add the trtllm AR-fusion kernels
  • Add the Python interface (a reference sketch follows this list)
  • Add quantization utils
  • (optional) Add trtllm-style checker/assertion utilities; currently unused and replaced by torch checks. Please evaluate this choice during review.
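The fused op bundles the collective with the epilogue that usually follows it in a transformer layer. Below is a minimal sketch of the intended semantics, assuming the common allreduce + residual-add + RMSNorm pattern from trtllm and an already-initialized process group; the name `allreduce_fusion_op` and its parameters are illustrative, not this PR's exact signature.

```python
import torch
import torch.distributed as dist

def allreduce_fusion_op(
    inp: torch.Tensor,       # local partial result, all-reduced in place
    residual: torch.Tensor,  # residual added after the reduce
    gamma: torch.Tensor,     # RMSNorm weight applied to the normalized sum
    eps: float = 1e-6,
) -> torch.Tensor:
    """Reference semantics: RMSNorm(allreduce(inp) + residual) * gamma."""
    dist.all_reduce(inp)  # stand-in for the fused CUDA kernel
    hidden = (inp + residual).float()
    rms = torch.rsqrt(hidden.pow(2).mean(-1, keepdim=True) + eps)
    return (hidden * rms).to(inp.dtype) * gamma
```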

To be discussed:

  • Keep trtllm-style assertions/checks, or stay with the current torch checks?
  • Expose the interface at the allreduce_fusion_kernel_XXXX level or at the allreduce_fusion_op level? (currently allreduce_fusion_op; a sketch follows this list)
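To make the interface question concrete, the sketch below shows the op-level choice: one public function that dispatches to private kernel bindings. Every name and the size threshold here are hypothetical. Exposing allreduce_fusion_kernel_XXXX instead would surface each variant directly and move this dispatch onto callers.

```python
import torch

def _allreduce_fusion_kernel_one_shot(inp, residual, gamma):
    raise NotImplementedError  # placeholder for a per-kernel binding

def _allreduce_fusion_kernel_two_shot(inp, residual, gamma):
    raise NotImplementedError  # placeholder for a per-kernel binding

def allreduce_fusion_op(inp: torch.Tensor, residual, gamma, *, strategy="auto"):
    # Op-level interface: kernel selection stays internal, so callers never
    # depend on individual kernel entry points.
    if strategy == "auto":
        strategy = "one_shot" if inp.numel() <= (1 << 20) else "two_shot"
    kernels = {
        "one_shot": _allreduce_fusion_kernel_one_shot,
        "two_shot": _allreduce_fusion_kernel_two_shot,
    }
    return kernels[strategy](inp, residual, gamma)
```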

Next TODOs:

  • fix compilation
  • minimize dependencies
  • add a flashinfer logger, exceptions, and checks (possibly torch-check style; a sketch follows this list)
  • design a unified interface for the communication module
  • unit tests for the Python interface
  • benchmarks (optional?)
  • unit tests for the C++ interface (not planned)
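On the logger/exception/check item: a minimal sketch of Python-level input validation in the spirit of C++ TORCH_CHECK, failing fast with readable messages. The helper name and the supported-dtype set are assumptions, not the eventual flashinfer helpers.

```python
import torch

def _check_allreduce_inputs(inp: torch.Tensor, residual: torch.Tensor) -> None:
    # Validate early so kernel launches never see malformed inputs.
    if not (inp.is_cuda and residual.is_cuda):
        raise ValueError("allreduce fusion expects CUDA tensors")
    if inp.shape != residual.shape:
        raise ValueError(
            f"shape mismatch: input {tuple(inp.shape)} vs "
            f"residual {tuple(residual.shape)}"
        )
    if inp.dtype not in (torch.float16, torch.bfloat16):
        raise ValueError(f"unsupported dtype {inp.dtype}; expected fp16 or bf16")
```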


yzh119 commented May 28, 2025

Closed for now because it introduces a deep trtllm dependency and is hard to maintain.
We will split the trtllm comm kernels into three pieces:

  1. one- and two-shot allreduce kernels (w/ rmsnorm fusion): feat: add trtllm all-reduce (non-MoE) #1096 (a reference sketch follows this list)
  2. low-precision allreduce kernels
  3. moe allreduce kernels
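For reference, the two strategies in item 1 compute the same result but move data differently; the sketch below simulates both on host tensors, as illustrative semantics only, not the CUDA kernels.

```python
import torch

def one_shot_allreduce(shards: list[torch.Tensor]) -> list[torch.Tensor]:
    # One-shot: each rank reads every peer's buffer and reduces the whole
    # tensor locally; a single communication round, best for small messages.
    total = torch.stack(shards).sum(dim=0)
    return [total.clone() for _ in shards]

def two_shot_allreduce(shards: list[torch.Tensor]) -> list[torch.Tensor]:
    # Two-shot: reduce-scatter (each rank reduces only its chunk), then
    # all-gather; two rounds, but less redundant work on large messages.
    world = len(shards)
    chunks = [s.chunk(world) for s in shards]  # assumes numel divisible by world
    reduced = [sum(c[r] for c in chunks) for r in range(world)]
    return [torch.cat(reduced) for _ in shards]
```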

@yzh119 yzh119 closed this May 28, 2025
yzh119 added a commit that referenced this pull request Jun 2, 2025

## 📌 Description

We add the trt-llm custom all-reduce kernels to the flashinfer comm module.
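As a hedged illustration of what the module's entry point computes, the sketch below uses plain torch.distributed as a semantic stand-in for the custom kernel rather than guessing flashinfer-side names; the trt-llm path targets the same reduction with lower latency on small and medium messages.

```python
import torch
import torch.distributed as dist

def main() -> None:
    # One process per GPU, e.g. launched with `torchrun --nproc_per_node=8`.
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())
    x = torch.randn(8192, 4096, dtype=torch.float16, device="cuda")
    # NCCL all-reduce has the same semantics as the custom kernel; the
    # custom kernels specialize the latency-bound message regime.
    dist.all_reduce(x)

if __name__ == "__main__":
    main()
```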

## 🔍 Related Issues

We split the original PR (#1061) into multiple PRs.
The MoE kernels are also in progress.

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).


---------

Co-authored-by: Zihao Ye <[email protected]>