[Sampler] Adapt to FlashInfer 0.2.3 sampler API #15777
Conversation
Signed-off-by: Bowen Wang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Hi @youkaichao, are the failed tests related to this PR? I saw some error messages like this:
(error log omitted)
Do you need to update the Dockerfile to install the new FlashInfer wheel?
Let me have a try.
Signed-off-by: Bowen Wang <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Can we revive this? I would like to update FlashInfer to the latest version now that we have it integrated with V1 as an attention backend.
Sure, I'll add some tests soon.
FYI - synced with @mgoin @luccafong @houseroad offline; there was a numerical issue with FlashInfer 0.2.5. Fortunately, we found that the issue had already been fixed upstream (flashinfer-ai/flashinfer#1029).
Before the fix: (results screenshot omitted)
After the fix: (results screenshot omitted)
Hi @abmfy, do you plan to have the test updates soon? We can help write them if you don't have time right now.
Hi, sorry for the delayed response. I've just settled in the U.S. recently and was dealing with limited bandwidth. I'll focus on adding the tests today.
This pull request has merge conflicts that must be resolved before it can be merged.
The branch was force-pushed from 9a5040b to c159f4e.
LGTM. Thanks a lot for the PR!
docker/Dockerfile (Outdated)
# TESTING: install FlashInfer from source to test 2.7.0 final RC
FLASHINFER_ENABLE_AOT=1 TORCH_CUDA_ARCH_LIST='7.5 8.0 8.6 8.9 9.0+PTX' \
    uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/[email protected]" ; \
uv pip install --system https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.4/flashinfer_python-0.2.4+cu124torch2.6-cp38-abi3-linux_x86_64.whl ; \
I don't think we can use this wheel since it is built for cu124torch2.6. We need cu128 and torch 2.8.
Sure. However, FlashInfer 0.2.4 only provides wheels up to cu124torch2.6, so we may need to build from source in CI for now, at least until a new release of FlashInfer becomes available.
Signed-off-by: Bowen Wang <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: Bowen Wang <[email protected]>
Some V0 sampler tests seem to be failing due to the removal of the ability to pass pre-generated uniform samples to FlashInfer kernels; the V1 sampler tests all passed. Will get back to this later.
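For reference, those V0 tests relied on injecting a fixed uniform tensor to make the FlashInfer draw reproducible, and 0.2.3 removed that knob. Below is a minimal sketch of the closest replacement, assuming the 0.2.3+ kernels accept a seeded torch.Generator; the generator parameter is an assumption here, not something this thread confirms:

```python
import torch
from flashinfer import sampling

# Toy probability distribution; any (batch, vocab) float32 tensor on GPU works.
probs = torch.softmax(torch.randn(2, 128, device="cuda"), dim=-1)

# Assumption: since uniform_samples can no longer be passed in, a seeded
# torch.Generator is the substitute when a test needs reproducible draws.
gen = torch.Generator(device="cuda")
gen.manual_seed(42)
first = sampling.sampling_from_probs(probs, generator=gen)

gen.manual_seed(42)
second = sampling.sampling_from_probs(probs, generator=gen)

# With the same seed, the two calls should return identical token ids.
assert torch.equal(first, second)
```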
@abmfy Thank you, can you fix this PR? This is part of the release blockers now.
Signed-off-by: Bowen Wang <[email protected]>
Since FlashInfer 0.2.3 removed the ability to pass in uniform samples. Signed-off-by: Bowen Wang <[email protected]>
Sure, I'll fix the tests that seem to be related to this PR.
Signed-off-by: Bowen Wang <[email protected]>
Signed-off-by: Bowen Wang <[email protected]>
Signed-off-by: Bowen Wang <[email protected]>
Signed-off-by: Bowen Wang <[email protected]>
…#15777)" This reverts commit 7fdfa01. Signed-off-by: Mark McLoughlin <[email protected]>
Thanks for the contribution; however, your PR appears to have broken many tests (see #18462). Can those involved fix them to unblock the release?
That's strange, will look into it.
Signed-off-by: Bowen Wang <[email protected]> Co-authored-by: mgoin <[email protected]> Signed-off-by: Chenheli Hua <[email protected]>
Signed-off-by: Bowen Wang <[email protected]> Co-authored-by: mgoin <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
FlashInfer 0.2.3 introduced some breaking changes to its sampler API; this PR updates the call sites in vLLM to adapt to the update. A rough sketch of the call-site change is included below.
FIX #14815
FIX #15666
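The sketch referenced above is illustrative only, not the exact vLLM code; the pre-0.2.3 signature shown in the comment is reconstructed from memory, and the exact argument names may differ:

```python
import torch
from flashinfer import sampling

# Per-request probabilities and top-k / top-p parameters (toy values).
probs = torch.softmax(torch.randn(4, 32000, device="cuda"), dim=-1)
top_k = torch.full((4,), 20, dtype=torch.int32, device="cuda")
top_p = torch.full((4,), 0.9, dtype=torch.float32, device="cuda")

# Before 0.2.3 (roughly): the caller pre-generated the uniform noise used by
# the rejection-sampling kernel and passed it in, e.g.
#   uniform = torch.rand(rounds, 4, device="cuda")
#   samples, success = sampling.top_k_top_p_sampling_from_probs(
#       probs, uniform, top_k, top_p)
#
# From 0.2.3: the uniform-samples argument is gone and the kernel draws its
# own randomness, so the call site only passes probs and the filter params.
samples = sampling.top_k_top_p_sampling_from_probs(probs, top_k, top_p)
print(samples.shape)  # torch.Size([4])
```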