[Core] Use flashinfer sampling kernel when available #7137
Merged
27 commits
7e47c2d Use flashinfer kernel to do sampling if available (peng1999)
fa61264 Merge remote-tracking branch 'upstream/main' into opt-topk (peng1999)
56beab0 Fix type mismatch (peng1999)
5396c9d Some renaming (peng1999)
5999bd3 Fallback for flashinfer sampler (peng1999)
420b004 Formatting fix (peng1999)
98d372e Tests fix (peng1999)
0a8be18 Fix mypy (peng1999)
f170646 Add test for flashinfer sampler (peng1999)
88c8a98 Suppress yapf on import (peng1999)
c404cd5 Fix pipeline (peng1999)
c361a95 Change back to torch generator, add env flags (peng1999)
8af0e09 Merge remote-tracking branch 'upstream/main' into opt-topk (peng1999)
99f7ecc rename env for flashinfer, rollback changes in utils (peng1999)
7e03711 rollback changes to utils (peng1999)
6416046 rename env (peng1999)
fdc23a3 add top_k_top_p when fallback (peng1999)
b97c911 Adapt flashinfer 0.1.4 (peng1999)
f8d7093 Revert changes to sampling_metadata (peng1999)
2d7e5c3 Change flashinfer 0.1.2 to 0.1.4 in test (peng1999)
20eee6a Merge remote-tracking branch 'upstream/main' into opt-topk (peng1999)
f893110 Disable flashinfer in GPTQ reproduce test (peng1999)
e4cfcfc Disable flashinfer sampler in distributed test (peng1999)
c5194ec Merge remote-tracking branch 'upstream/main' into opt-topk (peng1999)
0ec8b61 Disable flashinfer sampler by default (peng1999)
9eaea5c Update vllm/envs.py (peng1999)
18d59a1 Merge branch 'vllm-project:main' into opt-topk (peng1999)