[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 #13305
Merged: vllm-bot merged 73 commits into vllm-project:main from ROCm:upstream_prefix_prefill_speed_up on Apr 23, 2025
Commits
b6b00d7
init
SageMoore fa52268
temporarily remove torch from requirements-build
SageMoore f563276
move rocm logic to its own attention backend
SageMoore 2a03b92
actually add backend
SageMoore 4bdf7de
more rocm refactoring
SageMoore 875fcfc
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore e507e30
more rocm refactoring
SageMoore b9ce259
hack to fix the multiprocessing isssue
SageMoore f2cc5e3
minor print fix
SageMoore d6f6c5c
remove cruft
SageMoore 2bf214a
format
SageMoore 11411cb
modify requirements files
SageMoore c2499bf
remove basic.py changes
SageMoore cf6f691
cleanup
SageMoore 4505f53
add support for passing in softmax scales to the context_attn_fwd
SageMoore 9a0416a
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore ef9ae86
added requirements-rocm-build
SageMoore 0ccef65
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore a00a2d9
minor setup.py fix
SageMoore afb15f5
add batch size back in
SageMoore 08a25b7
revert setup.py change
SageMoore 55eb036
update setup.py
SageMoore 95df571
init
SageMoore 0bfe435
init
SageMoore 4b62de2
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore d2f3c85
minor fix
SageMoore 442bc7b
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 9472636
minor fix
SageMoore c7497f3
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 21d8d6a
update error messages
SageMoore a1cac3d
init
SageMoore 40a64d7
Merge branch 'neuralmagic_sage_amd-v1' into upstream_prefix_prefill_s…
329ad79
Merge branch 'neuralmagic_sage_prefix-prefill-refactor' into upstream…
ad1db61
Merge branch 'neuralmagic_sage_rocm-fp4-fix' into upstream_prefix_pre…
c02b1e6
new prefix_prefill
540b286
dwordx4 for k and v from cache
4784522
merge with main
1d9eb50
follow up merge with main
9eb5566
different stages for different loops
0afb796
merge with main
9bc9217
unroll factors tunning
2b84448
linter
1067508
default prefix_prefill for triton lower than 3.2, NV case
fb21239
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
qli88 3b99cf7
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
506b0c4
original softmax restored to get back accuracy
1dc5142
merge with main
e044108
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
05c3d3b
adaptation to ibm kernel
e76f27f
softmax computation correction
da80a03
a comment for triton version
83a86a8
Merge branch 'upstream_prefix_prefill_speed_up' of github.com:ROCm/vl…
1369809
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
81277c8
kpack is not supported on NVidia triton
a4000df
kpack is not supported on NVidia triton
a027e5c
reduced space of autotuning
db608bb
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
81c2739
giving up on autotune and selecting one config
7add0e2
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
5a17950
fixing test with only to ROCM waves per eu and max_seq_len None
5d9a929
renaming kernel
27f044b
clean up and fix for failed kernel tests
cfd60c9
clean up and fix for failed kernel tests
0a26697
clean up and fix for failed kernel tests
35a6e49
got rid of autotuner and get stable runs right from the first iteration
6d5b3f2
restoring paged attn as there is no autotuning anymore and that will …
7140d1a
poking test rerun as one failed and seems not because of this change
169f714
Merge branch 'main' of github.com:vllm-project/vllm into upstream_pre…
f437b11
Merge branch 'upstream/main' into upstream_prefix_prefill_speed_up
ba078b6
comment correction
617ef08
dot operation in triton doesn't support k to be 8 so increasing block…
771ad9e
to kick CIs again Async Engine, Inputs, Utils, Worker Test seems flaky
b6bf365
to kick CIs again