Reshape cache flash kernel to support HND layout #8200
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
This pull request has merge conflicts that must be resolved before it can be merged.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
@WoosukKwon could you review the changes? Thanks!
Signed-off-by: shuw <[email protected]>
NHD: [num_blocks, block_size, num_heads, head_size]
HND: [num_blocks, num_heads, block_size, head_size]
Many fast attention kernels only support the HND layout for kv_cache. This PR makes the reshape_and_cache_flash kernel support both layouts.
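To make the difference between the two layouts concrete, here is a minimal sketch in plain Python of how a flat element offset into a contiguous cache could be computed for each one. It is not the kernel code from this PR: the helper names `cache_offset` and `slot_to_block` are hypothetical, and the slot-to-block mapping (slot index divided by `block_size`) is an assumption about how the cache is addressed.

```python
def slot_to_block(slot_idx, block_size):
    """Split a flat slot index into (block_idx, block_offset).

    Assumption: slots are numbered consecutively within each block.
    """
    return divmod(slot_idx, block_size)


def cache_offset(layout, block_idx, block_offset, head_idx, dim_idx,
                 block_size, num_heads, head_size):
    """Flat element offset into a contiguous cache (hypothetical helper)."""
    if layout == "NHD":
        # [num_blocks, block_size, num_heads, head_size]
        inner = (block_offset * num_heads + head_idx) * head_size + dim_idx
    elif layout == "HND":
        # [num_blocks, num_heads, block_size, head_size]
        inner = (head_idx * block_size + block_offset) * head_size + dim_idx
    else:
        raise ValueError(f"unknown layout: {layout}")
    # Both layouts share the same per-block size; only the order of the
    # token and head dimensions inside a block differs.
    block_stride = block_size * num_heads * head_size
    return block_idx * block_stride + inner


if __name__ == "__main__":
    block_size, num_heads, head_size = 16, 8, 64
    block_idx, block_offset = slot_to_block(35, block_size)
    for layout in ("NHD", "HND"):
        off = cache_offset(layout, block_idx, block_offset, 5, 7,
                           block_size, num_heads, head_size)
        print(layout, off)
```

In the HND case, all tokens of one head within a block are adjacent in memory, whereas NHD keeps all heads of one token adjacent. A kernel that supports both only needs to switch this index computation (or, equivalently, the strides it uses).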