misc: allow head_dim=64 for sm90 AOT #783

Merged Feb 4, 2025 (1 commit)

Conversation

abcdabcd987
Member

Keeps the default behavior of #782, i.e., the AOT build excludes head_dim=64, but adds an option to explicitly enable head_dim=64.

head_dim=64 is used by some small models like Qwen2.5-0.5B.

@abcdabcd987 abcdabcd987 requested a review from yzh119 February 4, 2025 19:53
@zhyncs
Member

zhyncs commented Feb 4, 2025

JIT is ok, don’t enable it in AOT

@zhyncs
Member

zhyncs commented Feb 4, 2025

The binary size is too large…

@zhyncs zhyncs closed this Feb 4, 2025
@abcdabcd987
Member Author

This PR does not enable head_dim=64 by default.

@abcdabcd987 abcdabcd987 reopened this Feb 4, 2025
@yzh119
Collaborator

yzh119 commented Feb 4, 2025

Hi @zhyncs, this PR doesn't enable head_dim=64 by default. Its purpose is to allow head_dim=64: when a user specifies FLASHINFER_HEAD_DIMS=64,128,256, the kernels corresponding to head_dim=64 should be compiled. The default value is still FLASHINFER_HEAD_DIMS=128,256.
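
For illustration, the opt-in behavior described in this comment could look roughly like the sketch below. It assumes the build reads FLASHINFER_HEAD_DIMS as a comma-separated list of integers. `parse_head_dims` and `DEFAULT_HEAD_DIMS` are hypothetical names, and this is not the actual FlashInfer build code.

```python
import os

# Default taken from the thread: AOT builds cover head_dim 128 and 256 only.
DEFAULT_HEAD_DIMS = "128,256"


def parse_head_dims(env=None):
    """Return the head dims to compile, read from FLASHINFER_HEAD_DIMS.

    Hypothetical helper: the comma-separated format is an assumption
    based on the example value quoted in the comment above.
    """
    env = os.environ if env is None else env
    raw = env.get("FLASHINFER_HEAD_DIMS", DEFAULT_HEAD_DIMS)
    # Skip empty tokens so a trailing comma does not raise an error.
    return [int(tok) for tok in raw.split(",") if tok.strip()]


if __name__ == "__main__":
    # With the variable unset, head_dim=64 is absent; opting in adds it.
    print(parse_head_dims({}))
    print(parse_head_dims({"FLASHINFER_HEAD_DIMS": "64,128,256"}))
```

With the variable unset, only the default head dims are selected, so the AOT binary size is unchanged unless a user explicitly asks for head_dim=64.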

@zhyncs
Member

zhyncs commented Feb 4, 2025

I see. Thanks!

@zhyncs zhyncs merged commit 2d2e13a into flashinfer-ai:main Feb 4, 2025