misc: allow head_dim=64 for sm90 AOT #783

abcdabcd987 · 2025-02-04T19:53:43Z

Keeps the default behavior of #782, i.e., AOT builds exclude head_dim=64, but adds an option to explicitly enable head_dim=64.

head_dim=64 is used by some small models like Qwen2.5-0.5B.

zhyncs · 2025-02-04T19:57:04Z

JIT is ok, don’t enable it in AOT

zhyncs · 2025-02-04T19:58:08Z

The binary size is too large…

abcdabcd987 · 2025-02-04T19:59:14Z

This PR does not enable head_dim=64 by default.

yzh119 · 2025-02-04T20:02:49Z

Hi @zhyncs, this PR doesn't enable head_dim=64 by default. Its purpose is to allow head_dim=64: when a user specifies FLASHINFER_HEAD_DIMS=64,128,256, the kernels corresponding to head_dim=64 should be compiled. The default value is still FLASHINFER_HEAD_DIMS=128,256.
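
For illustration, the opt-in behavior described above could be sketched roughly as follows. This is a minimal sketch, not the actual build script: the helper name `parse_head_dims` and the `DEFAULT_HEAD_DIMS` constant are hypothetical, and only the `FLASHINFER_HEAD_DIMS` variable and its default of `128,256` come from this thread.

```python
import os

# Default taken from the discussion above; head_dim=64 is opt-in.
DEFAULT_HEAD_DIMS = "128,256"


def parse_head_dims(env=None):
    """Hypothetical helper: read FLASHINFER_HEAD_DIMS as a sorted list of ints.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("FLASHINFER_HEAD_DIMS", DEFAULT_HEAD_DIMS)
    # Split on commas, skip empty tokens, de-duplicate, and sort.
    dims = sorted({int(tok) for tok in raw.split(",") if tok.strip()})
    if not dims:
        raise ValueError("FLASHINFER_HEAD_DIMS must list at least one head dim")
    return dims


# Without the variable set, only the default head dims would be selected.
default_dims = parse_head_dims({})
# Opting in to head_dim=64 alongside the defaults.
opt_in_dims = parse_head_dims({"FLASHINFER_HEAD_DIMS": "64,128,256"})
```

Under this sketch, a build that never sets the variable would skip the head_dim=64 kernels, which is how the binary size concern raised earlier stays addressed by default.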

zhyncs · 2025-02-04T20:03:45Z

I see. Thanks!

misc: allow head_dim=64 for sm90 AOT

a0e4dd7

abcdabcd987 requested a review from yzh119 February 4, 2025 19:53

zhyncs closed this Feb 4, 2025

abcdabcd987 reopened this Feb 4, 2025

zhyncs merged commit 2d2e13a into flashinfer-ai:main Feb 4, 2025
