[Bug] MLA kernel fails the tests in tests/test_deepseek_mla.py #949
Comments
Thanks for reporting the issue, I'll fix it soon and add MLA unit tests to CI.
Hi @Atream, I can't reproduce the issue. Can you show me the exact test case (the batch_size/kv_len/qo_len/etc. in test_batch_mla_page_attention) that generates the wrong outputs?
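For reference, one way to pin down a single failing configuration is to call the test function directly with explicit arguments. The parameter names below follow the question above, but the values and the overall signature are assumptions for illustration, not the reporter's actual failing case:

```python
# Hypothetical reproduction sketch: run one parametrization of the test
# directly. The signature of test_batch_mla_page_attention is assumed here;
# check tests/test_deepseek_mla.py for the real parameter list.
import torch
from tests.test_deepseek_mla import test_batch_mla_page_attention

test_batch_mla_page_attention(
    batch_size=1,          # assumed example values, not the reported case
    kv_len=1024,
    qo_len=1,
    num_heads=16,
    causal=False,
    page_size=16,
    backend="fa2",
    dtype=torch.bfloat16,  # the reported failures are on BFloat16
)
```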
The 4090 does not support sm90a.
I ran this:
This should have been fixed in #951; you can check the unit test status at https://ci.tlcpack.ai/blue/organizations/jenkins/flashinfer-ci/detail/PR-951/2/pipeline (GPU-G5-Test-4).
I tested it in my environment; the bf16 cases still exceed the original atol/rtol.
Hi @Atream, that's expected because the original atol and rtol are designed for fp16. bf16 inherently has larger errors (as studied in https://arxiv.org/abs/2405.02803), and we can usually tolerate a 2e-2 difference in bf16 unit tests, so for bf16 we need to increase atol and rtol accordingly. The end-to-end evaluation after #951 should be normal.
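As a rough sketch of that tolerance adjustment in a unit test (only the 2e-2 figure for bf16 comes from the comment above; the fp16 numbers are illustrative):

```python
import torch

def assert_close_by_dtype(out: torch.Tensor, ref: torch.Tensor, dtype: torch.dtype):
    # bf16 stores only 7 mantissa bits versus fp16's 10, so accumulated
    # rounding error is inherently larger; loosen the tolerances for it.
    if dtype == torch.bfloat16:
        rtol, atol = 2e-2, 2e-2
    else:
        rtol, atol = 1e-3, 1e-3  # illustrative tighter bound for fp16
    torch.testing.assert_close(out, ref, rtol=rtol, atol=atol)
```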
It works fine. Thank you for your quick fix.
The sm86/sm89 version of the MLA kernel was not tested after change #942; this PR fixes the issue. This PR also makes the following changes: 1. adds the MLA unit tests to CI (on an a10g node); 2. shrinks the MLA unit tests so that CI can finish in a reasonable time; 3. changes `is_sm90a_supported(torch.device("cuda"))` to `backend == "fa3" and not is_sm90a_supported(torch.device("cuda")):` for non-Hopper GPUs, as pointed out by @Atream.
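A minimal sketch of the adjusted guard from item 3, assuming a pytest-style test parametrized over backends (the import path of is_sm90a_supported is an assumption; check the flashinfer test utilities for the real one):

```python
import pytest
import torch
from flashinfer.utils import is_sm90a_supported  # assumed import path

@pytest.mark.parametrize("backend", ["fa2", "fa3"])
def test_batch_mla_page_attention(backend):
    # Only the fa3 backend requires sm90a (Hopper); with this guard the fa2
    # path still runs on sm86/sm89 GPUs such as the A10G or the RTX 4090.
    if backend == "fa3" and not is_sm90a_supported(torch.device("cuda")):
        pytest.skip("fa3 backend requires sm90a")
    ...  # rest of the test body
```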
The MLA kernel fails the tests in tests/test_deepseek_mla.py. I am using the current main branch at commit 27906fd, and it cannot pass the unit tests; the output in the integrated system is also abnormal. After reverting to the previous commit 061db55, everything works fine.
Environment
RTX 4090, CUDA 12.4, torch 2.5.1
Failures occur in test_batch_mla_varlen_page_attention and test_batch_mla_page_attention with BFloat16.
To test on the 4090, I removed the `if not is_sm90a_supported(torch.device("cuda"))` check.