
bugfix: bugfix to #949 #951

Merged · 11 commits · Mar 17, 2025
Conversation

@yzh119 (Collaborator) commented Mar 17, 2025

The sm86/sm89 version of the MLA kernel was not tested after change #942; this PR fixes the issue.

This PR also makes the following changes:

  1. adds the MLA unittest to CI (on an a10g node).
  2. shrinks the MLA unittest so that CI can finish in a reasonable time.
  3. changes `is_sm90a_supported(torch.device("cuda"))` to `backend == "fa3" and not is_sm90a_supported(torch.device("cuda"))` for non-Hopper GPUs, as pointed out by @Atream (see the sketch after this list).
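
For clarity, here is a minimal sketch of what the corrected check looks like in a test; the import path and the helper name `skip_if_backend_unsupported` are assumptions for illustration, not the exact diff.

```python
# Sketch only: assumes is_sm90a_supported is importable from flashinfer.utils
# and that the MLA tests are parametrized by a `backend` string.
import pytest
import torch

from flashinfer.utils import is_sm90a_supported


def skip_if_backend_unsupported(backend: str) -> None:
    # Only the "fa3" backend requires sm90a (Hopper). Other backends should
    # still run on non-Hopper GPUs such as sm86/sm89 (e.g. the a10g CI node).
    if backend == "fa3" and not is_sm90a_supported(torch.device("cuda")):
        pytest.skip("fa3 backend requires sm90a; skipping on this GPU")
```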

@yzh119 (Collaborator, Author) commented Mar 17, 2025

CI has some issues similar to https://discuss.pytorch.org/t/torch-pytest-leads-to-memory-fragmentation-how-to-do-proper-integration-testing-of-a-lot-of-torch-models/201231: running many torch tests in a single pytest process fragments GPU memory and eventually triggers OOM.

The OOM hook below didn't totally avoid the issue:

```python
import pytest
import torch


@pytest.hookimpl(tryfirst=True)
def pytest_runtest_call(item):
    # Skip tests that hit a CUDA out-of-memory error instead of failing the run.
    try:
        item.runtest()
    except (torch.OutOfMemoryError, RuntimeError) as e:
        if isinstance(e, torch.OutOfMemoryError) or "CUDA error: out of memory" in str(e):
            pytest.skip("Skipping due to OOM")
        else:
            raise
```
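
(A `pytest_runtest_call` hook like this would normally live in the test suite's conftest.py so pytest discovers it automatically; the exact file isn't shown in this thread.)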

@yzh119 (Collaborator, Author) commented Mar 17, 2025

A temporary solution is to set the environment variable `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce fragmentation.
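
For reference, a sketch of one way to apply this from within the test process itself (the actual CI more likely sets the variable in the workflow environment):

```python
# conftest.py (assumed file; sketch only): the setting must be in place before
# the CUDA caching allocator is first initialized, i.e. before any CUDA allocation.
import os

os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```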

@yzh119 merged commit 30b2838 into flashinfer-ai:main on Mar 17, 2025. 2 checks passed.
yyihuang pushed a commit to yyihuang/flashinfer that referenced this pull request Mar 17, 2025