Commit 9ecb93d
gau-nernst authored and nishith-fujitsu committed
[Bugfix] Fix cache block size calculation for CPU MLA (vllm-project#15848)
Signed-off-by: Thien Tran <[email protected]>
1 parent a2f900b commit 9ecb93d

1 file changed: +1 -1 lines changed
vllm/worker/cpu_worker.py

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ def get_cache_block_size(
         num_layers = model_config.get_num_layers(parallel_config)

         key_cache_block = block_size * num_heads * head_size
-        value_cache_block = key_cache_block
+        value_cache_block = key_cache_block if not model_config.use_mla else 0
         total = num_layers * (key_cache_block + value_cache_block)
         if cache_dtype == "auto":
             dtype = model_config.dtype
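
To make the effect of the one-line change concrete, here is a minimal standalone sketch of the arithmetic in the hunk above. It does not import vLLM: the function name `cache_block_elements`, its flat argument list, and the example sizes are all hypothetical stand-ins for the values that the real code reads from `model_config` and `parallel_config`. The reading that MLA keeps a single cache per token, so no separate value block is needed, is inferred from the diff and the commit title rather than stated in the commit.

```python
def cache_block_elements(
    block_size: int,
    num_heads: int,
    head_size: int,
    num_layers: int,
    use_mla: bool,
) -> int:
    """Count cache elements per block, mirroring the patched expression.

    Hypothetical helper: the real method takes config objects, not plain ints.
    """
    key_cache_block = block_size * num_heads * head_size
    # The fix: with MLA, no separate value block is counted.
    value_cache_block = key_cache_block if not use_mla else 0
    return num_layers * (key_cache_block + value_cache_block)


# Illustrative sizes only (not taken from the commit).
sizes = dict(block_size=16, num_heads=1, head_size=576, num_layers=2)

before_fix_or_non_mla = cache_block_elements(use_mla=False, **sizes)
with_mla = cache_block_elements(use_mla=True, **sizes)

# Without the fix, an MLA model would be counted like the non-MLA case,
# i.e. twice the elements per block.
print(before_fix_or_non_mla, with_mla)
```

Under these assumptions the pre-fix formula counts twice as many elements per block for an MLA model as the patched one. The byte size then follows by multiplying by the element size of the cache dtype, which is chosen in the lines after the hunk.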