Commit f3a21e9 (parent: 8e630d6)

CacheConfig.block_size should always be int when used (#17052)

Signed-off-by: Harry Mellor <[email protected]>

File tree

1 file changed: +5, -2 lines


vllm/config.py

Lines changed: 5 additions & 2 deletions

@@ -1261,11 +1261,14 @@ def is_matryoshka(self) -> bool:
 class CacheConfig:
     """Configuration for the KV cache."""
 
-    block_size: Optional[BlockSize] = None
+    block_size: BlockSize = None  # type: ignore
     """Size of a contiguous cache block in number of tokens. This is ignored on
     neuron devices and set to `--max-model-len`. On CUDA devices, only block
     sizes up to 32 are supported. On HPU devices, block size defaults to 128.
-    """
+
+    This config has no static default. If left unspecified by the user, it will
+    be set in `Platform.check_and_update_configs()` based on the current
+    platform."""
 
     gpu_memory_utilization: float = 0.9
     """The fraction of GPU memory to be used for the model executor, which can
     range from 0 to 1. For example, a value of 0.5 would imply 50% GPU memory
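Why this shape: every consumer of CacheConfig.block_size reads it after platform setup has run, so typing it as Optional[BlockSize] forced needless None checks at each use site. The commit narrows the annotation to BlockSize and keeps None only as a pre-initialization sentinel, silencing the checker with `# type: ignore` on that one line. The following is a minimal, self-contained sketch of the pattern; the CacheConfig field and the check_and_update_config helper below are simplified stand-ins for vLLM's actual classes, and the fallback of 16 is an assumed value chosen purely for illustration.

    # Sketch (not vLLM's actual code) of the "non-Optional field with a
    # None sentinel" pattern this commit adopts.
    from dataclasses import dataclass

    @dataclass
    class CacheConfig:
        # Annotated as int even though it starts as None: by the time any
        # code reads it, setup has replaced the sentinel with a real value.
        block_size: int = None  # type: ignore

    def check_and_update_config(config: CacheConfig) -> None:
        """Hypothetical stand-in for Platform.check_and_update_configs()."""
        if config.block_size is None:
            config.block_size = 16  # assumed platform default, illustrative

    config = CacheConfig()
    check_and_update_config(config)
    assert isinstance(config.block_size, int)  # holds at every use site

The trade-off is that the type checker no longer guards against reading block_size before setup runs; the single `# type: ignore` on the field marks that gap as deliberate rather than scattering None handling across the codebase.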
