
Commit b61092b

Raise warning for 2:4 compressed sparse-only models (#1107)
In a recent update, we disabled the CUTLASS kernels for sparse-only models (vllm-project/vllm#12417). As a result, compressed 2:4 sparse-only models are no longer runnable in vLLM. This PR introduces a warning to inform users when compression is enabled in scenarios where sparse-only models are unsupported, ensuring clarity and avoiding unexpected behavior when using 2:4 sparse configurations with vLLM.

Changes:

- Added a warning to notify users when attempting to enable compression with sparse-only models in unsupported configurations.

---------

Signed-off-by: Rahul Tuli <[email protected]>
1 parent 6fa5a5e commit b61092b
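
For context, a minimal sketch of the workaround the new warning suggests. The model id, output directory, and recipe file are hypothetical, and the `oneshot` import path is an assumption about llm-compressor's public API; only the `disable_sparse_compression=True` keyword comes from this commit.

from llmcompressor import oneshot  # assumed import path
from transformers import AutoModelForCausalLM

# Hypothetical base model and 2:4 sparsity recipe.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
oneshot(model=model, recipe="2of4_sparsity_recipe.yaml")

# Saving with sparse compression disabled keeps the sparse-only
# checkpoint loadable by vLLM <= 0.7.0, per the warning added below.
model.save_pretrained("Llama-3.2-1B-2of4", disable_sparse_compression=True)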

1 file changed: +5 −1 lines changed

src/llmcompressor/transformers/compression/sparsity_config.py

Lines changed: 5 additions & 1 deletion
@@ -181,7 +181,11 @@ def is_sparse24_bitmask_supported(
         return False
 
     if not is_model_quantized(model):
-        # non-quantized 2:4 sparse models are supported
+        logger.warning(
+            "Compressed Sparse-only 2:4 models are not supported in vLLM<=0.7.0, "
+            "consider saving with `disable_sparse_compression` set, "
+            "`model.save_pretrained(..., disable_sparse_compression=True)`"
+        )
         return True
 
     # when model is quantized, and has 2:4 sparsity
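
Note that the check still returns True after warning: compression proceeds, and the message is informational rather than a hard error, so existing save paths keep working while users are pointed at the `disable_sparse_compression` escape hatch.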

0 commit comments