
Commit 0df2510

Authored by rainkertdangshunya and dangshunya

[Bugfix] Fix gptq_marlin for deepseek-v3 (#13750)

Signed-off-by: dangshunya <[email protected]>
Co-authored-by: dangshunya <[email protected]>

1 parent e123aaf commit 0df2510

File tree

1 file changed: +3 / -1 lines

vllm/model_executor/layers/quantization/gptq_marlin.py (3 additions, 1 deletion)

@@ -569,7 +569,9 @@ def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
         replace_parameter(layer, "w13_scales", marlin_w13_scales)
         marlin_w2_scales = marlin_moe_permute_scales(
             s=layer.w2_scales,
-            size_k=layer.w2_scales.shape[1] * self.quant_config.pack_factor,
+            size_k=layer.w2_scales.shape[1] *
+            (self.quant_config.group_size if self.quant_config.group_size != -1
+             else self.quant_config.pack_factor),
             size_n=layer.w2_scales.shape[2],
             group_size=self.quant_config.group_size,
         )
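The change affects how the input dimension (size_k) is recovered from the w2 scales tensor: with grouped quantization (group_size != -1), each entry along the scales' group dimension covers group_size input channels, so size_k is the group count times group_size rather than times the pack factor. A minimal sketch of the corrected arithmetic, assuming a hypothetical helper name (recover_size_k is for illustration and is not part of vLLM):

```python
def recover_size_k(num_groups: int, group_size: int, pack_factor: int) -> int:
    """Recover size_k from the scales tensor's group dimension.

    With grouped quantization (group_size != -1), each group spans
    `group_size` input channels, so size_k = num_groups * group_size.
    Only in the ungrouped case (group_size == -1) does the original
    pack_factor-based formula apply.
    """
    if group_size != -1:
        return num_groups * group_size
    return num_groups * pack_factor


# e.g. a layer whose w2_scales has 56 groups at group_size=128:
print(recover_size_k(56, 128, 8))   # grouped case
print(recover_size_k(1, -1, 8))     # ungrouped case falls back to pack_factor
```

The numbers above are illustrative, not taken from deepseek-v3; the point is that multiplying the group count by pack_factor, as the old code did, yields the wrong size_k whenever grouped quantization is in use.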

0 commit comments