
Commit f8a0f11

madamczyk-intel and Isotr0py authored and committed
[BUGFIX] Move scores to float32 in case of running xgrammar on cpu (vllm-project#12152)
Signed-off-by: Michal Adamczyk <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
1 parent cac6b7d commit f8a0f11


1 file changed: +5 -2 lines changed


vllm/model_executor/guided_decoding/xgrammar_decoding.py

Lines changed: 5 additions & 2 deletions
```diff
@@ -298,16 +298,19 @@ def __call__(self, input_ids: list[int],
         # token_bitmask is a CPU tensor for use with accept_token and
         # fill_next_token_bitmask so we move it to the device of scores
         device_type = scores.device.type
+        dtype = scores.dtype
         if device_type != "cuda":
-            scores = scores.to("cpu").unsqueeze(0)
+            # xgrammar on cpu only supports float32 scores
+            # see: https://github.com/mlc-ai/xgrammar/blob/c1b64920cad24f44f235778c1c00bb52d57da01a/python/xgrammar/kernels/apply_token_bitmask_inplace_cpu.py#L22
+            scores = scores.to("cpu").float().unsqueeze(0)

         # Note: In this method, if the tensors have different dimensions
         # on CPU device fails, but on GPU it runs without error. Hence the
         # unsqueeze above for scores, to match the token bitmask shape
         xgr.apply_token_bitmask_inplace(scores,
                                         self.token_bitmask.to(scores.device))
         if device_type != "cuda":
-            scores = scores.to(device_type).squeeze()
+            scores = scores.to(dtype).to(device_type).squeeze()

         return scores

```
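On the non-CUDA path, the patch records the incoming dtype, upcasts the scores to float32 (and adds a batch dimension) before calling the kernel, and casts back to the original dtype and device afterwards. The sketch below reproduces that round trip in plain PyTorch. It rests on stated assumptions: `apply_bitmask_cpu_float32_only` and `mask_scores_cpu` are hypothetical names, not vLLM or xgrammar APIs, and the stand-in kernel assumes a packed int32 bitmask of shape `(batch, ceil(vocab / 32))` in which bit `j` of word `i` allows token `32 * i + j`. It illustrates why the cast is needed. It does not reproduce xgrammar's actual kernel.

```python
import torch


def apply_bitmask_cpu_float32_only(logits: torch.Tensor, bitmask: torch.Tensor) -> None:
    """Hypothetical stand-in for a float32-only CPU bitmask kernel.

    Assumes `bitmask` is int32 with shape (batch, ceil(vocab / 32)) and that
    bit j of word i marks token 32 * i + j as allowed. Disallowed logits are
    set to -inf in place.
    """
    if logits.dtype != torch.float32:
        # Mirrors the float32-only restriction the commit works around.
        raise TypeError("CPU kernel only supports float32 logits")
    if logits.dim() != bitmask.dim():
        raise ValueError("logits and bitmask must have the same number of dims")
    vocab = logits.shape[-1]
    # Unpack every int32 word into its 32 bits; `& 1` keeps the right bit
    # even for negative words, where `>>` is an arithmetic shift.
    shifts = torch.arange(32, dtype=torch.int32)
    bits = (bitmask.unsqueeze(-1) >> shifts) & 1       # (batch, words, 32)
    allowed = bits.flatten(-2)[..., :vocab].bool()     # (batch, vocab)
    logits.masked_fill_(~allowed, float("-inf"))


def mask_scores_cpu(scores: torch.Tensor, token_bitmask: torch.Tensor) -> torch.Tensor:
    """Hypothetical wrapper mirroring the patched non-CUDA branch."""
    device_type = scores.device.type
    dtype = scores.dtype  # remember the caller's dtype, e.g. bfloat16
    # Upcast to float32 and add a batch dim so the shape matches the 2-D bitmask.
    work = scores.to("cpu").float().unsqueeze(0)
    apply_bitmask_cpu_float32_only(work, token_bitmask.to(work.device))
    # Restore the original dtype and device, then drop the batch dim.
    return work.to(dtype).to(device_type).squeeze()


# Usage: a 40-token vocabulary where (hypothetically) only tokens 0, 1 and 33 are allowed.
vocab_size = 40
scores = torch.randn(vocab_size, dtype=torch.bfloat16)
bitmask = torch.zeros(1, (vocab_size + 31) // 32, dtype=torch.int32)
bitmask[0, 0] = 0b11  # tokens 0 and 1
bitmask[0, 1] = 0b10  # token 33 (word 1, bit 1)
out = mask_scores_cpu(scores, bitmask)
```

With a `bfloat16` input the wrapper returns a `bfloat16` tensor of shape `(vocab_size,)` in which only the allowed tokens remain finite. Passing the `bfloat16` scores straight to the stand-in kernel raises `TypeError` instead, which is the situation the upcast avoids.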
