Commit a23fd78

DarkLight1337 authored and lulmer committed
[Bugfix] Fix num video tokens calculation for Qwen2-VL (vllm-project#13148)
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
1 parent e69a3d6 commit a23fd78

1 file changed, +5 -1 lines changed

vllm/model_executor/models/qwen2_vl.py

Lines changed: 5 additions & 1 deletion
@@ -800,7 +800,11 @@ def _get_vision_info(
             preprocessed_size = ImageSize(width=image_width,
                                           height=image_height)

-        grid_t = max(num_frames // temporal_patch_size, 1)
+        # NOTE: Frames are padded to be divisible by `temporal_patch_size`
+        # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py#L294
+        padded_num_frames = num_frames + num_frames % temporal_patch_size
+
+        grid_t = max(padded_num_frames // temporal_patch_size, 1)
         grid_h = preprocessed_size.height // patch_size
         grid_w = preprocessed_size.width // patch_size

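The effect of the change can be checked with a small standalone sketch. The function names below are hypothetical and are not part of vLLM, and the default `temporal_patch_size=2` is an assumption about the usual Qwen2-VL configuration. With that value, the old expression drops the trailing frame of an odd-length video, while the new one counts the padded frame.

```python
def grid_t_before(num_frames: int, temporal_patch_size: int = 2) -> int:
    # Old calculation: floor division ignores a trailing partial group of frames.
    return max(num_frames // temporal_patch_size, 1)


def grid_t_after(num_frames: int, temporal_patch_size: int = 2) -> int:
    # New calculation from this commit: pad the frame count first, then divide.
    padded_num_frames = num_frames + num_frames % temporal_patch_size
    return max(padded_num_frames // temporal_patch_size, 1)


def grid_t_ceil(num_frames: int, temporal_patch_size: int = 2) -> int:
    # Hypothetical reference: round up to the next multiple of temporal_patch_size.
    return max(-(-num_frames // temporal_patch_size), 1)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, grid_t_before(n), grid_t_after(n), grid_t_ceil(n))
```

For `temporal_patch_size=2`, `grid_t_after` matches the round-up reference for every frame count. Note that `num_frames + num_frames % temporal_patch_size` is a multiple of `temporal_patch_size` only when that value is 1 or 2, so the sketch makes no claim about larger patch sizes.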