
Commit 5808d74

Fix issue with padding changing during decode. (#64)
Fix issue with batch padding changing during decoding (e.g., if one sequence finished before the others).
2 parents 7e068b5 + 20f99a8 commit 5808d74

1 file changed: +4 -0 lines changed

vllm/worker/spyre_model_runner.py

Lines changed: 4 additions & 0 deletions
@@ -156,6 +156,10 @@ def _prepare_decode(
         # padding to compiled batch size
         actual_batch_size = len(seq_group_metadata_list)
         padded_batch_size = self._position_ids.shape[0]
+
+        # set number of added padding sequences used for computing logits
+        self.model.num_padded_sequences = padded_batch_size - actual_batch_size
+
         while actual_batch_size < padded_batch_size:
             input_tokens.append([0])
             actual_batch_size += 1
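
Note that the new assignment has to come before the `while` loop: the loop increments `actual_batch_size` until it equals `padded_batch_size`, so the difference computed afterwards would always be zero. The following is a minimal standalone sketch of that idea, assuming plain Python lists in place of tensors. The helper names `pad_decode_batch` and `drop_padding_rows` are hypothetical and are not part of the Spyre model runner; how the model actually consumes `num_padded_sequences` when computing logits is not shown in this commit, so the trimming step here is only one plausible use.

```python
from typing import List, Sequence, Tuple


def pad_decode_batch(
    input_tokens: List[List[int]], padded_batch_size: int
) -> Tuple[List[List[int]], int]:
    """Pad a decode batch with dummy [0] sequences up to the compiled size.

    Hypothetical helper: returns the padded batch and the number of
    padding sequences that were added.
    """
    actual_batch_size = len(input_tokens)
    if actual_batch_size > padded_batch_size:
        raise ValueError("batch is larger than the compiled batch size")

    # Record the padding count BEFORE padding, while the two sizes still differ.
    num_padded_sequences = padded_batch_size - actual_batch_size

    padded = list(input_tokens)
    while actual_batch_size < padded_batch_size:
        padded.append([0])
        actual_batch_size += 1
    return padded, num_padded_sequences


def drop_padding_rows(rows: Sequence, num_padded_sequences: int) -> list:
    """Drop the trailing rows that belong to padding sequences (assumed usage)."""
    if num_padded_sequences == 0:
        # rows[:-0] would be empty, so handle the no-padding case explicitly.
        return list(rows)
    return list(rows[:-num_padded_sequences])


# Two live sequences padded to a compiled batch size of four.
batch, num_padded = pad_decode_batch([[5], [7]], padded_batch_size=4)
per_sequence_outputs = ["out_a", "out_b", "pad", "pad"]
live_outputs = drop_padding_rows(per_sequence_outputs, num_padded)
```

Because the count is recomputed on every decode step, it stays correct when a sequence finishes early and the number of live sequences in the batch shrinks.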
