Commit 934cf70
LucasWilkinson authored and dbyoung18 committed
[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (vllm-project#16209)
Signed-off-by: Lucas Wilkinson <[email protected]>
1 parent 03d2cb9 commit 934cf70

File tree

1 file changed (+1, -1)

vllm/v1/attention/backends/flash_attn.py

Lines changed: 1 addition & 1 deletion
@@ -264,7 +264,7 @@ def make_local_attention_virtual_batches(
         np.arange(pages_per_local_batch, dtype=np.int32),
         (virtual_batches, pages_per_local_batch)) \
             + np.expand_dims(block_starts, axis=1)
-    block_indices = block_indices.flatten()
+    block_indices = block_indices.flatten().clip(max=block_table.shape[1] - 1)
     batch_indices = np.repeat(np.arange(actual_batch_size, dtype=np.int32),
                               local_blocks * pages_per_local_batch)
     block_table_local = block_table[batch_indices, block_indices]\
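The one-line change clamps the flattened block indices to the last valid column of the block table before they are used to gather rows for the local-attention virtual batches. Below is a minimal, self-contained sketch, not the vLLM code itself: the sizes, block_starts values, and the single-request block_table are hypothetical, chosen only to show how the last virtual batch can compute indices one past the end of the table when a request is near the maximum context, and how the added .clip(...) avoids the IndexError.

import numpy as np

# Hypothetical sizes for illustration only (not values from vLLM).
pages_per_local_batch = 4
virtual_batches = 3
block_table = np.arange(10).reshape(1, 10)   # one request, 10 allocated blocks
block_starts = np.array([0, 4, 8], dtype=np.int32)  # last batch starts near the end

# Same index construction as the patched function: each virtual batch spans
# pages_per_local_batch consecutive blocks starting at its block_start.
block_indices = np.broadcast_to(
    np.arange(pages_per_local_batch, dtype=np.int32),
    (virtual_batches, pages_per_local_batch)) \
        + np.expand_dims(block_starts, axis=1)

# Without the clip, the last row is [8, 9, 10, 11]; 10 and 11 are out of range
# for block_table.shape[1] == 10 and the gather below raises an IndexError.
block_indices = block_indices.flatten().clip(max=block_table.shape[1] - 1)

batch_indices = np.zeros_like(block_indices)  # single request -> row 0 everywhere
block_table_local = block_table[batch_indices, block_indices] \
    .reshape(virtual_batches, -1)
print(block_table_local)   # last row becomes [8, 9, 9, 9]

Clamping simply repeats the last valid block for the overrun positions, which is enough to keep the fancy-indexing gather in bounds for a single request near the maximum context length.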
