
Commit 9907bc1

bugfix: Fix cudagraph mode of BatchPrefillWithRaggedKVCacheWrapper (#412)
The computation of `fixed_batch_size` is not correct.
Parent commit: 58d3593

1 file changed: +2 −2 lines

python/flashinfer/prefill.py (+2 −2)
```diff
@@ -1215,8 +1215,8 @@ def __init__(
                 raise ValueError(
                     "kv_indptr_buf should be a torch.Tensor in cuda graph mode"
                 )
-            self._fixed_batch_size = len(qo_indptr_buf)
-            if len(kv_indptr_buf) != self._fixed_batch_size:
+            self._fixed_batch_size = len(qo_indptr_buf) - 1
+            if len(kv_indptr_buf) != self._fixed_batch_size + 1:
                 raise ValueError(
                     "The length of kv_indptr_buf ({}) should be the same as qo_indptr_buf ({}).".format(
                         len(kv_indptr_buf), self._fixed_batch_size
```
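
Context for the fix: `qo_indptr_buf` and `kv_indptr_buf` are CSR-style offset arrays, so a batch of B requests is described by B + 1 cumulative boundaries. The batch size is therefore the buffer length minus one, which is what the corrected lines compute. A minimal sketch with hypothetical buffer values (not taken from this commit) illustrating the invariant:

```python
import torch

# Hypothetical CSR-style indptr buffers for a batch of 3 requests:
# each buffer stores batch_size + 1 cumulative offsets.
qo_indptr_buf = torch.tensor([0, 3, 7, 12], dtype=torch.int32)
kv_indptr_buf = torch.tensor([0, 5, 9, 20], dtype=torch.int32)

# The old computation, len(qo_indptr_buf), would report a batch size of 4.
# The corrected computation matches the diff above:
fixed_batch_size = len(qo_indptr_buf) - 1          # 3
assert len(kv_indptr_buf) == fixed_batch_size + 1  # both buffers hold B + 1 entries
```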
