
Commit 62b6978

York-RDWang authored and lulmer committed
[Doc] Update prefix_caching.md to match the example image (vllm-project#14420)
Signed-off-by: Louis Ulmer <[email protected]>
1 parent 9e980b0 commit 62b6978

1 file changed: +1 -1 lines changed


docs/source/design/v1/prefix_caching.md

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ In this example, we assume the block size is 4 (each block can cache 4 tokens),
 :alt: Example Time 6
 :::
 
-**Time 7: Request 2 comes in with the 33 prompt tokens, where the first 16 tokens are the same as request 0.** Note that even the block order in the free queue was `7 - 8 - 9 - 4 - 3 - 2 - 6 - 5 - 1 - 0`, the cache hit blocks (i.e., 0, 1, 2) are touched and removed from the queue before allocation, so the free queue becomes `7 - 8 - 9 - 4 - 3 - 6 - 5`. As a result, the allocated blocks are 0 (cached), 1 (cached), 2 (cached), 7, 8, 9, 4, 3 (evicted).
+**Time 7: Request 2 comes in with the 29 prompt tokens, where the first 12 tokens are the same as request 0.** Note that even the block order in the free queue was `7 - 8 - 9 - 4 - 3 - 2 - 6 - 5 - 1 - 0`, the cache hit blocks (i.e., 0, 1, 2) are touched and removed from the queue before allocation, so the free queue becomes `7 - 8 - 9 - 4 - 3 - 6 - 5`. As a result, the allocated blocks are 0 (cached), 1 (cached), 2 (cached), 7, 8, 9, 4, 3 (evicted).
 
 :::{image} /assets/design/v1/prefix_caching/example-time-7.png
 :alt: Example Time 7
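
The corrected numbers in the Time 7 paragraph can be checked with a small simulation. The sketch below is illustrative only and is not vLLM's implementation: the `allocate` helper and its parameters are hypothetical names, and it assumes only what the paragraph states (block size 4, cache-hit blocks are removed from the free queue before allocation, and new blocks are taken from the head of the queue).

```python
from collections import deque
from math import ceil

BLOCK_SIZE = 4  # tokens per block, as stated in the example


def allocate(free_queue, num_prompt_tokens, cached_blocks):
    """Hypothetical helper: return (allocated_blocks, remaining_free_queue).

    free_queue: block IDs in eviction order (head is evicted first).
    cached_blocks: block IDs that already hold the shared prefix.
    """
    # Touch the cache-hit blocks first: drop them from the free queue so
    # they cannot be evicted while this request is being served.
    hits = set(cached_blocks)
    remaining = deque(b for b in free_queue if b not in hits)

    # Number of blocks still needed after reusing the cached prefix.
    num_blocks = ceil(num_prompt_tokens / BLOCK_SIZE)
    num_new = num_blocks - len(cached_blocks)

    # Take the new blocks from the head of the queue (these get evicted).
    new_blocks = [remaining.popleft() for _ in range(num_new)]
    return list(cached_blocks) + new_blocks, list(remaining)


free_queue = [7, 8, 9, 4, 3, 2, 6, 5, 1, 0]
allocated, remaining = allocate(
    free_queue, num_prompt_tokens=29, cached_blocks=[0, 1, 2]
)
print(allocated)  # [0, 1, 2, 7, 8, 9, 4, 3]
print(remaining)  # [6, 5]
```

With 29 prompt tokens the request needs 8 blocks, 3 of which come from the 12 cached tokens, which matches the allocation listed in the paragraph. The previous figures (33 tokens, 16 shared) would need 9 blocks with 4 cached, which does not match that list.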

0 commit comments
