Support SHA256 as hash function in prefix caching #15297

dr75 · 2025-03-21T16:50:37Z

Context

Prefix caching may serve blocks from unrelated requests when their hashes collide. With a large number of blocks in the cache (e.g., 20k), a hash collision is rare but can happen. Also, because Python's built-in hash() is used, collisions could potentially be exploited.
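To put "rare but can happen" into numbers, here is a rough sketch using the standard birthday-bound approximation. It assumes hash values are uniformly distributed and that hash() yields about 64 bits on a 64-bit CPython build; both are simplifying assumptions, and `collision_probability` is a name made up for this illustration.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = num_blocks * (num_blocks - 1) // 2
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-pairs / 2**hash_bits)


# 20k cached blocks, as in the example above.
p64 = collision_probability(20_000, 64)    # assumed width of hash()
p256 = collision_probability(20_000, 256)  # SHA-256 digest width

print(f"64-bit hash:  {p64:.3e}")
print(f"256-bit hash: {p256:.3e}")
```

This only models accidental collisions between random inputs. It says nothing about inputs that are deliberately crafted to collide, which is the second concern raised above.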

Suggested change

To fix this, I want to replace hash() with hashlib.sha256(), which increases the hash size to 32 bytes.

Replacing hash was already discussed in #12621
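As an illustration of the idea, the following is a minimal sketch of SHA-256 based block hashing. It assumes that each block's hash covers its parent block's hash plus its own token IDs, so that equal hashes imply equal prefixes; this chaining scheme and the helper name `hash_blocks` are assumptions for illustration and are not taken from this PR's diff.

```python
import hashlib
import pickle
from typing import Optional, Sequence


def sha256_pickle(obj) -> int:
    """Serialize with pickle, hash with SHA-256, return the digest as an int."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return int.from_bytes(hashlib.sha256(data).digest(), byteorder="big")


def hash_blocks(token_ids: Sequence[int], block_size: int = 16) -> list[int]:
    """Hash every full block together with its parent's hash (hypothetical helper)."""
    hashes: list[int] = []
    parent: Optional[int] = None  # the first block has no parent
    for i in range(len(token_ids) // block_size):
        block = tuple(token_ids[i * block_size:(i + 1) * block_size])
        parent = sha256_pickle((parent, block))
        hashes.append(parent)
    return hashes


# Two prompts that differ only in the very first token.
a = hash_blocks(list(range(48)))
b = hash_blocks([999] + list(range(1, 48)))
print(len(a), all(x != y for x, y in zip(a, b)))
```

Because the parent hash is part of each block's input, the second and third blocks of `a` and `b` get different hashes even though their own token IDs are identical.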

Runtime micro benchmark

In a benchmark of the provided sha256 implementation, the added overhead for a 50k-token context is around 3 ms on an Apple M2. Relative to a TTFT of 1.6 s with prefix caching of 50k tokens on an H100, this overhead is rather small.

Setup for measuring mean runtime for hashing large context:

50K token context
block size 16
3125 blocks total
Python 3.12
Avg. of 50 runs

Apple M2 (mean error < 40us):

Method	Time (ms)	Overhead (ms)
hash	0.67	0.00
hash+pickle	1.98	1.31
blake3+pickle	4.09	3.42
sha256+pickle	3.57	2.90
sha256+json	9.05	8.38
sha256+str	5.55	4.88
sha256+repr	5.56	4.89
sha256+marshal	4.83	4.16
(sha256 vs. hash)	( 3.57)	( 1.59)

Intel Xeon @ 2.20GHz (mean error < 90us)

Method	Time (ms)	Overhead (ms)
hash	1.54	0.00
hash+pickle	4.57	3.02
blake3+pickle	7.45	5.90
sha256+pickle	7.79	6.25
sha256+json	20.86	19.32
sha256+str	12.92	11.38
sha256+repr	12.94	11.40
sha256+marshal	10.94	9.40
(sha256 vs. hash)	( 7.79)	( 3.23)

sha256 vs. hash: for reference, this row compares sha256+pickle with hash+pickle to isolate the overhead of sha256 over hash().

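import hashlib
import json
import marshal
import pickle

import blake3  # third-party package, only needed for blake3_pickle

def sha256_pickle(input) -> int:
    input_bytes = pickle.dumps(input, protocol=pickle.HIGHEST_PROTOCOL)
    return int.from_bytes(hashlib.sha256(input_bytes).digest(), byteorder="big")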

def sha256_json(input) -> int:
    input_str = json.dumps(input, sort_keys=True)
    input_bytes = input_str.encode("utf-8")
    return int.from_bytes(hashlib.sha256(input_bytes).digest(), byteorder="big")

def sha256_str(input) -> int:
    input_bytes = str(input).encode("utf-8")
    return int.from_bytes(hashlib.sha256(input_bytes).digest(), byteorder="big")

def sha256_repr(input) -> int:
    input_bytes = repr(input).encode("utf-8")
    return int.from_bytes(hashlib.sha256(input_bytes).digest(), byteorder="big")

def sha256_marshal(input) -> int:
    input_bytes = marshal.dumps(input)
    return int.from_bytes(hashlib.sha256(input_bytes).digest(), byteorder="big")

def blake3_pickle(input) -> int:
    input_bytes = pickle.dumps(input, protocol=pickle.HIGHEST_PROTOCOL)
    return int.from_bytes(blake3.blake3(input_bytes).digest(), byteorder="big")

# for performance comparison only
def hash_pickle(input) -> int:
    key = pickle.dumps(input, protocol=pickle.HIGHEST_PROTOCOL)
    return hash(key)
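The timing harness itself is not shown here, so the following is only a sketch of how the setup described above (50k tokens, block size 16, 3125 blocks, mean of 50 runs) could be measured. The names `make_blocks` and `mean_runtime_ms` are hypothetical, and hashing each block's token tuple on its own is an assumption; the author's benchmark may have hashed different inputs.

```python
import hashlib
import pickle
import timeit

BLOCK_SIZE = 16
NUM_TOKENS = 50_000


def sha256_pickle(obj) -> int:
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return int.from_bytes(hashlib.sha256(data).digest(), byteorder="big")


def make_blocks(num_tokens: int = NUM_TOKENS, block_size: int = BLOCK_SIZE):
    """Split a synthetic token sequence into tuples of block_size tokens."""
    tokens = list(range(num_tokens))
    return [tuple(tokens[i:i + block_size])
            for i in range(0, num_tokens, block_size)]


def mean_runtime_ms(hash_fn, blocks, runs: int = 50) -> float:
    """Mean wall-clock time in milliseconds to hash all blocks once."""
    def hash_all():
        for block in blocks:
            hash_fn(block)
    return timeit.timeit(hash_all, number=runs) / runs * 1e3


blocks = make_blocks()
baseline = mean_runtime_ms(hash, blocks)
candidate = mean_runtime_ms(sha256_pickle, blocks)
print(f"blocks: {len(blocks)}")
print(f"hash:          {baseline:.2f} ms")
print(f"sha256+pickle: {candidate:.2f} ms (overhead {candidate - baseline:.2f} ms)")
```

Absolute numbers will differ by machine and Python version, so the output is best compared against a baseline measured on the same host rather than against the tables above.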

Full code to be added.

Please take a look @comaniac

github-actions · 2025-03-21T16:50:48Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify · 2025-03-24T15:34:56Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dr75.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

comaniac

LGTM. Only nits.

vllm/v1/core/block_pool.py

vllm/engine/arg_utils.py

dr75 · 2025-03-26T08:55:19Z

CI fails in the V1 Test job for V1 LLM engine (v0.8.3.dev6+g74645add) with config: model='mistralai/Ministral-8B-Instruct-2410':


[2025-03-25T17:40:17Z] ERROR 03-25 10:40:17 [core.py:343] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.00 MiB. GPU 0 has a total capacity of 21.99 GiB of which 38.75 MiB is free. Process 19862 has 20.30 GiB memory in use. Process 20053 has 1.65 GiB memory in use. Of the allocated memory 1.43 GiB is allocated by PyTorch, and 13.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
--
  | [2025-03-25T17:40:17Z] ERROR 03-25 10:40:17 [core.py:343]
  | [2025-03-25T17:40:17Z] CRITICAL 03-25 10:40:17 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
  | [2025-03-25T17:40:17Z] bash: line 1:  4241 Killed                  pytest -v -s v1/entrypoints
  | [2025-03-25T17:40:18Z] 🚨 Error: The command exited with status 137

This seems unrelated to the change.

Before that, there is also a test failure in v1/entrypoints/llm/test_struct_output_generate.py::test_guided_json_completion[Qwen/Qwen2.5-1.5B-Instruct-guidance]:

[2025-03-25T17:40:08Z] Prompt: "Give an example JSON for an employee profile that fits this schema: {'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'skills': {'type': 'array', 'items': {'type': 'string'}}, 'work_history': {'type': 'array', 'items': {'type': 'object', 'properties': {'company': {'type': 'string'}, 'duration': {'type': 'number'}, 'position': {'type': 'string'}}, 'required': ['company', 'position']}}}, 'requir
...
[2025-03-25T17:40:08Z] FAILED

The input contains a series of '\n' characters, and there is no detailed error message for the failure.

I will rebase on main when addressing the review comments to see if that helps.

comaniac

LGTM

dr75 requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac, alexm-redhat, zhuohan123 and youkaichao as code owners March 21, 2025 16:50

mergify bot added the v1 label Mar 21, 2025

comaniac reviewed Mar 21, 2025

comaniac self-assigned this Mar 21, 2025

mergify bot added the needs-rebase label Mar 24, 2025

dr75 force-pushed the block-hash branch from 17c3054 to fba4d8d Compare March 25, 2025 07:21

mergify bot added the documentation label and removed the needs-rebase label Mar 25, 2025

dr75 changed the title ~~Use SHA256 instead of hash() in prefix caching~~ Support SHA256 as hash function in prefix caching Mar 25, 2025

comaniac approved these changes Mar 25, 2025

comaniac added the ready label Mar 25, 2025

dr75 added 7 commits March 26, 2025 07:50

Use SHA256 instead of hash() in prefix caching

c6cbe5d

Signed-off-by: Marko Rosenmueller <[email protected]>

review comments

f5be13c

Signed-off-by: Marko Rosenmueller <[email protected]>

fix yapf/isort

79432fc

Signed-off-by: Marko Rosenmueller <[email protected]>

add support for hash function config

c89c904

Signed-off-by: Marko Rosenmueller <[email protected]>

fix: disallow None in arg

bca0c58

Signed-off-by: Marko Rosenmueller <[email protected]>

update docs

8d4ca74

Signed-off-by: Marko Rosenmueller <[email protected]>

review comments

653da92

Signed-off-by: Marko Rosenmueller <[email protected]>

dr75 force-pushed the block-hash branch from 74645ad to 653da92 Compare March 26, 2025 08:56

comaniac approved these changes Mar 26, 2025

comaniac merged commit 27df519 into vllm-project:main Mar 26, 2025
36 checks passed

lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 2, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

972df7b

Signed-off-by: Marko Rosenmueller <[email protected]>

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Apr 2, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

795dc02

Signed-off-by: Marko Rosenmueller <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>

Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

c4e8a1e

Signed-off-by: Marko Rosenmueller <[email protected]> Signed-off-by: xinyuxiao <[email protected]>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

0e2d515

Signed-off-by: Marko Rosenmueller <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>

nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

a08a907

Signed-off-by: Marko Rosenmueller <[email protected]>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

dr75 mentioned this pull request Apr 23, 2025

[Core] Prevent side-channel attacks via cache salting #17045

Merged

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

4c51c79

Signed-off-by: Marko Rosenmueller <[email protected]>

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

b5ca71b

Signed-off-by: Marko Rosenmueller <[email protected]>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

Support SHA256 as hash function in prefix caching (vllm-project#15297)

4dbf445

Signed-off-by: Marko Rosenmueller <[email protected]> Signed-off-by: Mu Huai <[email protected]>
