[V1][Spec Decode] Optimize N-gram matching with Numba #13365
Conversation
Signed-off-by: Woosuk Kwon <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs initially. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
cc @LiuXiaoxuanPKU This PR is ready. Could you please take a look?
LGTM, thanks!
@@ -1,6 +1,7 @@
 psutil
 sentencepiece # Required for LLaMA tokenizer.
 numpy < 2.0.0
+numba == 0.60.0 # v0.61 doesn't support Python 3.9. Required for N-gram speculative decoding.
Shouldn't this be in requirements-cuda.txt rather than common?
Oh, I'm OK with either; I just thought it would eventually be used by others as well. Please feel free to submit a PR to move it to requirements-cuda.txt and probably requirements-rocm.txt.
Very excited about this!
@michaelfeil Thanks! Happy to see you again :)
…3365) Signed-off-by: Woosuk Kwon <[email protected]>
This PR optimizes the N-gram matching algorithm by JIT-compiling it with Numba.
I've observed a 20-30x speedup with large batch sizes: for the ShareGPT benchmark with 5K requests, the cumulative overhead of N-gram matching drops from 54.3 sec (~2.5% of the entire running time) to 1.9 sec.
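To illustrate the kind of hot loop this PR targets, here is a minimal sketch of N-gram proposal matching compiled with Numba. This is not vLLM's actual implementation; the function name and signature are hypothetical. The idea: scan the token history backwards for the most recent earlier occurrence of the trailing n-gram, and propose the k tokens that followed it. The tight nested loop over a NumPy array is exactly the pattern where `@njit` yields large speedups over interpreted Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # pure-Python fallback if Numba is not installed
    def njit(func):
        return func


@njit
def find_ngram_proposal(tokens, n, k):
    """Find the most recent earlier occurrence of the final n tokens
    in `tokens` and return up to k tokens that followed that match.
    Returns an empty array if there is no match."""
    total = len(tokens)
    if total < n + 1:
        return tokens[:0]  # empty slice: not enough history to match
    # Search backwards so the most recent match wins; skip the trailing
    # n-gram itself (it starts at index total - n).
    for start in range(total - n - 1, -1, -1):
        match = True
        for i in range(n):
            if tokens[start + i] != tokens[total - n + i]:
                match = False
                break
        if match:
            end = min(start + n + k, total)
            return tokens[start + n:end]
    return tokens[:0]  # no earlier occurrence of the n-gram
```

With Numba, the first call pays a one-time JIT compilation cost; subsequent calls run as native machine code, which is why the cumulative overhead across thousands of requests shrinks so dramatically.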