[V1] Ensure using int64 for sampled token ids #15065
Conversation
cc @houseroad
```python
# with subsequent operations that may use these values as indices.
# This conversion is necessary because FlashInfer sampling operations
# return int32 (while PyTorch argmax and topk return int64).
sampled = sampled.long()
```
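For context, a minimal sketch of the failure this cast avoids. The tensor names and shapes below are illustrative assumptions, not code from the PR: PyTorch indexing ops such as `torch.gather` require int64 indices, so int32 token ids coming out of a sampling kernel break as soon as they are used for indexing.

```python
import torch

# Illustrative shapes only; names are hypothetical, not from the PR.
logprobs = torch.randn(4, 32000)  # [num_reqs, vocab_size]
sampled_int32 = torch.randint(0, 32000, (4, 1), dtype=torch.int32)

# torch.gather requires an int64 index tensor, so int32 sampled ids
# fail as soon as they are used for indexing.
try:
    logprobs.gather(-1, sampled_int32)
except RuntimeError as exc:
    print(f"int32 index rejected: {exc}")

# Casting once right after sampling makes every downstream indexing
# op (e.g. logprob gathering) safe.
sampled = sampled_int32.long()
token_logprobs = logprobs.gather(-1, sampled)
print(token_logprobs.shape)  # torch.Size([4, 1])
```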
Actually, I'm debating whether to keep this in `gather_logprobs`, since we could skip the conversion when `gather_logprobs` is not called. What do you think?
- I think that's error-prone. For example, other ops in the future might try to use this tensor for indexing and hit the same error.
- The op should be very cheap; it's a no-op in the common case (no top-p or top-k) where `sampled` is already a long tensor. Even if it isn't, the `sampled` tensor here is pretty small, so I don't think its overhead will matter.
Agreed with @WoosukKwon here; `sampled` is pretty small even in the worst case (1024).
Okay, if we see overhead, we can always optimize it :-)
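To make the "cheap in the common case" point above concrete, here is a small sketch under assumed shapes (not vLLM code): when sampling is greedy, `argmax` already yields int64 and `.long()` hands back the same tensor; otherwise the cast copies only one id per request.

```python
import torch

logits = torch.randn(1024, 32000)  # assumed worst case: 1024 requests

# Greedy path: argmax already returns int64, so .long() is a no-op
# that returns the same tensor object.
sampled = logits.argmax(dim=-1)
assert sampled.dtype == torch.int64
assert sampled.long() is sampled

# Top-p/top-k path (e.g. FlashInfer kernels) may return int32; the
# cast then copies just num_reqs elements (here at most 1024), which
# is negligible next to the sampling work itself.
sampled_int32 = sampled.to(torch.int32)
sampled = sampled_int32.long()
assert sampled.dtype == torch.int64
```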
A more fundamental solution to the bug reported in #14999 and #15049.