[WIP]: DRY sampling #16695

0xymoro · 2025-04-16T05:09:05Z

New feature (work in progress, would like some more guidance): #8581

Implementation of a logits processor for DRY sampling. The code is a port of an optimized DRY implementation that I've been running for months across trillions of tokens, and it has been stable.
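For readers unfamiliar with DRY, here is a minimal single-sequence sketch of the idea as it is commonly described: find earlier occurrences of the most recent token, measure how long the preceding context matches, and penalize the token that would extend the repeat. This is not the code in this PR. The function names and the `base`, `allowed_length` and `breakers` parameters, along with their defaults, are assumptions chosen for illustration. Only `dry_multiplier` appears in the diff under review.

```python
from typing import Dict, FrozenSet, List, Sequence


def dry_penalties(
    tokens: Sequence[int],
    multiplier: float = 0.8,          # assumed default, illustration only
    base: float = 1.75,               # assumed default, illustration only
    allowed_length: int = 2,          # assumed default, illustration only
    breakers: FrozenSet[int] = frozenset(),
) -> Dict[int, float]:
    """Return {token_id: penalty} for tokens that would extend a repeat."""
    if multiplier <= 0 or not tokens:
        return {}
    n = len(tokens)
    last = tokens[-1]
    if last in breakers:
        return {}

    longest: Dict[int, int] = {}  # candidate next token -> longest match length
    for i in range(n - 1):
        if tokens[i] != last:
            continue
        # Walk backwards while the context before position i matches the
        # context before the end of the sequence, stopping at a breaker.
        length = 1
        while (
            i - length >= 0
            and tokens[i - length] == tokens[n - 1 - length]
            and tokens[i - length] not in breakers
        ):
            length += 1
        nxt = tokens[i + 1]
        longest[nxt] = max(longest.get(nxt, 0), length)

    # The penalty grows exponentially with the match length.
    return {
        tok: multiplier * base ** (length - allowed_length)
        for tok, length in longest.items()
        if length >= allowed_length
    }


def apply_dry(logits: List[float], tokens: Sequence[int], **kwargs) -> List[float]:
    """Return a copy of logits with the DRY penalties subtracted."""
    out = list(logits)
    for tok, penalty in dry_penalties(tokens, **kwargs).items():
        if 0 <= tok < len(out):
            out[tok] -= penalty
    return out
```

With the context `[1, 2, 3, 4, 1, 2, 3]`, the sequence `1, 2, 3` has appeared before and was followed by `4`, so `4` is the only token penalized. This per-sequence Python loop is exactly the kind of code the review comments further down ask to be vectorized, so treat it as a reference for the logic only.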

Some of the code was adapted from the original optimized DRY by Gemini 2.5 Pro. The logic should be the same, but that can only be confirmed once it is hooked up and tested.

Would love some help routing the extra_args into here so the sampler can be exposed. I got a bit lost in the code there, and it's not clear to me how this currently works across the old and new (V1) code paths.

github-actions · 2025-04-16T05:09:15Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

NickLucche

Thanks for the contribution!
So the main difference is that in V1, sampling params are managed "continuously": they're allocated once and then overwritten as requests come through, to avoid re-allocation costs.

Because of this, it is unclear how to handle the extra_data field optimally here. Do you have more context to share from the discussions about DRY on why this could not or would not be integrated the way penalties are?

NickLucche · 2025-04-16T15:23:04Z

vllm/v1/sample/ops/penalties.py

+    if not hasattr(sampling_metadata, 'extra_data') or sampling_metadata.extra_data is None:
+        # If no extra_data field exists or is None, cannot apply DRY
+        return logits
+
+    # Check if any request might have DRY enabled (basic check)
+    # More robust check would involve iterating through extra_data first
+    has_potential_dry = any(
+        data and data.get('dry_multiplier', _DRY_DEFAULT_MULTIPLIER) > 0
+        for data in sampling_metadata.extra_data
+    )
+    if not has_potential_dry:
+        return logits


This check should ideally be moved to a gpu_input_batch property, like penalties.
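One way to read this suggestion is to track which requests have DRY enabled when they enter and leave the batch, so the sampler can skip DRY with a constant-time property check instead of scanning `extra_data` at every step. The sketch below is a hypothetical stand-in, not vLLM's actual input batch class. The names `DryBatchState`, `dry_req_ids` and `no_dry`, and the default multiplier of `0.0`, are assumptions.

```python
from typing import Any, Dict, Optional, Set


class DryBatchState:
    """Hypothetical per-batch bookkeeping; not vLLM's real input batch."""

    def __init__(self, default_multiplier: float = 0.0) -> None:
        # Assumed default; the diff's _DRY_DEFAULT_MULTIPLIER value is not shown.
        self.default_multiplier = default_multiplier
        self.dry_req_ids: Set[str] = set()

    def add_request(self, req_id: str, extra_args: Optional[Dict[str, Any]]) -> None:
        multiplier = (extra_args or {}).get("dry_multiplier", self.default_multiplier)
        if multiplier > 0:
            self.dry_req_ids.add(req_id)

    def remove_request(self, req_id: str) -> None:
        self.dry_req_ids.discard(req_id)

    @property
    def no_dry(self) -> bool:
        """True when no request in the batch uses DRY."""
        return not self.dry_req_ids
```

The sampler would then return the logits untouched whenever `no_dry` is true, which replaces both early-exit checks in the snippet above.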

0xymoro · 2025-04-17

There is a longer discussion in the long #bounty-program thread in Slack, but I think Nick was saying he wanted it to be part of extra args. I'm not very familiar with the codebase itself, but DRY essentially works like other penalties and has a few params. It'd be great if it were a first-class sampler where things like dry_multiplier are just passed into the engine the way repetition_penalty is. I'm fairly confused about how extra args are passed myself, since some of it reads to me like it's not fully integrated yet. The Slack thread and its latest messages were what I was going by.

njhill · 2025-04-17

@0xymoro apologies for the confusion. As @NickLucche says, state that depends only on the current set of requests and their parameters should go in InputBatch here (you can see the state associated with other sampling params there).

Then, add logic in the add_request method to update this state based on your own request parameters that you can retrieve from request.sampling_params.extra_args.

Logic should also be added to the remove_request(), swap_states() and condense() methods to remove/reorder the requests within the preallocated state, and to _make_sampling_metadata() to update the SamplingMetadata based on the current (changed) state in the input batch.

All of this is what #13360 aims to abstract/encapsulate into the LogitProcessor interface.
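The bookkeeping described above can be sketched with a toy, preallocated slot table. This is an illustration under stated assumptions, not vLLM's `InputBatch`: the class name `DrySlotState`, the plain-list storage, the method signatures, and the dictionary returned by `make_sampling_metadata()` are all hypothetical. Only the method names and `extra_args` come from the comment.

```python
from typing import Any, Dict, List, Optional


class DrySlotState:
    """Toy preallocated per-slot state for one DRY parameter (hypothetical)."""

    def __init__(self, max_num_reqs: int) -> None:
        # Allocated once; slots are overwritten as requests come and go.
        self.req_ids: List[Optional[str]] = [None] * max_num_reqs
        self.multipliers: List[float] = [0.0] * max_num_reqs

    def add_request(self, req_id: str, extra_args: Optional[Dict[str, Any]]) -> int:
        index = self.req_ids.index(None)  # first free slot; ValueError if full
        self.req_ids[index] = req_id
        self.multipliers[index] = float((extra_args or {}).get("dry_multiplier", 0.0))
        return index

    def remove_request(self, req_id: str) -> None:
        index = self.req_ids.index(req_id)
        self.req_ids[index] = None
        self.multipliers[index] = 0.0

    def swap_states(self, i: int, j: int) -> None:
        self.req_ids[i], self.req_ids[j] = self.req_ids[j], self.req_ids[i]
        self.multipliers[i], self.multipliers[j] = (
            self.multipliers[j],
            self.multipliers[i],
        )

    def condense(self) -> None:
        """Pack active requests into the lowest slots, preserving their order."""
        write = 0
        for read in range(len(self.req_ids)):
            if self.req_ids[read] is None:
                continue
            if read != write:
                self.swap_states(read, write)
            write += 1

    def make_sampling_metadata(self) -> Dict[str, list]:
        active = [i for i, r in enumerate(self.req_ids) if r is not None]
        return {
            "req_ids": [self.req_ids[i] for i in active],
            "dry_multipliers": [self.multipliers[i] for i in active],
        }
```

A real implementation would presumably keep these values in preallocated tensors rather than Python lists, and would need one such array per DRY parameter. The point of the sketch is only that every method that moves a request between slots must move its DRY state with it.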

NickLucche · 2025-04-17T11:30:40Z

I think the current implementation is quite inefficient: there are too many slow for-loops and too much data movement, IMO, compared with the other sampling options that are supported.
I see how having this as a "pluggable" addition keeps it isolated, but the extra_args interface surely does not help make it any faster.

Would you be interested in editing the benchmark_serving.py script, similar to what was done in #16022, to provide some numbers that we can use to drive this implementation forward?

njhill · 2025-04-17T18:57:45Z

Thanks @0xymoro. Is this equivalent to the "new/optimized implementation" that you had mentioned in the Slack thread?

Like @NickLucche says, I think there should in any case be opportunities to vectorize at least some of the operations, which should make a big difference when a number of requests are using DRY at the same time. That's mostly orthogonal to the other changes mentioned above, though; I think they could be tackled in either order.

0xymoro · 2025-04-17T19:30:39Z

From the Slack discussion: I will wait until #13360 is more finalized before building on top of it, so I will either come back to this PR or create a new one rebased on top of it once it is merged.

mergify bot added the v1 label Apr 16, 2025

0xymoro force-pushed the main branch from f11a0e2 to 03ee5df Compare April 16, 2025 05:20

feat: apply dry logic

03089d2

0xymoro force-pushed the main branch from 03ee5df to 03089d2 Compare April 16, 2025 05:24

NickLucche suggested changes Apr 16, 2025
