[V1] vLLM OpenAI API custom args #16862

afeldman-nm · 2025-04-18T17:31:43Z

Add an extra_args: Optional[dict[str, Any]] field to CompletionRequest and ChatCompletionRequest (these are the only OpenAIBaseModel subclasses which had a logits_processors field in v0). This field is injected into SamplingParams.extra_args via SamplingParams.from_optional(); each key/value pair in extra_args becomes an assignment to an attribute of sampling_params.

RFC: #17191

Fixes #16802

Signed-off-by: Andrew Feldman <[email protected]>
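
For readers skimming the thread, the unpacking described above can be pictured roughly as follows. This is a minimal sketch and not vLLM's actual code: ToySamplingParams and its two fields are hypothetical stand-ins, and only the idea of turning each extra_args pair into an attribute assignment inside from_optional() is taken from the description.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToySamplingParams:
    """Hypothetical stand-in for SamplingParams; not the real vLLM class."""
    max_tokens: Optional[int] = 16
    ignore_eos: bool = False

    @classmethod
    def from_optional(
        cls,
        max_tokens: Optional[int] = None,
        extra_args: Optional[dict[str, Any]] = None,
    ) -> "ToySamplingParams":
        params = cls() if max_tokens is None else cls(max_tokens=max_tokens)
        # Each key/value pair in extra_args becomes an attribute assignment.
        for name, value in (extra_args or {}).items():
            setattr(params, name, value)
        return params


params = ToySamplingParams.from_optional(
    max_tokens=32, extra_args={"ignore_eos": True})
print(params)
```

Note that a plain setattr loop accepts any key, including ones that shadow regular fields; that behavior is one of the points debated later in the review.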

github-actions · 2025-04-18T17:31:53Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

njhill · 2025-04-18T19:04:19Z

Thanks @afeldman-nm! It would be good to include a test that shows how these can be passed via the OpenAI client sdk using its extra_body option: https://github.com/openai/openai-python?tab=readme-ov-file#undocumented-request-params

I'm unsure whether we want these new custom args to be in a nested json object (as you've done here) or just extra top-level args.
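
To make the two options concrete, here is a small illustration of the request bodies being compared. It is only a sketch: the model name and prompt are placeholders, and ignore_eos is used as an example custom arg.

```python
import json

# Option 1: custom args grouped in a nested JSON object (as in this PR).
nested_payload = {
    "model": "my-model",  # placeholder model name
    "prompt": "Hello",
    "extra_args": {"ignore_eos": True},
}

# Option 2: custom args sent as extra top-level request fields.
top_level_payload = {
    "model": "my-model",
    "prompt": "Hello",
    "ignore_eos": True,
}

print(json.dumps(nested_payload, indent=2))
print(json.dumps(top_level_payload, indent=2))
```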

afeldman-nm · 2025-04-18T19:48:50Z

Thanks @njhill. Agreed regarding the unit test. I need to think a bit about the right way to do it.


afeldman-nm · 2025-04-23T15:05:57Z

@njhill @comaniac @WoosukKwon

CC: @robertgshaw2-redhat

comaniac · 2025-04-23T15:26:31Z

tests/v1/entrypoints/openai/test_completion.py

+        # Contradictory `max_tokens`
+        extra_body={
+            "max_tokens": 5,


I don't think we should allow conflicting fields in extra_body, as it is very confusing. The "extra_xxx" fields should literally be extra fields that are not defined anywhere else.

afeldman-nm · 2025-04-23

PSA after chatting with Cody: extra_body is entirely a "client-side" feature, i.e. the Python SDK takes an extra_body argument, extracts the key/value pairs, and injects them into the JSON request as if they were top-level arguments. This means that the server never sees extra_body and in fact does not have the ability to handle it. (If you use the HTTP client, any arguments in extra_body get ignored.) Another implication is that we have no control over the behavior of extra_body, because it is part of OpenAI's client SDK. As you can see here:

https://github.com/openai/openai-python/blob/ed53107e10e6c86754866b48f8bd862659134ca8/src/openai/resources/models.py#L48

"The extra values given here {in extra_body} take precedence over values defined on the client or passed to this method."

So support for conflicting settings in extra_body is not a new feature added by my PR; it is an immutable (to us) characteristic of the SDK. I am simply using this behavior as a testing hack to confirm that extra_body is working.
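
The client-side merge described in this reply can be modeled in a few lines. This is a toy illustration, not the SDK's real code path: build_request_json is a hypothetical helper, and the simple shallow merge is an assumption that only mirrors the documented precedence of extra_body values.

```python
from typing import Any, Optional


def build_request_json(
    named_args: dict[str, Any],
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Toy model of the client-side merge; not the SDK's real code path."""
    body = dict(named_args)
    # extra_body keys are injected as top-level fields and win on conflict.
    body.update(extra_body or {})
    return body


body = build_request_json(
    {"model": "my-model", "prompt": "Hello", "max_tokens": 10},
    extra_body={"max_tokens": 5, "ignore_eos": True},
)
print(body)
```

In this model the server receives max_tokens=5 and never sees a field named extra_body, which is why the test can use a contradictory max_tokens to confirm that the extra_body value reached the server.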

comaniac · 2025-04-23T15:29:39Z

tests/v1/entrypoints/openai/test_completion.py

+            "ignore_eos": True,
+            "extra_sampling_params": {
+                # Contradictory max_tokens
+                "max_tokens": 5
+            }


Intuitively, ignore_eos is a part of extra_sampling_params, so we may consider moving it in. But this may affect existing users, so we should be careful. For example, we could allow it in both places and show a deprecation warning for 1-2 releases. Open to discussion. cc @WoosukKwon @njhill

afeldman-nm · 2025-04-23

I'm going to write a short RFC for custom args because I think these details need to be ironed out. In the near term I think it is fine to allow "special" fields to be set in two ways: as top-level API arguments (current behavior) or via extra_sampling_params (new behavior in this PR).

In the long term, having two ways to set a special arg such as ignore_eos is confusing & we should probably restrict special args to only be set via extra_sampling_params. However, it is likely that customers depend on the current behavior & their code would be broken by this change. So we should probably wait until a major release (in whatever appropriate sense of the word "major") before restricting special args to be solely configurable via extra_sampling_params.
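
One way the transition period discussed here could look is sketched below. Everything in it is an assumption for illustration: resolve_ignore_eos is a hypothetical helper, the thread does not say which location should win when both are set (the nested value wins here), and the warning text is invented.

```python
import warnings
from typing import Any, Optional


def resolve_ignore_eos(
    top_level: Optional[bool],
    extra_sampling_params: Optional[dict[str, Any]],
) -> bool:
    """Hypothetical helper: accept ignore_eos from either location."""
    nested = (extra_sampling_params or {}).get("ignore_eos")
    if top_level is not None:
        # Legacy location still works but is flagged for removal.
        warnings.warn(
            "Top-level ignore_eos is deprecated; pass it via "
            "extra_sampling_params instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    if nested is not None:
        return bool(nested)  # assumption: the nested value takes precedence
    return bool(top_level)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_ignore_eos(True, None)
    modern = resolve_ignore_eos(None, {"ignore_eos": True})
```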

comaniac · 2025-04-23T15:30:21Z

vllm/entrypoints/openai/protocol.py

+    # Custom sampling params
+    extra_sampling_params: Optional[dict[str, Any]] = Field(
+        default=None,
+        description=("Additional kwargs to pass to sampling."),


Better to have more details, such as where to find the docs for the extra params.

afeldman-nm · 2025-04-23

Yes, I will clarify. I should probably also add some docs changes (perhaps as a separate PR).

comaniac · 2025-04-23T15:32:42Z

vllm/sampling_params.py

@@ -242,7 +242,6 @@ class SamplingParams(
    guided_decoding: Optional[GuidedDecodingParams] = None
    logit_bias: Optional[dict[int, float]] = None
    allowed_token_ids: Optional[list[int]] = None
-    extra_args: Optional[dict[str, Any]] = None


Why do we need to change this? I suppose we only need to change the frontend to take extra args from the OpenAI protocol, while the underlying implementation could remain the same?

afeldman-nm · 2025-04-23

This change (and some of the surrounding changes) causes vLLM to automatically unpack all of the key/value pairs in extra_args and try to assign them to members of sampling_params using setattr. This happens inside from_optional(). Because extra_args gets unpacked into the member variables, there is no need to hold onto extra_args as a member variable, right?

I'll probably modify this behavior, so that only specific "special" arguments that are not part of the openai api (such as ignore_eos) can be set via extra_args.

But regardless - why do we need to hold onto extra_args as a member variable in SamplingParams?
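
The restricted variant mentioned in this reply, where only specific "special" arguments may be set via extra_args, might look something like the sketch below. It is illustrative only: SPECIAL_ARGS and apply_extra_args are hypothetical names, the allowlist contents are an assumption, and raising ValueError for other keys is just one possible policy.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical allowlist of vLLM-specific args outside the OpenAI API.
SPECIAL_ARGS = frozenset({"ignore_eos"})


def apply_extra_args(params: Any, extra_args: Optional[dict[str, Any]]) -> None:
    """Assign only allowlisted extra_args onto params; reject the rest."""
    for name, value in (extra_args or {}).items():
        if name not in SPECIAL_ARGS:
            raise ValueError(f"Unsupported extra arg: {name!r}")
        setattr(params, name, value)


# SimpleNamespace stands in for a sampling-params object.
params = SimpleNamespace(max_tokens=16, ignore_eos=False)
apply_extra_args(params, {"ignore_eos": True})

try:
    apply_extra_args(params, {"max_tokens": 5})
    rejected = False
except ValueError:
    rejected = True
```

With an allowlist like this, a standard field such as max_tokens can no longer be overridden through extra_args, which addresses the conflicting-fields concern raised earlier in the review.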

afeldman-nm · 2025-04-23T16:22:36Z

Thanks for your review @comaniac . After chatting with Cody, I think this interface change is sufficiently impactful to merit an RFC which I will write and share shortly.


mergify · 2025-05-12T16:48:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


helloworld1 · 2025-06-03T16:37:02Z

@afeldman-nm Is there any update on this PR? Have we reached an agreement in the RFC? Thanks!

afeldman-nm · 2025-06-03T18:04:43Z

> @afeldman-nm Is there any update on this PR? Have we reached an agreement in the RFC? Thanks!

Hello @helloworld1 , yes the final proposal can be found here: #17191 (comment)

Although the RFC is finalized, this PR has stalled because it is a bit tricky to unit test custom args when the feature they were designed to support (custom logits processors #17799, based on the new logits processor implementation #16728) is itself still in development. However, the custom args workstream is still WIP and has not been abandoned.


helloworld1 · 2025-06-05T18:43:34Z

@afeldman-nm Glad to see it is still in progress. My use case is passing in the truncate_prompt_tokens sampling parameter. Can we just unit test what we can test now and add more comprehensive unit tests when the logits processor work is done?

extra_args

55328d8

Signed-off-by: Andrew Feldman <[email protected]>

mergify bot added the frontend label Apr 18, 2025

afeldman-nm added 2 commits April 21, 2025 17:45

Merge branch 'main' into extra_args

cc44096

Merge branch 'main' into extra_args

876de25

afeldman-nm mentioned this pull request Apr 21, 2025

[Feature]: Support custom args in OpenAI (chat) completion requests #16802

Open

afeldman-nm added 6 commits April 22, 2025 06:08

rename

191b9e1

Signed-off-by: Andrew Feldman <[email protected]>

rename

1b658cd

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into extra_args

6c892d8

extra_body

6a0f87c

Signed-off-by: Andrew Feldman <[email protected]>

completion custom arg unit test

ac57a7f

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into extra_args

9753c75

mergify bot added the v1 label Apr 22, 2025

afeldman-nm added 5 commits April 23, 2025 03:10

Merge branch 'main' into extra_args

c2f39bd

tweak extra_args; test sampling params extra args via api

5c43609

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into extra_args

1f8d6d1

remove unnecessary extra_body field/breakout

368f907

Signed-off-by: Andrew Feldman <[email protected]>

removed transcription scenario

a90311a

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm marked this pull request as ready for review April 23, 2025 15:03

comaniac reviewed Apr 23, 2025


Merge branch 'main' into extra_args

0e7809d

afeldman-nm mentioned this pull request Apr 25, 2025

[RFC]: Custom sampling params support in REST API #17191

Open

1 task

afeldman-nm added 4 commits May 6, 2025 18:06

Merge branch 'main' into extra_args

510623c

revert sampling params

52988b8

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into extra_args

94e5855

impl based on rfc

a869a6d

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into extra_args

934de06

mergify bot added the needs-rebase label May 12, 2025

afeldman-nm mentioned this pull request May 12, 2025

[RFC]: Logits processor extensibility #17799

Open

1 task

upstream merge

cf6d7c5

Signed-off-by: Andrew Feldman <[email protected]>

mergify bot removed the needs-rebase label May 13, 2025

afeldman-nm mentioned this pull request May 13, 2025

[V1] LogitsProcessor programming model #16728

Open

afeldman-nm added 2 commits June 3, 2025 14:14

upstream merge

0695f26

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into extra_args_merge

c3047cc
