[V1] vLLM OpenAI API custom args #16862

Open
wants to merge 23 commits into
base: main

Conversation

afeldman-nm
Contributor

@afeldman-nm afeldman-nm commented Apr 18, 2025

Add an extra_args: Optional[dict[str, Any]] field to CompletionRequest and ChatCompletionRequest (these are the only OpenAIBaseModel subclasses which had a logits_processors field in v0.) This field is injected into SamplingParams.extra_args via SamplingParams.from_optional(); each dict key/value pair in extra_args becomes an assignment to an attribute of sampling_params.
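As a rough illustration of the mechanism described above (not the PR's actual code), the sketch below uses a hypothetical stand-in class, `ToySamplingParams`, to show how each key/value pair of an `extra_args` dict could become an attribute assignment via `setattr`. The class name, its fields, and the simplified `from_optional` signature are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToySamplingParams:
    """Hypothetical stand-in for vLLM's SamplingParams (illustration only)."""

    max_tokens: Optional[int] = 16
    ignore_eos: bool = False

    @classmethod
    def from_optional(
        cls,
        max_tokens: Optional[int] = 16,
        extra_args: Optional[dict[str, Any]] = None,
    ) -> "ToySamplingParams":
        params = cls(max_tokens=max_tokens)
        # Each key/value pair in extra_args becomes an attribute assignment.
        for key, value in (extra_args or {}).items():
            setattr(params, key, value)
        return params


params = ToySamplingParams.from_optional(
    max_tokens=10, extra_args={"ignore_eos": True}
)
```

Note that an unrestricted `setattr` loop like this one would also let `extra_args` overwrite standard fields such as `max_tokens`.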

RFC: #17191

Fixes #16802

Signed-off-by: Andrew Feldman <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Apr 18, 2025
@njhill
Member

njhill commented Apr 18, 2025

Thanks @afeldman-nm! It would be good to include a test that shows how these can be passed via the OpenAI client sdk using its extra_body option: https://github.com/openai/openai-python?tab=readme-ov-file#undocumented-request-params

I'm unsure whether we want these new custom args to be in a nested json object (as you've done here) or just extra top-level args.
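To make the two options concrete, here is a minimal sketch of how a server might pull custom args out of a request body in each style. The field name `extra_args`, the `KNOWN_FIELDS` set, and both helper functions are hypothetical; they are not taken from the vLLM code base.

```python
from typing import Any

# Hypothetical set of standard request fields the server already knows about.
KNOWN_FIELDS = frozenset({"model", "prompt", "max_tokens"})


def custom_args_nested(request: dict[str, Any]) -> dict[str, Any]:
    """Nested style: custom args live under a single `extra_args` key."""
    return dict(request.get("extra_args") or {})


def custom_args_top_level(request: dict[str, Any]) -> dict[str, Any]:
    """Top-level style: any field outside KNOWN_FIELDS is a custom arg."""
    return {k: v for k, v in request.items() if k not in KNOWN_FIELDS}


nested = {"model": "my-model", "prompt": "Hi",
          "extra_args": {"my_custom_arg": 1}}
flat = {"model": "my-model", "prompt": "Hi", "my_custom_arg": 1}
```

In this sketch the nested style keeps custom args in a separate namespace, while the top-level style depends on the server knowing the full set of standard fields.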

@afeldman-nm
Contributor Author

Thanks @njhill . Agree regarding the unit test. I need to think a bit about the right way to do it

@mergify mergify bot added the v1 label Apr 22, 2025
@afeldman-nm afeldman-nm marked this pull request as ready for review April 23, 2025 15:03
@afeldman-nm
Contributor Author

Comment on lines +102 to +104
# Contradictory `max_tokens`
extra_body={
    "max_tokens": 5,
Collaborator

I don't think we should allow conflicting fields in extra_body, as it is very confusing. The "extra_xxx" fields should literally be extra fields that are not defined anywhere else.

Contributor Author

PSA after chatting with Cody: extra_body is entirely a "client-side" feature, i.e. the Python SDK takes an extra_body argument, extracts the key/value pairs, and injects them into the JSON request as if they were top-level arguments. This means that the server never sees extra_body and in fact has no ability to handle it. (If you send a raw HTTP request, any arguments placed under extra_body are ignored.) Another implication is that we have no control over the behavior of extra_body, because it is part of OpenAI's client SDK. As you can see here:

https://github.com/openai/openai-python/blob/ed53107e10e6c86754866b48f8bd862659134ca8/src/openai/resources/models.py#L48

"The extra values given here {in extra_body} take precedence over values defined on the client or passed to this method."

So support for conflicting settings in extra_body is not a new feature added by my PR; it is an immutable (to us) characteristic of the SDK. I am simply using this behavior as a testing hack to confirm that extra_body is working.
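The client-side merge described in this comment can be approximated with a few lines of plain Python. This is a sketch of the described behavior only, not the OpenAI SDK's actual implementation; `build_request_json` is a hypothetical helper and the model name is a placeholder.

```python
from typing import Any, Optional


def build_request_json(
    params: dict[str, Any],
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge extra_body into the top level of the request JSON.

    Values from extra_body win on conflict, mirroring the precedence
    rule quoted above. The key `extra_body` itself is never sent.
    """
    body = dict(params)
    body.update(extra_body or {})
    return body


body = build_request_json(
    {"model": "my-model", "prompt": "Hello", "max_tokens": 10},
    extra_body={"max_tokens": 5, "ignore_eos": True},
)
```

Under these assumptions, the server would only ever see a flat JSON object in which `max_tokens` is 5.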

Comment on lines +124 to +128
"ignore_eos": True,
"extra_sampling_params": {
    # Contradictory max_tokens
    "max_tokens": 5
}
Collaborator

Intuitively, ignore_eos is part of extra_sampling_params, so we may consider moving it in. But this may affect existing users, so we should be careful. For example, we could allow it in both places and show a deprecation warning for 1-2 releases. Open to discussion, cc @WoosukKwon @njhill

Contributor Author

@afeldman-nm afeldman-nm Apr 23, 2025

I'm going to write a short RFC for custom args because I think these details need to be ironed out. In the near-term I think it is fine to allow "special" fields to be set in two ways: as top-level API arguments (current behavior) or via extra_sampling_params (new behavior in this PR.)

In the long term, having two ways to set a special arg such as ignore_eos is confusing & we should probably restrict special args to only be set via extra_sampling_params. However, it is likely that customers depend on the current behavior & their code would be broken by this change. So we should probably wait until a major release (in whatever appropriate sense of the word "major") before restricting special args to be solely configurable via extra_sampling_params.
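One way the transitional behavior discussed in this thread might look is sketched below: `ignore_eos` is accepted in both places, and the top-level form emits a deprecation warning. The helper `resolve_ignore_eos` is hypothetical, and the choice to let the nested value win when both are set is an assumption, not something decided in this PR.

```python
import warnings
from typing import Any


def resolve_ignore_eos(request: dict[str, Any]) -> bool:
    """Read ignore_eos from either location (hypothetical helper)."""
    extra = request.get("extra_sampling_params") or {}
    if "ignore_eos" in extra:
        # Assumption: the nested value takes precedence if both are set.
        return bool(extra["ignore_eos"])
    if "ignore_eos" in request:
        warnings.warn(
            "Top-level `ignore_eos` is deprecated; pass it via "
            "`extra_sampling_params` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return bool(request["ignore_eos"])
    return False


new_style = resolve_ignore_eos({"extra_sampling_params": {"ignore_eos": True}})
```

Existing clients that send the top-level field would keep working during the deprecation window and only see a warning.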

# Custom sampling params
extra_sampling_params: Optional[dict[str, Any]] = Field(
    default=None,
    description=("Additional kwargs to pass to sampling."),
Collaborator

It would be better to include more detail here, such as where to find the documentation for the extra params.

Contributor Author

yes I will clarify. I should probably also add some docs changes (perhaps as a separate PR)

@@ -242,7 +242,6 @@ class SamplingParams(
    guided_decoding: Optional[GuidedDecodingParams] = None
    logit_bias: Optional[dict[int, float]] = None
    allowed_token_ids: Optional[list[int]] = None
    extra_args: Optional[dict[str, Any]] = None
Collaborator

Why do we need to change this? I suppose we only need to change the frontend to take extra args from the OpenAI protocol, while the underlying implementation could remain the same?

Contributor Author

This change (and some of the surrounding changes) cause vLLM to automatically unpack all of the key/value pairs in extra_args & try to assign them to members of sampling_params using setattr. This happens inside of from_optional(). Because extra_args gets unpacked into the member variables, there is no need to hold onto extra_args as a member variable, right?

I'll probably modify this behavior, so that only specific "special" arguments that are not part of the openai api (such as ignore_eos) can be set via extra_args.

But regardless - why do we need to hold onto extra_args as a member variable in SamplingParams?
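The restricted variant mentioned above, where only specific "special" arguments may be set, could be sketched roughly as follows. The allow-list contents, the helper name `apply_extra_args`, and the use of `SimpleNamespace` as a stand-in for the sampling params object are all assumptions for illustration.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical allow-list of non-OpenAI "special" arguments.
ALLOWED_EXTRA_ARGS = frozenset({"ignore_eos"})


def apply_extra_args(params: Any, extra_args: Optional[dict[str, Any]]) -> Any:
    """Assign allow-listed extra args onto params; reject everything else."""
    for key, value in (extra_args or {}).items():
        if key not in ALLOWED_EXTRA_ARGS:
            raise ValueError(f"Unsupported extra arg: {key!r}")
        setattr(params, key, value)
    return params


# SimpleNamespace stands in for a real sampling params object.
params = apply_extra_args(
    SimpleNamespace(max_tokens=10, ignore_eos=False),
    {"ignore_eos": True},
)
```

With a check like this, a conflicting key such as `max_tokens` would be rejected instead of silently overriding the standard field.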

@afeldman-nm
Contributor Author

Thanks for your review @comaniac . After chatting with Cody, I think this interface change is sufficiently impactful to merit an RFC which I will write and share shortly.

mergify bot commented May 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Andrew Feldman <[email protected]>
@helloworld1

@afeldman-nm Is there any update on this PR? Have we reached an agreement in the RFC? Thanks!

@afeldman-nm
Contributor Author

@afeldman-nm Is there any update on this PR? Have we reached an agreement in the RFC? Thanks!

Hello @helloworld1 , yes the final proposal can be found here: #17191 (comment)

Although the RFC is finalized, this PR has stalled because it is tricky to unit test custom args while the feature they were designed to support (custom logits processors #17799, based on the new logits processor implementation #16728) is itself still in development. However, the custom args workstream is still WIP and has not been abandoned.

@helloworld1

@afeldman-nm Glad to see it is still in progress. My use case is passing in the truncate_prompt_tokens sampling parameter. Can we just unit test what we can test now and add more comprehensive unit tests when the logits processor work is done?

Successfully merging this pull request may close these issues.

[Feature]: Support custom args in OpenAI (chat) completion requests
4 participants