[V1] vLLM OpenAI API custom args #16862
Signed-off-by: Andrew Feldman <[email protected]>
Thanks @afeldman-nm! It would be good to include a test that shows how these can be passed via the OpenAI client sdk using its I'm unsure whether we want these new custom args to be in a nested json object (as you've done here) or just extra top-level args.
Thanks @njhill. Agree regarding the unit test. I need to think a bit about the right way to do it.
# Contradictory `max_tokens`
extra_body={
    "max_tokens": 5,
I don't think we should allow conflicting fields in extra body, as it is very confusing. The "extra_xxx" fields should literally be extra fields that are not defined in other places.
PSA after chatting with Cody: `extra_body` is entirely a "client-side" feature, i.e. the Python SDK takes an `extra_body` argument, extracts the key/value pairs, and injects them into the JSON request as if they were top-level arguments. This means that the server never sees `extra_body` and in fact does not have the ability to handle it. (If you use the HTTP client, any arguments in `extra_body` get ignored.) Another implication of this is that we have no control over the behavior of `extra_body` because it is part of OpenAI's client SDK. And as you can see here:
"The extra values given here {in `extra_body`} take precedence over values defined on the client or passed to this method."
So the support for having conflicting settings in `extra_body` is not a new feature added by my PR; it is an immutable (to us) characteristic of the SDK. I am simply utilizing this feature as a testing hack to confirm that `extra_body` is working.
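For readers following along, here is a minimal sketch of the client-side merge described in this comment. `merge_extra_body` is a hypothetical helper, not the SDK's actual code, and the model name is a placeholder; it only models the quoted rule that `extra_body` keys end up as top-level JSON fields and win on conflict.

```python
from typing import Any, Optional


def merge_extra_body(
    body: dict[str, Any],
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical model of the client-side merge; not the SDK's real code."""
    # Keys from extra_body are injected as top-level fields and override
    # same-named keys that are already present in the request body.
    return {**body, **(extra_body or {})}


# "my-model" is a placeholder model name used only for illustration.
request_json = merge_extra_body(
    {"model": "my-model", "prompt": "Hello", "max_tokens": 10},
    extra_body={"max_tokens": 5, "ignore_eos": True},
)
print(request_json)
```

Under this model the server receives a single flat JSON object with no `extra_body` key, which is why a contradictory `max_tokens` in `extra_body` can serve as a check that the extra arguments arrived.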
"ignore_eos": True,
"extra_sampling_params": {
    # Contradictory max_tokens
    "max_tokens": 5
}
Intuitively, `ignore_eos` is a part of `extra_sampling_params`, so we may consider moving it in? But this may affect existing users, so we should be careful. For example, we should allow it in both places and show a deprecation warning for 1-2 releases. Open to discussion. cc @WoosukKwon @njhill
I'm going to write a short RFC for custom args because I think these details need to be ironed out. In the near term I think it is fine to allow "special" fields to be set in two ways: as top-level API arguments (current behavior) or via `extra_sampling_params` (new behavior in this PR).
In the long term, having two ways to set a special arg such as `ignore_eos` is confusing, and we should probably restrict special args to only be set via `extra_sampling_params`. However, it is likely that customers depend on the current behavior and their code would be broken by this change. So we should probably wait until a major release (in whatever appropriate sense of the word "major") before restricting special args to be solely configurable via `extra_sampling_params`.
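One way the transition discussed in this thread could look is sketched below: the special arg is accepted in both places and the top-level form emits a deprecation warning. `resolve_ignore_eos` is a hypothetical helper, and the precedence rule (the nested value wins) is an assumption; neither comes from the PR or the RFC.

```python
import warnings
from typing import Any, Optional


def resolve_ignore_eos(
    top_level: Optional[bool],
    extra_sampling_params: Optional[dict[str, Any]],
) -> bool:
    """Hypothetical transition helper; name and precedence are assumptions."""
    nested = (extra_sampling_params or {}).get("ignore_eos")
    if top_level is not None:
        # Still honoured during the transition, but flagged for removal.
        warnings.warn(
            "Top-level `ignore_eos` is deprecated; "
            "set it via `extra_sampling_params` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Assumption: the nested value wins when both are given.
    if nested is not None:
        return bool(nested)
    return bool(top_level) if top_level is not None else False
```

Whether the nested or the top-level value should win on conflict is exactly the kind of detail the RFC would need to settle.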
# Custom sampling params
extra_sampling_params: Optional[dict[str, Any]] = Field(
    default=None,
    description=("Additional kwargs to pass to sampling."),
It would be better to have more details here, such as where to find the docs on extra params.
Yes, I will clarify. I should probably also add some docs changes (perhaps as a separate PR).
vllm/sampling_params.py
Outdated
@@ -242,7 +242,6 @@ class SamplingParams(
    guided_decoding: Optional[GuidedDecodingParams] = None
    logit_bias: Optional[dict[int, float]] = None
    allowed_token_ids: Optional[list[int]] = None
    extra_args: Optional[dict[str, Any]] = None
Why do we need to change this? I suppose we only need to change the frontend to take extra args from the OpenAI protocol, but the underlying implementation could remain the same?
This change (and some of the surrounding changes) cause vLLM to automatically unpack all of the key/value pairs in `extra_args` and try to assign them to members of `sampling_params` using `setattr`. This happens inside of `from_optional()`. Because `extra_args` gets unpacked into the member variables, there is no need to hold onto `extra_args` as a member variable, right?
I'll probably modify this behavior, so that only specific "special" arguments that are not part of the OpenAI API (such as `ignore_eos`) can be set via `extra_args`.
But regardless - why do we need to hold onto `extra_args` as a member variable in `SamplingParams`?
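A minimal sketch of the unpacking behavior described here, including the proposed restriction to "special" arguments. `ToySamplingParams` and `SPECIAL_ARGS` are hypothetical stand-ins for illustration only; this is not vLLM's actual `SamplingParams` or its real `from_optional()` signature.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: an allowlist of "special" arguments outside the OpenAI API.
SPECIAL_ARGS = frozenset({"ignore_eos"})


@dataclass
class ToySamplingParams:
    """Toy stand-in for SamplingParams; not vLLM's real class."""

    max_tokens: Optional[int] = 16
    ignore_eos: bool = False

    @classmethod
    def from_optional(
        cls,
        max_tokens: Optional[int] = 16,
        extra_args: Optional[dict[str, Any]] = None,
    ) -> "ToySamplingParams":
        params = cls(max_tokens=max_tokens)
        for key, value in (extra_args or {}).items():
            if key not in SPECIAL_ARGS:
                raise ValueError(f"{key!r} cannot be set via extra_args")
            # Each key/value pair becomes an attribute assignment, so the
            # dict itself does not need to be stored on the instance.
            setattr(params, key, value)
        return params


params = ToySamplingParams.from_optional(
    max_tokens=8, extra_args={"ignore_eos": True}
)
print(params)
```

With the allowlist in place, a key such as `max_tokens` inside `extra_args` is rejected rather than silently overriding the regular argument.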
Thanks for your review @comaniac. After chatting with Cody, I think this interface change is sufficiently impactful to merit an RFC which I will write and share shortly.
@afeldman-nm Is there any update on this PR? Have we reached an agreement in the RFC? Thanks!
Hello @helloworld1, yes, the final proposal can be found here: #17191 (comment). Although the RFC is finalized, this PR has stalled because it is a bit tricky to unit test custom args when the feature it was designed to support (custom logits processors #17799, based on the new logits processor implementation #16728) is itself still in development. However, the custom args workstream is still WIP and is not abandoned.
@afeldman-nm Glad to see it is still in progress. My use case is passing in
Add an `extra_args: Optional[dict[str, Any]]` field to `CompletionRequest` and `ChatCompletionRequest` (these are the only OpenAIBaseModel subclasses which had a `logits_processors` field in v0). This field is injected into `SamplingParams.extra_args` via `SamplingParams.from_optional()`; each dict key/value pair in `extra_args` becomes an assignment to an attribute of `sampling_params`.
RFC: #17191
Fixes #16802