-
-
Notifications
You must be signed in to change notification settings - Fork 7.7k
[Frontend] Add backend-specific options for guided decoding #13505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Joe Runde <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
vllm/envs.py
Outdated
@@ -585,6 +586,8 @@ def maybe_convert_int(value: Optional[str]) -> Optional[int]: | |||
# specify the path through environment variable VLLM_CUDART_SO_PATH. | |||
"VLLM_CUDART_SO_PATH": | |||
lambda: os.getenv("VLLM_CUDART_SO_PATH", None), | |||
"VLLM_DISABLE_GUIDED_DECODING_FALLBACK": | |||
lambda: bool(int(os.getenv("VLLM_DISABLE_GUIDED_DECODING_FALLBACK", "0"))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we an established pattern on what should be config vs env variables? Why wouldn't this be in DecodingConfig
? Maybe we could encode "don't fallback" in something like --guided-decoding-backend=outlines:nofallback
if we were worried about a proliferation of CLI arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with either way, but I do think Mark's suggestion would be nicer. I like calling more attention to the --guided-decoding-backend
argument if users want to be explicit about their backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like the cli arg approach ... before seeing this comment I was thinking about another "backend" like xgrammar-only
or something like that. xgrammar:nofallback
leaves it open to a bit more flexibility to specify additional options if necessary later, like xgrammar:nofallback,json-any-whitespace
to support the case covered in #12744
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we an established pattern on what should be config vs env variables?
Yeah ... I keep thinking about this. It's going to be a big project, but we're due for significant cleanup here. I'd really like a system that supports both config files and command line args (and less env vars unless it's just an alternative for setting the same set of options).
... but I have no idea when that's going to feel like the most important thing to work on!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we an established pattern on what should be config vs env variables?
Yeah ... I keep thinking about this.
Me too. I like to think that the environment variables change the 'behavior' of the system like using a deprecated | experimental | workaround feature. While config are the others 'common features' of the system that's up to the users to set or tune to their environment.
However there also something that makes sense to this discussion. When we set in the config we have a chance to log the system setup, like the log Initializing a V0 LLM engine (v%s) with config: [...]
Sometimes it is tricky to get the exact setup of the system when we got a crash and the only thing that we get it is a stack trace (which may be truncated as well 😄) .
Probably we should prefer using args before envs , but when makes sense to use envs, we probably could log to the users (at least once) that vLLM has this feature on and the implications of that in the system.
Either way: I like the idea of --guided-decoding-backend=outlines:nofallback
for this PR. And I'm pretty sure that it would be logged in the system initialization, which is nice for debugging purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the discussion everybody 👍
I stuck this in the environment to avoid the proliferation of cli args, but I love the suggestion of encoding the fallback behavior in the name of the backend. Best of both worlds!
I'll update the implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is updated if y'all wanna take a second look 🙏
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
|
||
with pytest.raises( | ||
ValueError, | ||
match="xgrammar does not support regex guided decoding"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will soon! :-). #13228
This is totally fine, though. I can tweak this once we're able to turn it on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great to me! Thank you!
It would be good to update the text that shows up under vllm serve --help
as well.
...
--guided-decoding-backend {outlines,lm-format-enforcer,xgrammar}
Which engine will be used for guided decoding (JSON schema / regex etc) by default. Currently
support https://github.com/outlines-dev/outlines, https://github.com/mlc-ai/xgrammar, and
https://github.com/noamgat/lm-format-enforcer. Can be overridden per request via
guided_decoding_backend parameter.
...
Signed-off-by: Joe Runde <[email protected]>
@russellb good catch, the CLI parser needed a bit of relaxing too. @wallashss This now boots and logs the backend and option(s):
|
Signed-off-by: Wallas Santos <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, LGTM!
…ject#13505) Signed-off-by: Joe Runde <[email protected]>
…ject#13505) Signed-off-by: Joe Runde <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
…ject#13505) Signed-off-by: Joe Runde <[email protected]>
This PR extends the guided decoding backend name to support a list of backend-specific options. These are specified in a comma separated list after a colon. This allows us to easily add support for backend-specific features, like
xgrammar:json-any-whitespace
.Specifically this PR also implements the
no-fallback
option. When set vLLM will return a very nicely formatted 400 describing how the guided decoding backend doesn't support what they asked for, instead of silently switching to a different backend. This is useful for users who want to only support a single guided decoding backend.A motivation for this was that a product that only intended to support the xgrammar backend had a user find a query that would fall back to outlines, and then cause outlines to hang indefinitely. (See dottxt-ai/outlines-core#180)
Works around #12005 😉