[Frontend] Add backend-specific options for guided decoding #13505
Merged
7 commits
65cff01 :wrench: Add env var to disable guided decoding fallbacks (joerunde)
c971da7 :rewind: revert envs change (joerunde)
15cac0c :sparkles: add guided decoding backend options (joerunde)
f9d0e9d :bug: handle missing backend name (joerunde)
c64df44 :bug: fixup options (joerunde)
a8e73c3 :memo: add docs and example (joerunde)
85b1558 :sparkles: add CLI support (joerunde)
Do we have an established pattern for what should be config vs. env variables? Why wouldn't this be in `DecodingConfig`? Maybe we could encode "don't fall back" in something like `--guided-decoding-backend=outlines:nofallback` if we were worried about a proliferation of CLI arguments.

I'm okay with either way, but I do think Mark's suggestion would be nicer. I like calling more attention to the `--guided-decoding-backend` argument if users want to be explicit about their backend.

I like the CLI arg approach ... before seeing this comment I was thinking about another "backend" like `xgrammar-only` or something like that. `xgrammar:nofallback` leaves room to specify additional options later if necessary, like `xgrammar:nofallback,json-any-whitespace` to support the case covered in #12744.

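For illustration only, the `backend:option,option` syntax proposed here could be parsed roughly as follows. This is a minimal sketch, not the code in this PR: `BackendSpec` and `parse_backend_spec` are hypothetical names, and raising `ValueError` on a missing backend name is an assumption about how that case might be handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendSpec:
    """Hypothetical container for a parsed backend string."""

    name: str
    options: frozenset = frozenset()

    @property
    def no_fallback(self) -> bool:
        # True when the user asked for no fallback to another backend.
        return "nofallback" in self.options


def parse_backend_spec(value: str) -> BackendSpec:
    """Split 'name[:opt1,opt2,...]' into a backend name and a set of options."""
    name, _, raw_options = value.partition(":")
    name = name.strip()
    if not name:
        # Assumption: a missing backend name is treated as an error.
        raise ValueError(f"missing backend name in {value!r}")
    # Ignore empty entries such as a trailing comma.
    options = frozenset(
        opt.strip() for opt in raw_options.split(",") if opt.strip()
    )
    return BackendSpec(name=name, options=options)


spec = parse_backend_spec("xgrammar:nofallback,json-any-whitespace")
print(spec.name, sorted(spec.options), spec.no_fallback)
```

Keeping the options in a set means new flags can be added later without another CLI argument, which is the flexibility this comment is asking for.
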
Yeah ... I keep thinking about this. It's going to be a big project, but we're due for significant cleanup here. I'd really like a system that supports both config files and command-line args (and fewer env vars, unless an env var is just an alternative way of setting the same set of options).
... but I have no idea when that's going to feel like the most important thing to work on!
Me too. I like to think that environment variables change the 'behavior' of the system, like enabling a deprecated, experimental, or workaround feature, while config covers the other 'common features' of the system that are up to users to set or tune for their environment.

However, there is also something relevant to this discussion. When we set something in the config, we have a chance to log the system setup, like the log `Initializing a V0 LLM engine (v%s) with config: [...]`. Sometimes it is tricky to get the exact setup of the system when we get a crash and the only thing we have is a stack trace (which may be truncated as well 😄). We should probably prefer args over envs, but when it makes sense to use envs, we could log to users (at least once) that vLLM has this feature on and what that implies for the system.

Either way: I like the idea of `--guided-decoding-backend=outlines:nofallback` for this PR. And I'm pretty sure that it would be logged at system initialization, which is nice for debugging purposes.

Thanks for the discussion everybody 👍
I stuck this in the environment to avoid the proliferation of cli args, but I love the suggestion of encoding the fallback behavior in the name of the backend. Best of both worlds!
I'll update the implementation
Code is updated if y'all wanna take a second look 🙏