[Model] Add Reasoning Parser for Granite Models #14202
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Nice use of this new feature! Will try in a bit. cc @gaocegege
Thanks for the contribution!
Could you please rebase onto upstream? In a previous PR to support reasoning outputs in structured outputs (https://github.com/vllm-project/vllm/pull/12955/files#diff-ea8b8ff63961713ccb62d78e53e96404b587b7828cb9fee08a9e5576bf563673R1065), we moved the CLI argument `--reasoning-parser` to https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L1076, so you may need to add a new choice there.
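For readers following along, registering the new backend as a CLI choice amounts to roughly the following. This is a hedged sketch using plain `argparse`; the actual vLLM code path and its full choice list live in `arg_utils.py` and may differ:

```python
import argparse

# Hypothetical sketch only -- the real choice list in vllm/engine/arg_utils.py
# may include other backends as well.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--reasoning-parser",
    type=str,
    choices=["deepseek_r1", "granite"],  # "granite" is the new choice
    default=None,
)

args = parser.parse_args(["--reasoning-parser", "granite"])
```

With `"granite"` added to `choices`, the CLI accepts the new backend and rejects unknown parser names automatically.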
Hi, I updated the docs in PR #14114. Maybe you should rebase the docs too. Just FYI!
4f4cbe1
to
ecadd2d
Compare
```
@@ -19,6 +19,10 @@ def get_reasoner(tokenizer: PreTrainedTokenizer,
        return None
    elif reasoning_backend == "deepseek_r1":
        return DeepSeekReasoner.from_tokenizer(tokenizer)
    elif reasoning_backend == "granite":
        logger.warning(
```
Adding a warning for now since this is already a large PR, but I think adding a `GraniteReasoner` for guided decoding could be a follow-up later?
SGTM!
We can just fail for now.
@aarnphm This will behave the same way as models with no reasoner. The intention here was mostly to clarify that there isn't a reasoning backend for Granite, in case users conflate it with `--enable-reasoning` / `--reasoning-parser granite` being supported for these models.
wfm.
Awesome, thanks @gaocegege! It's been rebased 😄
Thanks for your contribution! 🎉 👍
@mgoin Please give it another review, thanks!
```
                response_start)
            reasoning_content = current_text[
                start_reasoning_content:end_reasoning_content]
            response_content = current_text[current_chunk_end + 1:]
```
Suggested change:

```
            response_content = current_text[current_chunk_end + 1:]
            parsed_content = True
```

The `parsed_content` flag doesn't seem to be updated, so maybe helpful to set it? Very minor suggestion, totally optional.
Hey @b8zhong, thanks for the suggestion! For now, I'd prefer to keep it as is since it returns immediately after parsing the response content. I.e., once this condition is met, there is no need to keep going, so updating the flag won't do anything 🙂
This pull request has merge conflicts that must be resolved before it can be merged.
@alex-jw-brooks Hi, could you please resolve the conflicts?
Signed-off-by: Alex-Brooks <[email protected]>
Tiny comments, otherwise LGTM.
* The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
* It is not compatible with [`tool_calling`](#tool_calling).
Can we revert this to reduce code change? Thanks.
Sure, done!
Co-authored-by: Joe Runde <[email protected]> Signed-off-by: Alex-Brooks <[email protected]>
Thanks @aarnphm - it's ready for another look when you have a moment 🙂
great work!
Hi @mgoin, can you please take a look at this PR when you have a moment?
Since this is basically coming from the model vendor, I'll just stamp it. The code looks reasonable to me
Can you fix the merge conflicts?
Thanks for the review @DarkLight1337! Sure, should be resolved now 🤞
This PR adds a reasoning parser for Granite 3.2 models! These models have an optional chat template kwarg `thinking` that changes the system prompt to enable reasoning. 😄

The format of the text is expected to be:

There have been reports of quantized versions of the model using `Here's` instead of `Here is`, though, so this PR matches on both.

Examples
Start the server with a Granite (3.2) language model that has reasoning and the `granite` parser:

Snippets are copied from the docs, with the only change being adding `chat_template_kwargs` with `thinking=True`. Without this, reasoning will be disabled, and it'll generally parse everything into `content`.
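The server launch referenced above could look something like the following CLI fragment; the model name is a placeholder and the flags are assumptions based on the PR discussion (`--enable-reasoning` / `--reasoning-parser granite`), not copied from the PR itself:

```shell
# Hypothetical invocation -- model name is a placeholder.
vllm serve ibm-granite/granite-3.2-8b-instruct \
    --enable-reasoning \
    --reasoning-parser granite
```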
No streaming:
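As a sketch of the non-streaming request (not the docs snippet itself; endpoint route and model name are placeholders), the important part is `chat_template_kwargs` in the request body, which the `openai` client passes via `extra_body`:

```python
import json

# Hypothetical request body for POST /v1/chat/completions.
payload = {
    "model": "ibm-granite/granite-3.2-8b-instruct",  # placeholder model name
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "chat_template_kwargs": {"thinking": True},  # enables reasoning
}
body = json.dumps(payload)
```

Without `"thinking": True`, the system prompt is unchanged and the parser generally returns everything as `content`.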
With streaming:
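With streaming, each chunk's delta may carry `reasoning_content` or `content`. A minimal accumulation loop, using plain dicts to simulate deltas since no live server is assumed here (a real stream yields `chunk.choices[0].delta` objects):

```python
# Simulated stream deltas; attribute names mirror vLLM's reasoning output
# fields, but the dict values themselves are made up for illustration.
deltas = [
    {"reasoning_content": "Adding the two numbers "},
    {"reasoning_content": "gives the answer."},
    {"content": "2 + 2 = 4"},
]

reasoning_parts, content_parts = [], []
for delta in deltas:
    if delta.get("reasoning_content"):
        reasoning_parts.append(delta["reasoning_content"])
    if delta.get("content"):
        content_parts.append(delta["content"])

streamed_reasoning = "".join(reasoning_parts)
streamed_content = "".join(content_parts)
```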
Example output (run from the streaming snippet above):