Commit 1711b92

[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Joe Runde <[email protected]>
1 parent c091c0a

File tree

8 files changed: +730 −3 lines

docs/source/features/reasoning_outputs.md

Lines changed: 6 additions & 1 deletion

@@ -4,7 +4,7 @@
 vLLM offers support for reasoning models like [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), which are designed to generate outputs containing both reasoning steps and final conclusions.

-Reasoning models return a additional `reasoning_content` field in their outputs, which contains the reasoning steps that led to the final conclusion. This field is not present in the outputs of other models.
+Reasoning models return an additional `reasoning_content` field in their outputs, which contains the reasoning steps that led to the final conclusion. This field is not present in the outputs of other models.

 ## Supported Models

@@ -14,6 +14,9 @@ vLLM currently supports the following reasoning models:
 |--------------|-------------|------------------|-------------|
 | [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `deepseek_r1` | `guided_json`, `guided_regex` ||
 | [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | `deepseek_r1` | `guided_json`, `guided_regex` ||
+| [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` |||
+
+- IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.

 ## Quickstart

@@ -43,6 +46,7 @@ model = models.data[0].id

 # Round 1
 messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+# For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
 response = client.chat.completions.create(model=model, messages=messages)

 reasoning_content = response.choices[0].message.reasoning_content

@@ -97,6 +101,7 @@ models = client.models.list()
 model = models.data[0].id

 messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+# For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
 stream = client.chat.completions.create(model=model,
                                         messages=messages,
                                         stream=True)
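The docs change above notes that Granite reasoning is off by default and must be enabled per request via `chat_template_kwargs`. A minimal sketch of assembling such a request follows; the model id is hypothetical and the client call is shown only as a comment, since it requires a running vLLM OpenAI-compatible server:

```python
# Sketch of enabling Granite reasoning per the docs change in this commit.
# The model id below is hypothetical, not taken from the commit.

def build_granite_request(model: str, messages: list) -> dict:
    """Assemble chat-completion kwargs for a Granite reasoning request.

    Granite reasoning is disabled by default, so the request passes
    `thinking=True` through `chat_template_kwargs` via `extra_body`.
    """
    return {
        "model": model,
        "messages": messages,
        "extra_body": {"chat_template_kwargs": {"thinking": True}},
    }

kwargs = build_granite_request(
    "ibm-granite/granite-3.2-8b-instruct",  # hypothetical model id
    [{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
)

# With an OpenAI-compatible client pointed at a vLLM server, this would be:
# response = client.chat.completions.create(**kwargs)
# print(response.choices[0].message.reasoning_content)
```

Keeping the `extra_body` payload in one place makes it easy to toggle reasoning on or off without touching the rest of the request.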

examples/online_serving/openai_chat_completion_with_reasoning.py

Lines changed: 1 addition & 0 deletions

@@ -31,6 +31,7 @@

 # Round 1
 messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+# For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
 response = client.chat.completions.create(model=model, messages=messages)

 reasoning_content = response.choices[0].message.reasoning_content

examples/online_serving/openai_chat_completion_with_reasoning_streaming.py

Lines changed: 1 addition & 0 deletions

@@ -38,6 +38,7 @@
 model = models.data[0].id

 messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+# For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
 stream = client.chat.completions.create(model=model,
                                         messages=messages,
                                         stream=True)
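In the streaming example above, reasoning tokens and answer tokens arrive interleaved as separate fields on each chunk delta. The sketch below models a delta as a `(reasoning_content, content)` pair, where either side may be `None`; this mirrors the field shape the docs describe but is not the server API itself:

```python
# Sketch of consuming a reasoning-enabled stream. Each delta is modeled as a
# (reasoning_content, content) tuple; a real stream would yield chunk objects
# with these attributes on chunk.choices[0].delta.

def split_stream_deltas(deltas):
    """Accumulate reasoning tokens and answer tokens separately."""
    reasoning_parts, content_parts = [], []
    for reasoning, content in deltas:
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)

# Example: reasoning deltas typically arrive first, then the final answer.
reasoning, answer = split_stream_deltas([
    ("Compare the tenths digit: ", None),
    ("8 > 1, so 9.8 is larger.", None),
    (None, "9.8 is greater than 9.11."),
])
```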

0 commit comments
