Commit b0cdfed

Update the docs
Signed-off-by: Ce Gao <[email protected]>
1 parent 002848a commit b0cdfed

File tree

1 file changed (+46 −3)

docs/source/features/reasoning_outputs.md

Lines changed: 46 additions & 3 deletions
@@ -10,7 +10,9 @@ Reasoning models return an additional `reasoning_content` field in their outputs,

vLLM currently supports the following reasoning models:

-- [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) (`deepseek_r1`, which looks for `<think> ... </think>`)
+| Model Series | Parser Name | Structured Output Support |
+|--------------|-------------|---------------------------|
+| [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `deepseek_r1` | `guided_json`, `guided_regex` |

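For orientation, here is a minimal sketch of what the `deepseek_r1` parser surfaces, assuming a server already started with the `--enable-reasoning --reasoning-parser deepseek_r1` flags noted at the end of this diff; the model name is a placeholder:

```python
from openai import OpenAI

# Point the OpenAI client at a local vLLM server.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # placeholder checkpoint
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
)
# The parser splits the `<think> ... </think>` text out of the final answer.
print("reasoning_content:", completion.choices[0].message.reasoning_content)
print("content:", completion.choices[0].message.content)
```
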
## Quickstart

@@ -78,11 +80,51 @@ Streaming chat completions are also supported for reasoning models.

Please note that it is not compatible with the OpenAI Python client library. You can use the `requests` library to make streaming requests. You can check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).

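The linked example is authoritative; as a rough sketch of the same idea, assuming a local server and a placeholder model name, the stream can be consumed with `requests`. The streaming delta field name here mirrors the non-streaming `reasoning_content` field and is an assumption:

```python
import json

import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # placeholder
        "messages": [{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
        "stream": True,
    },
    stream=True,
)
for line in response.iter_lines():
    # Server-sent events: each payload line starts with "data: ".
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    # Reasoning tokens stream first, then the final answer tokens.
    print(delta.get("reasoning_content") or delta.get("content") or "", end="")
```
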
+## Structured output
+
+The reasoning content is also available in the structured output. A structured output engine such as `xgrammar` will use the reasoning content to generate structured output.
+
+```python
+from openai import OpenAI
+from pydantic import BaseModel
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+
+models = client.models.list()
+model = models.data[0].id
+
+
+class People(BaseModel):
+    name: str
+    age: int
+
+
+json_schema = People.model_json_schema()
+
+prompt = "Generate a JSON with the name and age of one random person."
+completion = client.chat.completions.create(
+    model=model,
+    messages=[{
+        "role": "user",
+        "content": prompt,
+    }],
+    extra_body={"guided_json": json_schema},
+)
+print("reasoning_content: ", completion.choices[0].message.reasoning_content)
+print("content: ", completion.choices[0].message.content)
+```
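
The table above also lists `guided_regex`. Continuing from the example just above (reusing `client` and `model`), a minimal variant might look like this; the pattern is only an illustration:

```python
completion = client.chat.completions.create(
    model=model,
    messages=[{
        "role": "user",
        "content": "In which year did the first moon landing happen? Answer with the year only.",
    }],
    extra_body={"guided_regex": r"(19|20)\d{2}"},  # constrain the final answer to a year
)
print("content: ", completion.choices[0].message.content)
```
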
+
## Limitations

- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
- It is not compatible with [`tool_calling`](#tool_calling).
-- The reasoning content is not available for all models. Check the model's documentation to see if it supports reasoning.

## How to support a new reasoning model

@@ -166,9 +208,10 @@ class DeepSeekReasoner(Reasoner):

    def is_reasoning_end(self, input_ids: list[int]) -> bool:
        return self.end_token_id in input_ids
+    ...
```

-The structured output engine like xgrammar will use `end_token_id` to check if the reasoning content is present in the model output and skip the structured output if it is the case.
+The structured output engine like `xgrammar` will use `end_token_id` to check whether the reasoning content is present in the model output, and skip structured output if that is the case.
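
As a hypothetical illustration of that check (stand-in class and token ids, not vLLM's actual integration code), an engine could gate grammar enforcement like this:

```python
from dataclasses import dataclass


@dataclass
class ToyReasoner:
    """Stand-in for the DeepSeekReasoner shown above."""
    end_token_id: int

    def is_reasoning_end(self, input_ids: list[int]) -> bool:
        return self.end_token_id in input_ids


reasoner = ToyReasoner(end_token_id=42)  # placeholder id for the `</think>` token

generated: list[int] = []
for token_id in [7, 13, 42, 99]:  # toy token stream
    generated.append(token_id)
    if reasoner.is_reasoning_end(generated):
        # From this point on, the engine would apply the grammar
        # (e.g. the JSON schema) to newly generated tokens.
        break
```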

Finally, you can enable reasoning for the model by using the `--enable-reasoning` and `--reasoning-parser` flags.
