docs/source/features/reasoning_outputs.md
Reasoning models return an additional `reasoning_content` field in their outputs, which contains the reasoning steps that led to the final conclusion.
vLLM currently supports the following reasoning models:
| Model Series | Parser Name | Structured Output Support |
|--------------|-------------|---------------------------|
| [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `deepseek_r1` | ✅ |

The `deepseek_r1` parser extracts reasoning enclosed in `<think> ... </think>` tags.
Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is included in the streamed response chunks.

Please note that streaming reasoning content is not compatible with the OpenAI Python client library; you can use the `requests` library to make streaming requests instead. See the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
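To make the `requests` approach concrete, here is a minimal sketch of parsing the server-sent-event lines such a streaming request returns. `parse_sse_line` is a hypothetical helper written for illustration; the `reasoning_content` and `content` delta fields follow the response format described above.

```python
import json


def parse_sse_line(line: bytes):
    """Parse one server-sent-event line from a streaming chat completion.

    Returns a (reasoning_content, content) pair of deltas; either may be
    None, e.g. for keep-alive lines or the terminal "[DONE]" marker.
    """
    prefix = b"data: "
    if not line.startswith(prefix):
        return None, None
    payload = line[len(prefix):]
    if payload.strip() == b"[DONE]":
        return None, None
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("reasoning_content"), delta.get("content")
```

In practice you would feed this each line from `response.iter_lines()` on a `requests.post(..., stream=True)` call against the server.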
## Structured output
The reasoning content is also available with structured output. Structured output engines such as `xgrammar` will use the reasoning content to generate structured output.
```python
from openai import OpenAI
from pydantic import BaseModel

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


class People(BaseModel):
    name: str
    age: int


json_schema = People.model_json_schema()

prompt = "Generate a JSON with the name and age of one random person."

# Request a completion constrained to the schema; vLLM accepts the
# schema through the `guided_json` field of `extra_body`.
completion = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    extra_body={"guided_json": json_schema},
)
print("reasoning_content:", completion.choices[0].message.reasoning_content)
print("content:", completion.choices[0].message.content)
```
Structured output engines such as `xgrammar` use `end_token_id` to check whether reasoning content is still present in the model output, and skip structured output enforcement while it is.
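As a rough illustration of that check (not vLLM's actual implementation, which operates on token IDs rather than strings), a parser can withhold the final content until the reasoning end marker has been seen. `split_reasoning` is a hypothetical helper:

```python
def split_reasoning(text: str, start: str = "<think>", end: str = "</think>"):
    """Split model output into (reasoning, content).

    Returns (None, text) when no complete reasoning span is present,
    mirroring the "skip structured output until the end token" behavior.
    """
    if start in text and end in text:
        head, rest = text.split(start, 1)
        reasoning, tail = rest.split(end, 1)
        return reasoning.strip(), (head + tail).strip()
    return None, text
```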
Finally, you can enable reasoning for the model by using the `--enable-reasoning` and `--reasoning-parser` flags.
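For example, a serve invocation might look like the following (the model name here is illustrative; substitute any supported reasoning model):

```shell
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning --reasoning-parser deepseek_r1
```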