[Bug]: DeepSeek R1 with outlines structured engine stops generation after </think>
#14113
Comments
You may also be interested in this issue. /cc @jacobthebanana @liuyanyi Maybe outlines added an EOS token for it; I haven't dived deep into it.
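One quick way to probe that hypothesis is to compare the token ID the tokenizer assigns to `</think>` with its EOS token. A minimal diagnostic sketch (the model ID below is only an example; substitute whichever R1 checkpoint you actually serve):

```python
# Hedged diagnostic: is `</think>` its own token, and how does it relate to EOS?
# The model ID is an assumption; replace it with the checkpoint you serve.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
print("</think> id:", tok.convert_tokens_to_ids("</think>"))
print("EOS token:", tok.eos_token, "id:", tok.eos_token_id)
```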
I have tested the same example on my device and got a similar result. The logs are shown below:
reasoning_content: Okay, so I just got this query where演奏 wantsiliate a JSON with the brand It, model, and car_type of the most iconic car fromconomy, 90s. They even asked me to limit it to 100 tokens. Hmm, okay, let me break it down.
First, I need to figure out which iconic car was iconic in the '90s. My mind jumps to time-traveling when I'm imagining cars from the future, but I need to think back to the 90s. The亲人iv is definitely iconic, especially for its sleek design and VR simulation features. Wow, that's a mouthful. But maybe I don't need to explain it extensively. Just the key details about the brand, model, and car_type should suffice.
Wait, the query says "cryptocurrency? Or perhaps it's a play on words for a crypto car? But the user mentioned "iconic car from the 90s," so I think it's safe to go with国产 vehicles. So, Exchange Collaboration Chains are from 1995, so nope, not crypto anymore. So the original meaning was correct.
Breaking it down: Elegant &这项wise brings phones & versatile cars aroundModifiedDate 1000. So, model could be Elegant Exchange Collaboration Chains. Brand would be Elegant, car_type would be Car.
Putting it together: {"brand": "Elegant", "model": "Exchange Collaboration Chains", "car_type": "Car"}. Let me count the tokens. Brand:5, model:6, car_type:4. Wait, that's 15 tokens. Hmm, maybe I should try again.
Wait a minute, maybe the model name can be expanded. Exchange Collaboration Chains sounds like a brand since Contribution Chains were created隐蔽 Way. But that's still part of et pricing. Alternatively, the model could be DesulGeneral Exchange Collaboration Chains. That might help. Or perhaps Exchange Collaboration Chains is a specific model.
Wait, but the user wants to limit to 100 tokens. Let me make sure each field is exactly one to two words or phrases. So, "brand" is 2 words, "model" is 5, "car_type" is 2. Total of 9 words, which is way below 100. But maybe the user has a more literal definition in mind.
Alternatively, thinking creatively, maybe "most iconic car" refers to something else. But, I don't think there's another car from the 90s as iconic as Exchange Collaboration Chains. Payment chains from Google were certainly in the 90s, but Exchange Collaboration Chains sounds like it's more Russian, perhaps.
Another thought: maybe Exchange Chains were actually phones from the 90s, but that seems less likely. So, I stick with Elegant Exchange Collaboration Chains. Got it.
content:  None
Yes, I’m able to reproduce the issue. In this example the outlines engine is being used, and token generation stops at `</think>`. I think it is caused by our implementation: I could still get the result successfully several weeks ago with my first implementation in PR #12955. I have a fix that disables reasoning outputs for outlines requests in #14114, and will try to figure out why outlines is not supported.
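For readers following along: the deepseek_r1 reasoning parser splits the generated text around `</think>`, so if the structured-output backend stops generation exactly at that tag, everything lands in `reasoning_content` and `content` comes back empty. A much-simplified illustration (not vLLM's actual implementation):

```python
# Illustrative only: a naive split on the closing tag, not vLLM's parser.
def split_reasoning(text: str, end_tag: str = "</think>"):
    if end_tag not in text:
        return text, None          # still inside the reasoning block
    reasoning, _, answer = text.partition(end_tag)
    return reasoning, (answer.strip() or None)

# If generation stops right at `</think>`, the answer part is empty:
print(split_reasoning("the user wants a JSON ...</think>"))
# -> ('the user wants a JSON ...', None)
```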
It is not from your side or the NPU; I can reproduce it with a GPU.
@shen-shanshan, you could check out this PR and test the provided example to verify whether it works on the NPU. #14114 |
I have tested this PR on my device and it works well with the xgrammar backend.
OK, thanks for testing. I will work on the fix for outlines.
@shen-shanshan I fixed the outlines issue. You could test it on your NPU if you like.
Hi @gaocegege, here is what I ran:

# start by `vllm serve ${qwq_path} --enable-reasoning --reasoning-parser deepseek_r1`
from pydantic import BaseModel

class CarDescription(BaseModel):
    brand: str
    model: str

completion = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": prompt},
    ],
    extra_body={
        "guided_json": CarDescription.model_json_schema(),
    },
)
print("reasoning_content: ", completion.choices[0].message.reasoning_content)
print("content: ", completion.choices[0].message.content)

# output:
# reasoning_content: json response
# content: None
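For completeness, the snippets in this thread assume an OpenAI-compatible client pointed at the running vLLM server, roughly as follows (the base URL, API key, and prompt are assumptions that depend on how `vllm serve` was launched):

```python
# Assumed client setup for the snippets in this thread; adjust base_url/api_key
# to match your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model_name = client.models.list().data[0].id
prompt = "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's"
```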
What's the output? I tried this, and it works.

# Guided decoding by JSON using Pydantic schema
from enum import Enum

from pydantic import BaseModel

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType

json_schema = CarDescription.model_json_schema()

prompt = ("Generate a JSON with the brand, model and car_type of "
          "the most iconic car from the 90's")
completion = client.chat.completions.create(
    model=model,
    messages=[{
        "role": "user",
        "content": prompt,
    }],
    extra_body={"guided_json": json_schema},
)
print("reasoning_content: ", completion.choices[0].message.reasoning_content)
print("content: ", completion.choices[0].message.content)
When I run your code, I encounter the following error:
And when I exclude car_type, the result I receive is:
Thank you. Setting
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Run with this example. It falls back to the outlines engine because the JSON schema contains an Enum (our xgrammar integration does not support this), and it generates token IDs until `</think>`, then stops. I did not encounter this with xgrammar.
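To see why this schema takes the outlines path, it helps to print the generated JSON schema: the `CarType` enum shows up under `$defs` with an `enum` keyword, which is the construct xgrammar did not support at the time. A small sketch, assuming pydantic v2 and the `CarDescription`/`CarType` classes from the example above:

```python
# Inspect the schema that triggers the fallback; the "enum" entries under
# "$defs"/"CarType" are what the xgrammar backend did not support.
import json

schema = CarDescription.model_json_schema()
print(json.dumps(schema["$defs"]["CarType"], indent=2))
# Output includes: "enum": ["sedan", "SUV", "Truck", "Coupe"]
```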