[Bug]: Using DeepSeek-R1-Distill-Qwen-32B, the result doesn't have the opening <think> tag and reasoning_content cannot be parsed. #13125
Comments
Same problem for me.
Modify the last part of "chat_template" in the model file tokenizer_config.json from {{'<|Assistant|>\n'}}{% endif %} to {{'<|Assistant|>'}}{% endif %}.
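If it helps, here is a minimal sketch of applying that edit with a script; the model path is hypothetical and the exact template tail follows the comment above, so it may differ in your file:

```python
import json

# Hypothetical local path to the downloaded model; adjust to your setup.
CONFIG_PATH = "/data/models/DeepSeek-R1-Distill-Qwen-32B/tokenizer_config.json"

with open(CONFIG_PATH, encoding="utf-8") as f:
    config = json.load(f)

template = config["chat_template"]
# Drop the newline that follows '<|Assistant|>' (it may be stored either as an
# actual newline character or as the two characters backslash-n).
for tail in ("<|Assistant|>\n", "<|Assistant|>\\n"):
    template = template.replace(tail, "<|Assistant|>")
config["chat_template"] = template

with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)
```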
In the downloaded DeepSeek model's tokenizer_config.json, modify the chat template by removing the trailing <think> tag. But note that the model's output will then often skip thinking, so you need to prompt the model to think some other way.
Thanks, I used the tokenizer_config.json of the 14B model and the problem was solved.
Thanks, after switching to the 14B model's tokenizer_config.json, both think tags appear for me as well.
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + ' |
We are facing exactly the same issue, but with the https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B model. a) Removing the \n from the chat template did not work.
b) Using the chat template from the distilled Qwen 14B model also did not work. Does anyone have an idea how to get it to work?
Here is my chat template:
After setting the chat template to this, check whether it takes effect by looking at the request sent in the vLLM log:
We would recommend checking the chat template to see whether the <think> token is prefilled.
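One way to check this is to render the chat template yourself and look at the end of the prompt; a minimal sketch, assuming the model files are available locally and `transformers` is installed:

```python
from transformers import AutoTokenizer

# Hypothetical local model directory; adjust to your setup.
tokenizer = AutoTokenizer.from_pretrained("/data/models/DeepSeek-R1-Distill-Qwen-32B/")

# Render the prompt exactly as the server would before generation.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "我想买猫粮,预算2000"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# If the prompt already ends with "<think>", the model never emits the opening
# tag itself, so the reasoning parser cannot find it in the model output.
print("prefilled <think>:", prompt.rstrip().endswith("<think>"))
```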
Yes, all of DeepSeek's chat templates for the DeepSeek R1 models on Hugging Face force the addition of the <think> tag.
Would removing the \n help here so that the model outputs the <think> token?
No, because the tag is prefilled by the template. That is, you must delete it from the template.
We did exactly that and changed the template to delete the \n, but it was not working either.
I typed the wrong symbol; delete the <think> tag.
We did exactly this and deleted the <think> token in the end. We ran the request with:
Content starts with: "content": "Alright, the user just called vLLM "wonderful." I should respond in a way that's appreciative and
Please send me your chat template and vLLM log.
A workaround solution:
Thanks for your reply, but this would not solve the parser issue in vLLM.
Please send your chat template, then make one request to the vLLM server and record the vLLM log, so I can see where things are going wrong.
Chat Template: chat_tmpl_wo_think.jinja
VLLM Parameters:
Query:
VLLM Response:
VLLM Logs:
Try to use this chat template:
Thank you for the Chat Template.
Through your log, I observed that the <think> tag appears in the prompt rather than in the model's output.
I added the recommended prompt from the DeepSeek model repo. CURL:
Result:
You have encountered a rather unusual issue. To help you resolve it, I have identified two core reasons.

First, the DeepSeek R1 model has inherent trouble reliably generating the <think> opening tag on its own, which is why the DeepSeek team manually inserts the <think> opening tag into the chat template. Second, as a consequence, the <think> opening tag appears in the wrong place: it ends up in the user-side prompt rather than in the model output. The parser that processes API results, the part of vLLM that recognizes <think></think> tags as "thought text", only reads the model's output, so it never detects the <think> opening tag.

The best solution is to modify the chat template so that the prompt does not automatically insert the <think> opening tag. The model can then emit the <think> tag itself during generation, and everything should work as expected. However, I am not entirely sure about the root cause, whether it stems from an adjustment in the Llama-70B variant or some other factor; in your tests the <think> tag never seemed to appear spontaneously in the model's output. I have done my best to analyze this issue and hope this final analysis is helpful to you. Best of luck!
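To make the failure mode concrete, here is a small illustration (a sketch; the sample output string is invented, and the tag check mirrors the guard in vLLM's DeepSeek R1 reasoning parser quoted further below):

```python
# When the chat template prefills "<think>", the model's own output usually
# holds only the reasoning text, the closing tag, and then the answer:
model_output = "用户想买猫粮,预算是2000……</think>根据您的预算,可以考虑这些猫粮……"

# The parser's guard requires BOTH tags to be present in the model output.
has_both_tags = "<think>" in model_output and "</think>" in model_output
print(has_both_tags)
# False: the opening tag sits in the prompt, not in the output, so the parser
# returns reasoning_content=None and the full text as plain content.
```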
Thank you very much for your response. Very kind of you.
Yes, exactly, this is the issue.
@ghyaauoo @ch9hn @Baiyizhe @cwingho @Sariel2
You can try this approach here and see if you have success with it:
@ch9hn It seems weird. Maybe modifying the reasoning-content parsing of vLLM 0.7.2 could be helpful to you. This is the implementation in vLLM 0.7.2 (vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py, lines 123 to 147 at commit b3942e1):

```python
# vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py
def extract_reasoning_content(
        self, model_output: str, request: ChatCompletionRequest
) -> Tuple[Optional[str], Optional[str]]:
    # Check if the model output contains the <think> tokens.
    if (self.think_start_token not in model_output
            or self.think_end_token not in model_output):
        return None, model_output
    ...
```

If you want to make it support streaming as well, you should fix the streaming parser too. You can refer to #13025.
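For reference, a hedged sketch of one possible modification (this is not the actual #13025 fix): treat an output that contains only the closing tag as reasoning followed by content:

```python
from typing import Optional, Tuple


def extract_reasoning_content_patched(
        model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Sketch: tolerate outputs where '<think>' was prefilled in the prompt,
    so only '</think>' shows up in the generated text."""
    think_start, think_end = "<think>", "</think>"
    if think_end not in model_output:
        # No closing tag at all: nothing we can safely call reasoning.
        return None, model_output
    # Strip an opening tag if the model did emit one, then split on the
    # closing tag; everything before it is treated as reasoning.
    stripped = model_output.replace(think_start, "", 1)
    reasoning, _, content = stripped.partition(think_end)
    return reasoning, content or None
```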
I tried this chat template and now I can see the reasoning content; however, here is an example:
chat-template:
In my experiment with DeepSeek-R1-Distill-Qwen-32B, when the <think> tag is removed from the chat template, the model will skip reasoning for problems that seem "easy". You may try a more complex question, such as a multi-hop question: "What is the municipality directly under the central government adjacent to the capital of China?"; the reasoning_content should contain something then (hope this helps you).
Hi, I just want to confirm: has the issue been resolved in the latest code with #13025?
Please try 0.7.3 and the latest model from Hugging Face. It should be resolved.
Thank you for your reply. I do remember updating my vLLM to 0.7.3, but I forget whether I tested it; maybe it's okay now.
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Using DeepSeek-R1-Distill-Qwen-32B, the result doesn't have the opening <think> tag and reasoning_content cannot be parsed.
```bash
CUDA_VISIBLE_DEVICES=0 nohup \
python -m vllm.entrypoints.openai.api_server \
    --model /data/models/DeepSeek-R1-Distill-Qwen-32B/ \
    --trust-remote-code \
    --served-model-name deepseek-32b \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.80 \
    --max-model-len 3000 \
    --dtype bfloat16 \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --enforce-eager \
    --port 10009 >log_vllm_deepseek32b.log 2>&1 &
```
Then, curl with this request body:

```json
{"model": "deepseek-32b", "stream": false, "top_k": -1, "top_p": 0.95, "temperature": 0.6, "repetition_penalty": 1.0, "messages": [{"role": "user", "content": "我想买猫粮,预算2000"}]}
```
Result:
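For reference, here is a minimal client-side check of the parsed reasoning_content (a sketch, assuming the server launched above is listening on port 10009 and the `requests` package is installed):

```python
import requests

payload = {
    "model": "deepseek-32b",
    "stream": False,
    "top_k": -1,
    "top_p": 0.95,
    "temperature": 0.6,
    "repetition_penalty": 1.0,
    "messages": [{"role": "user", "content": "我想买猫粮,预算2000"}],
}

resp = requests.post(
    "http://localhost:10009/v1/chat/completions", json=payload, timeout=300)
message = resp.json()["choices"][0]["message"]

# With --enable-reasoning and --reasoning-parser deepseek_r1, vLLM is expected
# to split the thinking text into reasoning_content; if the chat template
# prefills <think>, this field comes back as None.
print("reasoning_content:", message.get("reasoning_content"))
print("content:", message.get("content"))
```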