[Bug]: Using DeepSeek-R1-Distill-Qwen-32B, the result doesn't have the opening <think> tag and reasoning_content cannot be parsed. #13125
Comments
Same problem for me.
Modify the last part of "chat_template" in the model file tokenizer_config.json from {{'<|Assistant|>\n'}}{% endif %} to {{'<|Assistant|>'}}{% endif %}.
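If it helps, here is a minimal sketch of applying that edit with a script; the model path is hypothetical and the exact template tail follows the comment above, so it may differ in your file:

```python
import json

# Hypothetical local path to the downloaded model; adjust to your setup.
CONFIG_PATH = "/data/models/DeepSeek-R1-Distill-Qwen-32B/tokenizer_config.json"

with open(CONFIG_PATH, encoding="utf-8") as f:
    config = json.load(f)

template = config["chat_template"]
# Drop the newline that follows '<|Assistant|>' (it may be stored either as an
# actual newline character or as the two characters backslash-n).
for tail in ("<|Assistant|>\n", "<|Assistant|>\\n"):
    template = template.replace(tail, "<|Assistant|>")
config["chat_template"] = template

with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)
```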
In the downloaded DeepSeek model's tokenizer_config.json, modify the chat template by removing the trailing <think> tag. But note that the model's output will then often skip thinking, so you need to prompt the model to think some other way.
Thanks, I used the tokenizer_config.json of the 14B model and the problem was solved.
Thanks, after switching to the 14B model's tokenizer_config.json, both think tags appear for me as well.
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + ' |
We are facing exactly the same issue, but with the https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B model. a) Removing the \n from the chat template did not work.
b) Using the chat template from the distilled Qwen 14B model also did not work. Does anyone have an idea how to get it to work?
Here is my chat template:
After setting the chat template to this, check whether it takes effect by looking at the request sent in the vLLM log:
We would recommend checking the chat template to see whether the <think> token is prefilled.
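One way to check this is to render the chat template yourself and look at the end of the prompt; a minimal sketch, assuming the model files are available locally and `transformers` is installed:

```python
from transformers import AutoTokenizer

# Hypothetical local model directory; adjust to your setup.
tokenizer = AutoTokenizer.from_pretrained("/data/models/DeepSeek-R1-Distill-Qwen-32B/")

# Render the prompt exactly as the server would before generation.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "我想买猫粮,预算2000"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# If the prompt already ends with "<think>", the model never emits the opening
# tag itself, so the reasoning parser cannot find it in the model output.
print("prefilled <think>:", prompt.rstrip().endswith("<think>"))
```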
Yes, all of DeepSeek's chat templates for the DeepSeek R1 models on Hugging Face force the addition of the <think> tag.
Would removing the \n help here so that the model outputs the <think> token?
No, because the tag is prefilled by the template. That is, you must delete it from the template.
We did exactly that and changed the template to delete the \n, but it was not working either.
I typed the wrong symbol; delete the <think> tag.
We did exactly this and deleted the <think> token in the end. We ran the request with:
Content starts with: "content": "Alright, the user just called vLLM "wonderful." I should respond in a way that's appreciative and
Please send me your chat template and vLLM log.
A workaround solution:
Thanks for your reply, but this would not solve the parser issue in vLLM.
Please send your chat template, then make one request to the vLLM server and record the vLLM log, so I can see where things are going wrong.
Chat Template: chat_tmpl_wo_think.jinja
VLLM Parameters:
Query:
VLLM Response:
VLLM Logs:
Try to use this chat template:
Thank you for the Chat Template.
Through your log, I observed that the <think> tag appears in the prompt rather than in the model's output.
I added the recommended prompt from the DeepSeek model repo. CURL:
Result:
You have encountered a rather unusual issue. To help you resolve it, I have identified two core reasons.

First, the DeepSeek R1 model has inherent trouble reliably generating the <think> opening tag on its own, which is why the DeepSeek team manually inserts the <think> opening tag into the chat template. Second, as a consequence, the <think> opening tag appears in the wrong place: it ends up in the user-side prompt rather than in the model output. The parser that processes API results, the part of vLLM that recognizes <think></think> tags as "thought text", only reads the model's output, so it never detects the <think> opening tag.

The best solution is to modify the chat template so that the prompt does not automatically insert the <think> opening tag. The model can then emit the <think> tag itself during generation, and everything should work as expected. However, I am not entirely sure about the root cause, whether it stems from an adjustment in the Llama-70B variant or some other factor; in your tests the <think> tag never seemed to appear spontaneously in the model's output. I have done my best to analyze this issue and hope this final analysis is helpful to you. Best of luck!
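To make the failure mode concrete, here is a small illustration (a sketch; the sample output string is invented, and the tag check mirrors the guard in vLLM's DeepSeek R1 reasoning parser quoted further below):

```python
# When the chat template prefills "<think>", the model's own output usually
# holds only the reasoning text, the closing tag, and then the answer:
model_output = "用户想买猫粮,预算是2000……</think>根据您的预算,可以考虑这些猫粮……"

# The parser's guard requires BOTH tags to be present in the model output.
has_both_tags = "<think>" in model_output and "</think>" in model_output
print(has_both_tags)
# False: the opening tag sits in the prompt, not in the output, so the parser
# returns reasoning_content=None and the full text as plain content.
```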
Thank you very much for your response. Very kind of you.
Yes, exactly, this is the issue.
@ghyaauoo @ch9hn @Baiyizhe @cwingho @Sariel2
You can try this approach here and see if you have success with it:
@ch9hn It seems weird. Maybe modifying the reasoning-content parsing of vLLM 0.7.2 could be helpful to you. This is the implementation in vLLM 0.7.2 (vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py, lines 123 to 147 at commit b3942e1):

```python
# vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py
def extract_reasoning_content(
        self, model_output: str, request: ChatCompletionRequest
) -> Tuple[Optional[str], Optional[str]]:
    # Check if the model output contains the <think> tokens.
    if (self.think_start_token not in model_output
            or self.think_end_token not in model_output):
        return None, model_output
    ...
```

If you want to make it support streaming as well, you should fix the streaming parser too. You can refer to #13025.
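For reference, a hedged sketch of one possible modification (this is not the actual #13025 fix): treat an output that contains only the closing tag as reasoning followed by content:

```python
from typing import Optional, Tuple


def extract_reasoning_content_patched(
        model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Sketch: tolerate outputs where '<think>' was prefilled in the prompt,
    so only '</think>' shows up in the generated text."""
    think_start, think_end = "<think>", "</think>"
    if think_end not in model_output:
        # No closing tag at all: nothing we can safely call reasoning.
        return None, model_output
    # Strip an opening tag if the model did emit one, then split on the
    # closing tag; everything before it is treated as reasoning.
    stripped = model_output.replace(think_start, "", 1)
    reasoning, _, content = stripped.partition(think_end)
    return reasoning, content or None
```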
I tried this chat template and now I can see the reasoning content; however, here is an example:
chat-template:
In my experiment with DeepSeek-R1-Distill-Qwen-32B, when the <think> tag is removed from the chat template, the model will skip reasoning for problems that seem "easy". You may try a more complex question, such as a multi-hop question: "What is the municipality directly under the central government adjacent to the capital of China?"; the reasoning_content should contain something then (hope this helps you).
Hi, I just want to confirm: has the issue been resolved in the latest code with #13025?
Please try 0.7.3 and the latest model from Hugging Face. It should be resolved.
Thank you for your reply. I do remember updating my vLLM to 0.7.3, but I forget whether I tested it; maybe it's okay now.
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Using DeepSeek-R1-Distill-Qwen-32B, the result doesn't have the opening <think> tag and reasoning_content cannot be parsed.
```bash
CUDA_VISIBLE_DEVICES=0 nohup \
python -m vllm.entrypoints.openai.api_server \
    --model /data/models/DeepSeek-R1-Distill-Qwen-32B/ \
    --trust-remote-code \
    --served-model-name deepseek-32b \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.80 \
    --max-model-len 3000 \
    --dtype bfloat16 \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --enforce-eager \
    --port 10009 >log_vllm_deepseek32b.log 2>&1 &
```
Then, curl with this request body:

```json
{"model": "deepseek-32b", "stream": false, "top_k": -1, "top_p": 0.95, "temperature": 0.6, "repetition_penalty": 1.0, "messages": [{"role": "user", "content": "我想买猫粮,预算2000"}]}
```
Result:
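For reference, here is a minimal client-side check of the parsed reasoning_content (a sketch, assuming the server launched above is listening on port 10009 and the `requests` package is installed):

```python
import requests

payload = {
    "model": "deepseek-32b",
    "stream": False,
    "top_k": -1,
    "top_p": 0.95,
    "temperature": 0.6,
    "repetition_penalty": 1.0,
    "messages": [{"role": "user", "content": "我想买猫粮,预算2000"}],
}

resp = requests.post(
    "http://localhost:10009/v1/chat/completions", json=payload, timeout=300)
message = resp.json()["choices"][0]["message"]

# With --enable-reasoning and --reasoning-parser deepseek_r1, vLLM is expected
# to split the thinking text into reasoning_content; if the chat template
# prefills <think>, this field comes back as None.
print("reasoning_content:", message.get("reasoning_content"))
print("content:", message.get("content"))
```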