
[Bug]: Temperature is ignored in vLLM 0.8.0/0.8.1 #15241


Closed
SorenNumind opened this issue Mar 20, 2025 · 11 comments
Labels
bug Something isn't working

Comments

@SorenNumind

Your current environment


Description

In vLLM 0.7 and earlier, using a high temperature (e.g. 10) with a random input string always returns "max_tokens" tokens (random output of the expected length).
With a temperature of 0, it returns something similar to "It seems like you've entered a string of characters that doesn't appear to be a meaningful word, phrase, or question."

With the 0.8.0 or 0.8.1 Docker image, no matter the temperature, it always answers something like "It seems like you've entered a string of characters that doesn't appear to be a meaningful word, phrase, or question."

Details

I tried with multiple models, and the temperature seems to be ignored for all of them.

🐛 Describe the bug

Reproduction

Starting a Docker container with:
docker run --gpus all \
    --entrypoint bash \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --ipc=host \
    -p 8000:8000 \
    -it \
    vllm/vllm-openai:v0.7.3
and running
python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 32768 --tensor-parallel-size 2 --gpu-memory-utilization 0.95
on the server side, and then running the following client code:

import random
import string

from openai import OpenAI

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)

client.chat.completions.create(
    model=model_name,
    max_tokens=1000,
    temperature=10,
    messages=[
        {"role": "system", "content": "You are Qwen."},
        {
            "role": "user",
            "content": "".join(random.choices(string.ascii_letters + string.digits, k=10)),
        },
    ],
)

Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
@SorenNumind SorenNumind added the bug Something isn't working label Mar 20, 2025
@robertgshaw2-redhat
Collaborator

Hey, I tried this on current main and I didn't see this behavior

@robertgshaw2-redhat
Collaborator

Okay tried again with Qwen-VL and now I see the same result. This issue did not exist with Llama

@robertgshaw2-redhat
Collaborator

No issues with Qwen-2.5-7B-Instruct either

@ywang96
Member

ywang96 commented Mar 20, 2025

@SorenNumind Hey, can you try #15200 and see if it fixes your issue? Thanks! (nvm, it doesn't seem related)

@ywang96
Member

ywang96 commented Mar 20, 2025

@SorenNumind One more thing, can you check whether this issue persists on 0.7.3 when you specify VLLM_USE_V1=1? That will help us debug what's going on here - thanks!
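
For example (a sketch reusing the launch command from the report), the environment variable can be set inline when starting the server:

VLLM_USE_V1=1 python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 32768 --tensor-parallel-size 2 --gpu-memory-utilization 0.95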

@ywang96
Member

ywang96 commented Mar 20, 2025

I can't reproduce this issue with the offline interface on main, and I suspect there's something wrong either with the OpenAI client or with our frontend, but at least this means the engine is functioning as expected.

Testing code:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "Hello, my name is",
    "Hello, my name is",
    "Hello, my name is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0)

# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

With temperature=0, I get

Prompt: 'Hello, my name is', Generated text: ' John. I am a 15-year-old boy. I am a student'
Prompt: 'Hello, my name is', Generated text: ' John. I am a 15-year-old boy. I am a student'
Prompt: 'Hello, my name is', Generated text: ' John. I am a 15-year-old boy. I am a student'
Prompt: 'Hello, my name is', Generated text: ' John. I am a 15-year-old boy. I am a student'

With temperature=10, I get

Prompt: 'Hello, my name is', Generated text: 'مياه Designerdistribution民俗 등의集团コスト生命力闪闪xmlns dropping相机-support authorמקום'
Prompt: 'Hello, my name is', Generated text: ' Кон饴עמקleri,y-html-checkbox favoruyền Ramirez camslobs można pronouncedLesเรื่'
Prompt: 'Hello, my name is', Generated text: ' Vincent动手lon扭矩iga.dynamic.InnerException wrongly respecto\tpoints sido burglгер/storeURITY墙'
Prompt: 'Hello, my name is', Generated text: '(firstName الصحيةorderidstrasunpack Algebraに基办好充满 Rox Alypowiedzie/constants发现问题⯑ские'

I will keep investigating this and post updates here.

@ywang96
Member

ywang96 commented Mar 21, 2025

@SorenNumind I have found the root cause. #12622 changes the default sampling parameters to whatever the model repo's generation_config.json specifies, but only for online serving. For Qwen2.5-VL in particular, the defaults top_p=0.001 and top_k=1 essentially disable sampling no matter how high the temperature is.
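
To see why (a toy sketch with made-up logits, not vLLM's actual sampler): dividing the logits by the temperature reshapes the distribution, but top_k=1 keeps only the single most likely token, so decoding is effectively greedy at any temperature.

import numpy as np

# Made-up logits for a 5-token vocabulary (illustration only).
logits = np.array([2.0, 1.0, 0.5, 0.2, 0.1])

for temperature in (0.5, 1.0, 10.0):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # With top_k=1, everything except the most likely token is masked out
    # before sampling, so the "sampled" token is always the argmax.
    print(f"temperature={temperature}: token index {int(np.argmax(probs))}")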

I suggest adding --generation-config vllm when you launch the server, so that vLLM's own sampling defaults are used instead of the values from the model repo's generation_config.json.
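
For example (reusing the report's arguments), the launch command would look something like:

python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 32768 --tensor-parallel-size 2 --gpu-memory-utilization 0.95 --generation-config vllm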

@ywang96
Member

ywang96 commented Mar 21, 2025

Closing as this is not a bug per se but a change of default behavior - we will update our docs accordingly to reflect this! Sorry for the confusion.

@ywang96 ywang96 closed this as completed Mar 21, 2025
@SorenNumind
Author

I can confirm that adding --generation-config vllm indeed fixes the problem.
Thank you very much for your quick response and the help you provided.

@hmellor
Member

hmellor commented Mar 21, 2025

"only for online serving"

@ywang96 I don't think this is correct. The linked PR:

  • Changes the default behaviour for the ModelConfig and EngineArgs classes
  • Changes nothing functional in the OpenAI entrypoint (it only modifies an info log)

I don't see how that could affect the OpenAI entrypoint but not the LLM entrypoint, especially since the LLM entrypoint does read the default sampling params:

def get_default_sampling_params(self) -> SamplingParams:
    if self.default_sampling_params is None:
        self.default_sampling_params = (
            self.llm_engine.model_config.get_diff_sampling_param())
    if self.default_sampling_params:
        return SamplingParams.from_optional(**self.default_sampling_params)
    return SamplingParams()

@hmellor hmellor reopened this Mar 21, 2025
@hmellor hmellor closed this as completed Mar 21, 2025
@hmellor
Member

hmellor commented Mar 21, 2025

Ok, I have the missing piece:

if sampling_params is None:
    # Use default sampling params.
    sampling_params = self.get_default_sampling_params()

LLM does use the model defaults if you don't pass sampling params. However, if you pass sampling params it overwrites all of them, not just the one you specified.
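
A minimal sketch of that difference, assuming the same Qwen2.5-VL model as in the earlier test:

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

# No sampling params passed: LLM falls back to get_default_sampling_params(),
# i.e. the defaults from the model repo's generation_config.json
# (top_p=0.001, top_k=1 for Qwen2.5-VL).
outputs_repo_defaults = llm.generate(["Hello, my name is"])

# Sampling params passed explicitly: every field now comes from this object,
# so top_p/top_k fall back to SamplingParams' own defaults and the repo's
# values no longer apply - which is why temperature takes effect here.
outputs_overridden = llm.generate(["Hello, my name is"],
                                  SamplingParams(temperature=10))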
