[Bug]: Temperature is ignored in vLLM 0.8.0/0.8.1 #15241
Comments
Hey, I tried this on current main and I didn't see this behavior.
Okay, tried again with …
No issues with …
@SorenNumind One more thing: can you try whether this issue persists on 0.7.3 when you specify …?
I can't reproduce this issue in the offline interface on main, and I suspect there's something wrong either with the OpenAI client or our frontend, but at least this means the engine is functioning as expected. Testing code:
from vllm import LLM, SamplingParams
# Sample prompts.
prompts = [
"Hello, my name is",
"Hello, my name is",
"Hello, my name is",
"Hello, my name is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

With …
With …
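For reference, a minimal sketch of that offline comparison, assuming the two cases above were temperature=0 and a high temperature (the prompts and max_tokens=32 are arbitrary choices, not the exact values used in the thread):

```python
# Run the same prompts once greedily (temperature=0) and once with a high
# temperature, then print the generations side by side. If temperature is
# honored by the engine, the high-temperature outputs should look noticeably
# more random than the greedy ones.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"] * 4

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

greedy = llm.generate(prompts, SamplingParams(temperature=0, max_tokens=32))
sampled = llm.generate(prompts, SamplingParams(temperature=10, max_tokens=32))

for g, s in zip(greedy, sampled):
    print(f"temperature=0 : {g.outputs[0].text!r}")
    print(f"temperature=10: {s.outputs[0].text!r}")
```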
I will keep investigating this and update back here.
@SorenNumind I have found the root cause. #12622 changes the default sampling parameters to whatever I suggest when you launch the server, add |
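As a minimal client-side sketch in the same spirit (the base URL, API key, and parameter values below are assumptions, not taken from this thread): pass the sampling parameters explicitly on every request, so that whatever defaults the server now applies cannot silently override them. top_k is a vLLM extension to the OpenAI Chat API and is passed via extra_body; -1 disables top-k filtering.

```python
# Explicitly pass temperature, top_p, and top_k on the request so the
# server-side default sampling parameters do not apply. Assumes a local
# OpenAI-compatible vLLM server on port 8000 serving the model below.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[{"role": "user", "content": "Hello, my name is"}],
    temperature=10,   # should yield near-random text if the value is honored
    top_p=1.0,
    max_tokens=64,
    extra_body={"top_k": -1},  # vLLM extension: -1 means no top-k filtering
)
print(resp.choices[0].message.content)
```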
Closing, as this is not a bug per se but a default behavior change; we will update our docs accordingly to reflect this! Sorry for the confusion.
I can confirm that adding the option suggested above works.
@ywang96 I don't think this is correct. The linked PR: …
I don't see how that could affect the OpenAI entrypoint but not the …
(code permalink: Lines 264 to 270 in 47c7126)
Ok, I have the missing piece:
(code permalink: Lines 453 to 455 in 47c7126)
Your current environment
The output of `python collect_env.py`
Description
In vLLM 0.7 and before, using a high temperature (10) with a random input string always returns `max_tokens` tokens (a random output of the correct length).
With a temperature of 0, it returns something similar to "It seems like you've entered a string of characters that doesn't appear to be a meaningful word, phrase, or question."
Using the Docker image 0.8.0 or 0.8.1, no matter the temperature, it always answers something like "It seems like you've entered a string of characters that doesn't appear to be a meaningful word, phrase, or question."
Details
I tried with multiple models, and the temperature seems to be ignored for all of them.
🐛 Describe the bug
Reproduction
Starting a Docker container with:
docker run --gpus all \
  --entrypoint bash \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host \
  -p 8000:8000 \
  -it \
  vllm/vllm-openai:v0.7.3
and running
python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 32768 --tensor-parallel-size 2 --gpu-memory-utilization 0.95
on the server side, and …
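The client side of this reproduction could look like the hypothetical sketch below (the prompt string, request shape, and max_tokens are illustrative assumptions): send the same random-looking input with temperature 0 and with temperature 10 and compare the answers. On 0.7.3 the two answers should differ clearly; on 0.8.0/0.8.1 the report is that they come back essentially the same.

```python
# Hypothetical client: compare answers for the same random-looking prompt at
# temperature 0 and temperature 10 against the server started above.
import requests

URL = "http://localhost:8000/v1/chat/completions"
PROMPT = "asdkjh qwelkj zxcmnb"  # arbitrary random-looking string (assumption)

for temperature in (0, 10):
    payload = {
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": temperature,
        "max_tokens": 64,
    }
    answer = requests.post(URL, json=payload).json()["choices"][0]["message"]["content"]
    print(f"temperature={temperature}: {answer!r}")
```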