[Bug]: Profiling on vLLM server hangs when --num-scheduler-steps > 1 #12032
Comments
Hello, I would like to know whether this is a known issue. It's okay if it cannot be fixed soon; I just want to know if I'm running with the wrong settings. Thanks.
I had the same problem with the Ray backend and multi-step > 1. I opened the Ray dashboard to observe and found that the in-flight Ray execute_model work is stuck in the TP broadcast_tensordata, so posting start_profile or stop_profile hangs. The detailed log is similar to this comment. In my opinion, before we post start_profile and stop_profile, we should stop the stuck Ray execute_model work using the same method that llm_engine already uses (details can be found in llm_engine.py#L1508), but I think that is not done in the start_profile or stop_profile path. A possible solution might be like this:
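(The original snippet is not included above; below is a minimal sketch of the idea, assuming the v0 `LLMEngine` forwards `start_profile`/`stop_profile` to the executor and that `stop_remote_worker_execution_loop()` is the executor method referenced above. Exact names may differ across versions.)

```python
# Sketch only, for vllm/engine/llm_engine.py on the v0 path; method names
# are assumed from the discussion above and may differ between versions.

def start_profile(self) -> None:
    # Pause the stuck multi-step / Ray worker execution loop first,
    # mirroring what LLMEngine.step() already does when there are no
    # unfinished requests, so the profiling RPC is not queued behind a
    # worker blocked in a TP broadcast.
    self.model_executor.stop_remote_worker_execution_loop()
    self.model_executor.start_profile()

def stop_profile(self) -> None:
    self.model_executor.stop_remote_worker_execution_loop()
    self.model_executor.stop_profile()
```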
I tried modifying the implementation in the llm_engine file and found that it works.
@njhill knows more about this; the control plane in v0 is quite complicated. BTW, we are moving towards the v1 code, and this part of the code may receive less maintenance. Can you try the latest code?
@youkaichao Do we have a solution for v0? A lot of our work is adapted to v0, and our version is v0.6.3.post1. @youkaichao @njhill Can you help me figure it out?
@ren2504413601 I think the solution you found is the right one. It looks like this may already be handled in the latest code, but it sounds like you're hacking on an older fork here. As @youkaichao says, all of this is on the v0 path anyhow and isn't applicable in v1 mode.
Thanks for your reply. I tested more cases and found more interesting results; see this issue: #15543 @njhill @youkaichao
@ren2504413601 the v0 code is so convoluted, it's really hard to follow. It sounds like the other bug you're describing is unrelated to calling utility methods like start_profile; it's more that the worker loop is not getting paused in the multi-step case even when there are no requests in progress. Is that correct? I also realized that the solution suggested above will break v0 in some specific cases, in particular when frontend multiprocessing is disabled and/or pipeline parallelism is used. This is because of how the sync collective_rpc method is also used in the async implementations of these util methods.
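(For intuition only: a toy sketch, not vLLM code, of the general hazard of calling a blocking sync helper from the same event loop that must keep driving the worker loop; the actual v0 failure mode may differ.)

```python
# Toy illustration, not vLLM code: a blocking sync call made on the event
# loop thread starves the coroutine it is implicitly waiting on.
import asyncio
import threading

done = threading.Event()

def blocking_sync_rpc() -> bool:
    # Blocks the calling thread until the "worker loop" signals completion.
    done.wait(timeout=5)
    return done.is_set()

async def worker_execution_loop() -> None:
    # Stands in for an in-process worker loop that must keep running on
    # the same event loop (e.g. when frontend multiprocessing is disabled).
    await asyncio.sleep(0.1)
    done.set()

async def start_profile_async() -> bool:
    # Calling the blocking helper directly on the event loop thread means
    # worker_execution_loop never gets to run, so the wait times out.
    return blocking_sync_rpc()

async def main() -> None:
    asyncio.create_task(worker_execution_loop())
    drained = await start_profile_async()
    print("worker loop drained before RPC:", drained)  # prints False

asyncio.run(main())
```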
Yes, it depends on the max-tokens value of the request: it happens if max-tokens % num-scheduler-steps != 0. So maybe there is another bug; stop_remote_worker_loop just avoids this error.
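(For illustration only; the values below are assumptions, not taken from the report.)

```python
# Illustrative arithmetic for the condition described above; the values
# are assumed, not from the original report.
num_scheduler_steps = 8

for max_tokens in (96, 100):
    remainder = max_tokens % num_scheduler_steps
    outcome = ("divides evenly" if remainder == 0
               else "leaves a partial final multi-step batch (the reported hang case)")
    print(f"max_tokens={max_tokens}: {max_tokens} % {num_scheduler_steps} = {remainder} -> {outcome}")
```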
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
My vLLM version is 0.6.5.
The vLLM server works well with any value of `--num-scheduler-steps` when running the script benchmark_serving.py. However, the script seems to hang when I use the `--profile` argument.

Testing with different values of `--num-scheduler-steps`: I am unsure why certain values of `--num-scheduler-steps` work with the Llama-3.1-8B-Instruct model, while others cause the process to hang.
How to reproduce:
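(The original reproduction commands are not shown above. As a rough sketch, assuming the server was started with `--num-scheduler-steps` greater than 1 and with `VLLM_TORCH_PROFILER_DIR` set so the profiler endpoints are enabled, this is roughly what benchmark_serving.py's `--profile` flag does; the address below is an assumption.)

```python
# Rough sketch only: drives the same profiling endpoints that
# benchmark_serving.py --profile uses. Assumes a vLLM OpenAI-compatible
# server launched with --num-scheduler-steps > 1 and with
# VLLM_TORCH_PROFILER_DIR set; the base URL is an assumption.
import requests

BASE_URL = "http://localhost:8000"

# benchmark_serving.py posts here before sending traffic; with
# --num-scheduler-steps > 1 this is the call reported to hang.
resp = requests.post(f"{BASE_URL}/start_profile", timeout=60)
print("start_profile:", resp.status_code)

# ... run the benchmark / send completion requests here ...

# Posted again after the benchmark finishes.
resp = requests.post(f"{BASE_URL}/stop_profile", timeout=60)
print("stop_profile:", resp.status_code)
```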
Logs from benchmark_serving.py:
Logs from server:
Before submitting a new issue...