[Misc] Better RayExecutor and multiprocessing compatibility #14705
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Two features that are very important here are: 1) being able to pass placement groups directly to engine_args instead of hacking around the vllm_config, and 2) forcing spawn when Ray is being used. Question for vLLM committers: is there any good reason for forking the process rather than always using spawn?
@russellb mentioned to me that spawn will break some existing code using vLLM as a library; he will share more details later.
Overall LGTM. We will need to better understand why vLLM currently prefers fork over spawn.
spawn does not work if users simply run `import vllm; vllm.LLM()` at the top level of a script without an `if __name__ == "__main__":` guard, because the spawn child re-imports the main module. spawn does not work for Jupyter notebooks either, see https://stackoverflow.com/questions/48846085/python-multiprocessing-within-jupyter-notebook . When we know it's safe to use spawn, we can use spawn instead of fork, e.g. when we are creating an API server we have an explicit entry point (vllm/vllm/entrypoints/openai/api_server.py, Line 982 in ce20124).

w.r.t. Ray: since fork does not work for Ray, we can use spawn if we find the current process is a Ray actor. That's totally fine.
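The first constraint above can be demonstrated with the standard library alone. This is a minimal sketch (no vLLM involved) showing that a spawn child re-imports the main module, which is exactly why unguarded top-level code such as `vllm.LLM()` breaks under spawn:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Sketch: the spawn child re-imports the main module, so any top-level
# work (like constructing vllm.LLM()) runs again in the child unless it
# sits behind an `if __name__ == "__main__":` guard.
script = textwrap.dedent("""
    import multiprocessing as mp

    print("module imported")      # runs in the parent AND in the spawn child

    def child():
        pass

    if __name__ == "__main__":    # guard keeps this block parent-only
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=child)
        p.start()
        p.join()
""")

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

result = subprocess.run([sys.executable, path], capture_output=True, text=True)
os.unlink(path)

# The module body executed twice: once in the parent, once when the
# spawn child re-imported it to locate the target function.
times_imported = result.stdout.count("module imported")
print(times_imported)  # → 2
```

Under fork this double-import does not happen, which is why scripts without the guard appear to work until the start method changes.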
Thanks for pointing it out; I think this is exactly the same issue I'm trying to solve in #14410. To summarize, we can:
@comaniac , I think what @youkaichao said makes sense. I created a PR here: #14768 |
@youkaichao fork() apparently has long-standing issues in various contexts, and in Python 3.14 the default start method is being switched away from fork: python/cpython#84559
@kouroshHakha looks like vLLM documents the tradeoffs: https://docs.vllm.ai/en/latest/design/multiprocessing.html |
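Given that the platform default is changing, one option discussed above is for a library to request a start method explicitly instead of relying on (or mutating) the global default. A small stdlib sketch of that pattern:

```python
import multiprocessing as mp

# The platform default varies: fork has historically been the Linux
# default, and Python 3.14 moves away from it.
default = mp.get_start_method(allow_none=False)

# get_context() returns an isolated context with the requested start
# method, so a library can use spawn without clobbering whatever default
# the embedding application relies on.
spawn_ctx = mp.get_context("spawn")

print(default)                    # one of "fork", "spawn", "forkserver"
print(type(spawn_ctx).__name__)   # SpawnContext
```

Processes, queues, and locks created from `spawn_ctx` all use spawn, regardless of what `default` is.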
Signed-off-by: Cody Yu <[email protected]>
Per offline discussion with @youkaichao, here is the latest behavior in a spawn process:
This reverts commit 6907571. Signed-off-by: Cody Yu <[email protected]>
thanks for the fix!
…ject#14705) Signed-off-by: Cody Yu <[email protected]>
One issue that has been frequently reported recently is hanging when the Ray executor is used. Specifically, it happens when:
Note that steps 1-2 are usually performed by other Ray libraries or applications, such as Ray Serve and Ray Data.
After diving into the implementations, we concluded that this is because vLLM by default uses fork to create the engine process, while in this scenario we have to use spawn when creating child processes. However, since a spawn child process has a fresh environment, it cannot get the placement group. As a result, we have to pass the placement group object from the main process.

This PR fixes the issue by:
1. Allowing a placement group to be passed directly to the engine args instead of hacking around the vllm_config.
2. Using spawn when Ray is initialized (so that we are likely in an actor).

Example code:
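The PR's original example is not reproduced here. As a vLLM-free illustration of the underlying constraint, the stdlib sketch below shows that a freshly started child interpreter cannot see parent-side state (a placement-group handle, simulated here by a hypothetical `PLACEMENT_GROUP_ID`), so the handle must be passed over explicitly, mirroring what this PR does through the engine args:

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a real Ray placement-group handle. A spawned
# child starts as a fresh interpreter and cannot see this parent-side
# state, so the value must be handed over explicitly (in vLLM's case
# through the engine arguments; here via argv, purely for illustration).
PLACEMENT_GROUP_ID = "pg-1234"

child_src = textwrap.dedent("""
    import sys
    # Fresh environment: no globals from the parent exist here. The only
    # way to learn the placement group is the value the parent passed in.
    print("child sees:", sys.argv[1])
""")

result = subprocess.run(
    [sys.executable, "-c", child_src, PLACEMENT_GROUP_ID],
    capture_output=True, text=True,
)
message = result.stdout.strip()
print(message)  # → child sees: pg-1234
```

A forked child would have inherited the handle implicitly, which is why the switch to spawn makes explicit passing necessary.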
cc @ruisearch42 @youkaichao