[Misc] Better RayExecutor and multiprocessing compatibility #14705


Merged
merged 14 commits into vllm-project:main on Mar 21, 2025

Conversation

@comaniac (Collaborator) commented Mar 12, 2025

One issue that has been frequently reported recently is hanging when the Ray executor is used. Specifically, it happens when:

  1. Create a Ray placement group.
  2. Create a Ray actor using the first bundle of the placement group.
  3. Launch a vLLM engine in the actor.

Note that steps 1-2 are usually done by other Ray libraries or applications such as Ray Serve and Ray Data.

After diving into the implementation, we concluded that this happens because vLLM by default uses fork to create the engine process:

  1. It is not best practice to fork a child process inside a Ray actor, as it results in undefined behavior. For example, this hanging issue occurs because the forked child process tries to access the Ray GCS even though it is now a different process.
  2. Accordingly, we have to use spawn when creating child processes. However, since a spawned child process starts with a fresh environment, it cannot see the placement group. As a result, we have to pass the placement group object from the main process.

This PR fixes the issue by:

  1. Enforcing spawn when Ray is initialized (so that we are likely in an actor).
  2. Getting the current placement group before creating the engine (in the main process), and passing the placement group object to the spawned processes (see the sketch below).
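Conceptually, the parent-side logic looks like the sketch below. This is an illustration rather than the actual vLLM code: the function name is made up, while ray.is_initialized(), ray.util.get_current_placement_group(), and the VLLM_WORKER_MULTIPROC_METHOD environment variable are the real knobs discussed in this PR.

import os

import ray
from ray.util import get_current_placement_group

def prepare_for_engine_spawn():
    """Hypothetical helper: pick the start method and capture the placement
    group in the parent before the engine process is created."""
    pg = None
    if ray.is_initialized():
        # Forking inside a Ray worker/actor leads to undefined behavior,
        # so force spawn for the engine process.
        os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
        # A spawned child starts with a fresh environment, so the placement
        # group must be captured here and passed to the child explicitly.
        pg = get_current_placement_group()
    return pg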

Example code:

from typing import Any, Dict
import logging

from fastapi import FastAPI
from starlette.requests import Request

import ray
from ray import serve

logger = logging.getLogger("ray.serve")

app = FastAPI()

ray.init(
    runtime_env=dict(
        env_vars=dict(
            VLLM_USE_V1="1"
        )
    )
)

@serve.deployment(
    autoscaling_config={
        "initial_replicas": 1,
        "min_replicas": 1,
        "max_replicas": 1,
        "target_ongoing_requests": 5,
    },
    max_ongoing_requests=10,
)
@serve.ingress(app)
class VLLMDeployment:
    def __init__(
        self,
        engine_args: Dict[str, Any],
    ):
        import vllm
        assert vllm.envs.VLLM_USE_V1

        engine_args = vllm.AsyncEngineArgs(
            **engine_args,
            distributed_executor_backend="ray",
            disable_log_requests=True,
        )
        self.engine = vllm.AsyncLLMEngine.from_engine_args(
            engine_args=engine_args,
        )
        logger.info("Engine initialized")

    @app.post("/generate")
    async def generate(
        self, raw_request: Request
    ):
        import vllm
        import uuid

        request = await raw_request.json()
        stream = self.engine.generate(
            request_id=str(uuid.uuid4()),
            prompt=request["prompt"],
            sampling_params=vllm.SamplingParams(**request["params"]),
        )
        async for request_output in stream:
            if request_output.finished:
                return request_output


def build_app(cli_args: Dict[str, str]) -> serve.Application:
    pg_resources = [
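        # The first bundle hosts the Serve replica actor; the GPU bundle is used by the vLLM engine worker.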
        {"CPU": 1}, {"CPU": 1, "GPU": 1},
    ]
    return VLLMDeployment.options(
        placement_group_bundles=pg_resources,
        placement_group_strategy="STRICT_PACK",
    ).bind(cli_args)
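For reference, one way to launch this deployment from Python is shown below; the engine-args dict (including the model id) is a placeholder and not part of this PR.

from ray import serve

# build_app is the application builder defined above; the dict is forwarded
# to vllm.AsyncEngineArgs inside VLLMDeployment.__init__.
serve.run(build_app({"model": "your-model-id"}))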

cc @ruisearch42 @youkaichao


@mergify mergify bot added the v1 label Mar 12, 2025
@kouroshHakha (Contributor) commented:

Two features that are very important here: 1) being able to pass placement groups directly to engine_args instead of hacking around vllm_config, and 2) forcing spawn when Ray is used.

Question for vLLM committers: is there any good reason to fork the process rather than always using spawn? cc @WoosukKwon

@comaniac (Collaborator, Author) commented:

> Two features that are very important here: 1) being able to pass placement groups directly to engine_args instead of hacking around vllm_config, and 2) forcing spawn when Ray is used.
>
> Question for vLLM committers: is there any good reason to fork the process rather than always using spawn? cc @WoosukKwon

@russellb mentioned to me that spawn will break some existing code using vLLM as a library and he will share more details later.

@ruisearch42 (Collaborator) left a comment:

Overall LGTM. We will need to better understand why vLLM currently prefers fork over spawn.

@youkaichao (Member) commented:

> Two features that are very important here: 1) being able to pass placement groups directly to engine_args instead of hacking around vllm_config, and 2) forcing spawn when Ray is used.
>
> Question for vLLM committers: is there any good reason to fork the process rather than always using spawn? cc @WoosukKwon
>
> @russellb mentioned to me that spawn will break some existing code using vLLM as a library and he will share more details later.

spawn does not work if users do not have an if __name__ == "__main__" guard set up correctly, e.g. if people just run this in a Python shell:

import vllm
vllm.LLM()

spawn does not work in Jupyter notebooks either; see https://stackoverflow.com/questions/48846085/python-multiprocessing-within-jupyter-notebook.

When we know it's safe to use spawn, we can use it instead of fork. For example, when we are creating an API server there is an explicit if __name__ == "__main__" guard, so we can use spawn by default:

if __name__ == "__main__":
    llm = vllm.LLM()  # spawn is safe here because there is an explicit entry point

W.r.t. Ray: since fork does not work for Ray, we can use spawn if we find that the current process is a Ray actor. That's totally fine.

@youkaichao (Member) commented:

Thanks for pointing it out; I think this is exactly the same issue I'm trying to solve in #14410.

To summarize, we can:

  • change VLLM_WORKER_MULTIPROC_METHOD to spawn if we are in a Ray actor (help needed from the Ray team: how to accurately tell whether the current process is a Ray actor)
  • pass the current placement group to the spawned process (help needed from the Ray team: how to reliably get the current placement group, and whether a placement group is safe to serialize); a rough sketch of both checks follows below
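One possible way to implement both checks with public Ray APIs is sketched here; this assumes ray.get_runtime_context().get_actor_id() returns None outside of an actor, and it is not necessarily what the PR ended up doing.

import ray
from ray.util import get_current_placement_group

def in_ray_actor() -> bool:
    """Best-effort check for whether this process is a Ray actor."""
    if not ray.is_initialized():
        return False
    # get_actor_id() returns the actor ID inside an actor and None otherwise.
    return ray.get_runtime_context().get_actor_id() is not None

def current_pg_or_none():
    """Placement group the current actor/task is scheduled in, if any."""
    return get_current_placement_group() if ray.is_initialized() else None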

@ruisearch42 (Collaborator) commented:

@comaniac, I think what @youkaichao said makes sense. I created a PR here: #14768

@kouroshHakha (Contributor) commented:

@youkaichao fork() has apparently had long-standing issues in various contexts, and in Python 3.14 they are switching the default to spawn(). python/cpython#84559

@ruisearch42 (Collaborator) commented:

@kouroshHakha looks like vLLM documents the tradeoffs: https://docs.vllm.ai/en/latest/design/multiprocessing.html

@mergify mergify bot added the documentation label (Improvements or additions to documentation) Mar 19, 2025
@comaniac (Collaborator, Author) commented Mar 19, 2025

Per offline discussion with @youkaichao, here is the latest behavior in a spawned process:

  1. If parallel_config.placement_group is given, use it.
  2. Otherwise, if RAY_PLACEMENT_GROUP is given, use it.
  3. Otherwise, use ray.util.get_current_placement_group() if it is available.
  4. Otherwise, create a new placement group.

Note that since environment variables can only hold strings, we added utilities to serialize/deserialize placement groups to and from strings; these utilities should ideally be implemented in Ray. A rough sketch of the overall resolution logic is below.
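For illustration only, the resolution order plus the string round-tripping could look roughly like this. The helper names, the cloudpickle/base64 encoding, and the fallback bundle spec are assumptions made here for clarity; RAY_PLACEMENT_GROUP and parallel_config.placement_group come from the discussion above, and whether a PlacementGroup handle survives cloudpickle round-tripping is itself an assumption rather than something verified in this thread.

import base64
import os

import cloudpickle
from ray.util import get_current_placement_group, placement_group

def pg_to_str(pg) -> str:
    # Assumption: the PlacementGroup handle is cloudpickle-serializable.
    return base64.b64encode(cloudpickle.dumps(pg)).decode("ascii")

def pg_from_str(value: str):
    return cloudpickle.loads(base64.b64decode(value))

def resolve_placement_group(parallel_config):
    # 1. An explicitly provided placement group wins.
    if parallel_config.placement_group is not None:
        return parallel_config.placement_group
    # 2. Otherwise, use a placement group handed down via the environment.
    encoded = os.environ.get("RAY_PLACEMENT_GROUP")
    if encoded:
        return pg_from_str(encoded)
    # 3. Otherwise, use the placement group the current actor/task runs in.
    pg = get_current_placement_group()
    if pg is not None:
        return pg
    # 4. Otherwise, create a new one (the bundle spec here is illustrative).
    return placement_group([{"CPU": 1, "GPU": 1}])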

cc @ruisearch42 @kouroshHakha @richardliaw

@youkaichao (Member) left a comment:

thanks for the fix!

@comaniac comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 19, 2025
@comaniac comaniac enabled auto-merge (squash) March 19, 2025 18:16
@vllm-bot vllm-bot merged commit 5df2da5 into vllm-project:main Mar 21, 2025
34 of 36 checks passed