[core] set up data parallel communication #13591
Conversation
```python
            backend,
            group_name="dp")

        logger.info(
```
example of the rank assignment for DP=2 x TP=2:
rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0
rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1
rank 2 in world size 4 is assigned as DP rank 1, PP rank 0, TP rank 0
rank 3 in world size 4 is assigned as DP rank 1, PP rank 0, TP rank 1
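For reference, the layout implied by this log can be reproduced with a small calculation. This is an illustrative sketch only, not the actual vLLM code; the DP-outermost / TP-innermost ordering is inferred from the log above:

```python
# Sketch: decompose a global rank into (DP, PP, TP) ranks,
# assuming DP is the outermost dimension and TP the innermost.
def decompose_rank(rank: int, dp_size: int, pp_size: int, tp_size: int):
    dp_rank = rank // (pp_size * tp_size)
    pp_rank = (rank // tp_size) % pp_size
    tp_rank = rank % tp_size
    return dp_rank, pp_rank, tp_rank

world_size = 4  # DP=2 x PP=1 x TP=2
for rank in range(world_size):
    dp, pp, tp = decompose_rank(rank, dp_size=2, pp_size=1, tp_size=2)
    print(f"rank {rank} in world size {world_size} is assigned as "
          f"DP rank {dp}, PP rank {pp}, TP rank {tp}")
```

Running this reproduces the four log lines shown above.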
JFYI: I ran into an issue with the master port already being in use (see comment in config.py)
```python
        self.data_parallel_size = envs.VLLM_DP_SIZE
        self.data_parallel_rank = envs.VLLM_DP_RANK
        self.data_parallel_master_ip = envs.VLLM_DP_MASTER_IP
        self.data_parallel_master_port = envs.VLLM_DP_MASTER_PORT
```
Note that I'm hitting issues like:
RuntimeError: The server socket has failed to listen on any local network address. port: 29500, useIpv6: 0, code: -98, name: EADDRINUSE, message: address already in use
This happens even if I change the master port with `torchrun --master-port ...`. Currently I'm hacking around it by changing this to `self.data_parallel_master_port = envs.VLLM_DP_MASTER_PORT + 1`.
That's strange. I also hit it once, but then it disappeared.
It seems this disappeared when I removed torchrun in af53b4b.
```python
        answer = self.data_parallel_master_port
        self.data_parallel_master_port += 1
```
What if the port is already being used by other services?
Then it will error.
We can document that we will use more than one port, starting from the specified port, and that assumption should usually be fine.
NOTE: even if we only use the specified port, there is still a chance that some other service is already using it before we do. This is unavoidable when running multiple services on the same host. But for cloud deployments, where each service runs in its own container, it should be fine.
Alternatively, could we just check whether the port is in use with a socket, and keep searching for the next available port?
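For context, the kind of check being suggested might look like the sketch below. This is a hypothetical helper, not part of this PR, and as the reply explains it does not solve the problem for non-master ranks that must connect to a pre-agreed port:

```python
import socket

def find_free_port(start_port: int, max_tries: int = 32) -> int:
    """Return the first locally bindable port >= start_port (sketch only)."""
    for port in range(start_port, start_port + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("", port))  # succeeds only if nothing is listening here
                return port
            except OSError:
                continue  # port taken, try the next one
    raise RuntimeError(f"no free port in [{start_port}, {start_port + max_tries})")
```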
This is not feasible: non-zero ranks connect directly to the specified port, and they cannot tell whether the listener is the master rank or some other service. They also need to wait for some time in case the master rank has not started yet.
I added the code in 267cd82; at least vLLM's internal port usage will not conflict with the DP master ports.
vllm/v1/engine/llm_engine.py
```python
        if self.should_execute_dummy_batch:
            self.should_execute_dummy_batch = False
            # TODO: execute a dummy batch to sync across ranks
```
Looks like this is not the right place for this logic? This should be in the EngineCore's busy loop I feel.
I'm not familiar with the engine part; can you show me where I should put it?
Should be in this loop I guess
https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/core.py#L305
A bitter lesson: we need to place this logic at the top level, which is the LLMEngine level in offline inference.
We cannot put it in the EngineCore's busy loop; otherwise the LLMEngine will exit directly without checking the status of the other DP ranks.
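To make the reasoning concrete, here is a rough sketch of why the check has to live at the top (LLMEngine) level in offline inference. The names `dp_group.all_reduce` and `execute_dummy_batch` are placeholders for illustration, not the actual vLLM API:

```python
# Schematic only: keep all DP ranks stepping in lockstep until every rank
# has drained its own requests; a rank that exits early would break the
# collectives of the ranks that still have work.
def run_offline(engine, dp_group):
    while True:
        local_busy = engine.has_unfinished_requests()
        # Agree across DP ranks whether anyone still has work to do.
        any_busy = dp_group.all_reduce(int(local_busy), op="max")
        if not any_busy:
            break  # every DP rank is done; safe to exit together
        if local_busy:
            engine.step()                 # normal batch
        else:
            engine.execute_dummy_batch()  # placeholder batch to stay in sync
```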
@njhill I tried that approach as well, but didn't succeed. It needs more changes, e.g. we need to change the semantics of …
LGTM
Failed tests are due to HF timeout; merging.
```python
    if len(prompts) == 0:
        # if any rank has no prompts to process,
        # we need to set a placeholder prompt
        prompts = ["Placeholder"]
```
I know this is just an example but in practice I guess you'd want to set max_tokens to 1 for any placeholder prompts.
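A minimal sketch of that suggestion, assuming the standard vLLM offline API (the model name and the real sampling settings are just examples):

```python
from vllm import LLM, SamplingParams

prompts = []  # suppose this DP rank received no real work
if len(prompts) == 0:
    # Placeholder prompt so the rank still participates in collectives,
    # generating only a single token to keep the dummy work minimal.
    prompts = ["Placeholder"]
    sampling_params = SamplingParams(max_tokens=1)
else:
    sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # example model
outputs = llm.generate(prompts, sampling_params)
```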
May I ask if this feature can be used in a service-oriented way? I see from the example in …
@lewisword Not yet, but it will be coming via #13923.
@youkaichao
```
export VLLM_DP_MASTER_IP=10.13.3.163

python3 examples/offline_inference/data_parallel.py --node-size 2 --node-rank 0 --master-addr 10.13.3.163 --model /home/xxxxx/DeepSeek-R1 --master-port 13345 --dp-size 2 --tp-size 8

python3 examples/offline_inference/data_parallel.py --node-size 2 --model /home/xxxxx/DeepSeek-R1/ --node-rank 1 --master-addr 10.13.3.163 --master-port 13345 --dp-size 2 --tp-size 8
```
Output:
```
DP rank 0, Prompt: 'Hello, my name is', Generated text: ' Danielle. I’m a new Master’s student in the Sustainability and Energy'
```
We need to explore data parallelism in many cases, e.g. for DeepSeek models and MoE models.
While the end-user interface is still to be designed, this PR first creates the necessary communication channel for data parallelism and leaves the interface for future design.
As long as the launcher sets `VLLM_DP_RANK`, `VLLM_DP_SIZE`, `VLLM_DP_MASTER_IP`, `VLLM_DP_MASTER_PORT`, and `CUDA_VISIBLE_DEVICES` correctly, it will be compatible with this PR.

Example commands to use data parallel:
torchrun --nproc-per-node=2 examples/offline_inference/data_parallel.py
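Alternatively, a custom launcher can set the variables itself. Below is a minimal per-process sketch for DP rank 0 of 2 on a single node; the concrete values and the model name are hypothetical, and the torchrun example above is expected to handle the equivalent wiring inside examples/offline_inference/data_parallel.py:

```python
import os

# Hypothetical values: one process per DP rank, launched by an external script.
# These must be set before vLLM reads its environment configuration.
os.environ["VLLM_DP_RANK"] = "0"               # this process's data-parallel rank
os.environ["VLLM_DP_SIZE"] = "2"               # total number of data-parallel ranks
os.environ["VLLM_DP_MASTER_IP"] = "127.0.0.1"  # reachable by all DP ranks
os.environ["VLLM_DP_MASTER_PORT"] = "29501"    # see the port discussion above
os.environ["CUDA_VISIBLE_DEVICES"] = "0"       # GPUs owned by this DP rank

from vllm import LLM  # import after the environment is prepared

llm = LLM(model="facebook/opt-125m")  # example model
print(llm.generate(["Hello, my name is"]))
```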
Note: this PR only sets up the communication channel; it is not used in the model forward pass yet. To get the benefit of data parallelism, especially in combination with expert parallelism, we still need to:
- call `execute_dummy_batch` when `should_execute_dummy_batch == True`, in the engines
- sync `use_cuda_graph` in the model runner across DP groups; this is technically not necessary, but if we have some collective operations that do something different with and without CUDA graphs, this sync would be necessary.

NOTE: I think currently PP is not really compatible with DP. This is right now quite complicated to reason about.