Hi there. Since communication between nodes is done by NCCL (which, I guess, typically relies on RDMA), I wonder whether I can set up an inference pipeline with machines on different networks, for example one on Google Cloud and another on AWS, using vLLM's pipeline parallelism?
Thanks a lot if anyone can answer this.
A typical vLLM step takes about 20 ms, and copying an intermediate result (a large tensor) over the network is very slow by comparison.
vLLM currently schedules synchronously, so the delay in transmitting intermediate results over the network will greatly reduce GPU utilization, increase latency, and reduce throughput.
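To put rough numbers on this, here is a back-of-envelope sketch in Python. Only the 20 ms step time comes from the comment above. The batch size, hidden size, link bandwidth, and round-trip time are made-up illustrative values, and the function names are hypothetical, not part of vLLM. It also assumes the simplest possible model, where the GPU sits idle for the whole transfer.

```python
def transfer_time_ms(num_tokens, hidden_size, bytes_per_elem,
                     bandwidth_gbps, rtt_ms):
    """Estimate the one-way time to ship one activation tensor between stages.

    Assumes the payload is num_tokens x hidden_size elements and that the
    cost is one-way latency (rtt / 2) plus serialization time on the link.
    """
    payload_bits = num_tokens * hidden_size * bytes_per_elem * 8
    serialization_ms = payload_bits / (bandwidth_gbps * 1e9) * 1e3
    return rtt_ms / 2 + serialization_ms


def gpu_utilization(step_ms, transfer_ms):
    """Fraction of time spent computing if every step waits for the transfer.

    This is a deliberately pessimistic model with no overlap of compute
    and communication.
    """
    return step_ms / (step_ms + transfer_ms)


# Illustrative assumptions: 256 tokens in flight, hidden size 8192, fp16
# activations, a 1 Gbit/s cross-cloud link with a 60 ms round-trip time.
transfer = transfer_time_ms(num_tokens=256, hidden_size=8192,
                            bytes_per_elem=2, bandwidth_gbps=1.0, rtt_ms=60.0)
utilization = gpu_utilization(step_ms=20.0, transfer_ms=transfer)

print(f"transfer per step: {transfer:.1f} ms")
print(f"GPU utilization:   {utilization:.0%}")
```

Under these assumed numbers the transfer alone takes several times longer than the compute step, so most of each cycle would be spent waiting on the network. Real links and workloads will differ, so plug in your own measurements.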
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!