Pipeline Parallelism Support #15475

egebeysel · 2025-03-25T15:04:20Z


I would like to understand how pipeline parallelism support is implemented in vLLM, so I am looking for the source code that adapts a model for pipeline-parallel inference. Any pointers to where that code lives would be very welcome.

In other words, where does the model get partitioned into pipeline stages when we run vllm serve gpt2 --tensor-parallel-size 4 --pipeline-parallel-size 2? Or does the model already have to be in a compatible form?
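
To make the question concrete, here is a minimal sketch of the kind of partitioning being asked about. It is not vLLM's code: `partition_layers` is a hypothetical helper, the contiguous near-even split is only one possible policy, and the 12-layer count assumes the base GPT-2 configuration.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices into contiguous, near-even pipeline stages.

    Hypothetical helper for illustration; not part of vLLM.
    """
    if num_stages <= 0 or num_layers < num_stages:
        raise ValueError("need at least one layer per pipeline stage")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


# Assumed: base GPT-2 has 12 transformer blocks; pipeline-parallel size is 2.
for rank, layers in enumerate(partition_layers(12, 2)):
    print(f"pipeline stage {rank}: layers {layers.start}..{layers.stop - 1}")
```

With tensor-parallel size 4 and pipeline-parallel size 2, the command above would span 4 × 2 = 8 workers in total, so each stage in this sketch would itself be sharded across 4 of them.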
