Feature request
vLLM has introduced support for an external launcher, enabling vLLM processes to be co-located with other processes, such as training. By running multiple vLLM instances alongside the training process, we can speed up inference and reduce the time required for GRPO training. I propose adding an option in TRL to spawn one vLLM process per GPU using this external launcher.
Motivation
Efficient GRPO relies heavily on fast and scalable inference. Currently, inference and training processes run separately, introducing bottlenecks that slow down training. The ideal setup would run multiple vLLM instances inside the training process, as other frameworks such as OpenRLHF and VERL already do.
With vLLM's newly introduced external launcher (PR #12071), it is now possible to co-locate vLLM instances with training processes, allowing one vLLM instance to be spawned per GPU. This reduces inference latency, leading to shorter training runs.
By integrating vLLM’s external launcher into TRL, we can enhance distributed inference efficiency and accelerate GRPO training, making large-scale reinforcement learning more practical and scalable.
Your contribution
Modify GRPOTrainer to initialize vLLM via the external launcher when a TRL flag (such as self.args.external_launcher) is set. We are considering a Ray-less version, in which case the changes could be quite minimal.
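A minimal sketch of what the flag-gated initialization could look like. The flag name and the helper below are assumptions for illustration, not the actual TRL change; `distributed_executor_backend="external_launcher"` is the vLLM option introduced in PR #12071, and the memory fraction is just an example value. The script would run under `torchrun` with one process per GPU, so each rank builds its own engine:

```python
# Hypothetical sketch of flag-gated vLLM initialization for GRPOTrainer.
# Assumes launch via torchrun, one training process per GPU.

def make_vllm_kwargs(model_name: str, external_launcher: bool) -> dict:
    """Build constructor kwargs for vllm.LLM for co-located inference."""
    kwargs = {
        "model": model_name,
        # Leave most GPU memory for the training process (example value).
        "gpu_memory_utilization": 0.3,
    }
    if external_launcher:
        # The external launcher reuses the current process instead of
        # spawning workers, so each torchrun rank drives its own engine.
        kwargs["distributed_executor_backend"] = "external_launcher"
        kwargs["tensor_parallel_size"] = 1  # one vLLM instance per GPU
    return kwargs


# Inside GRPOTrainer.__init__ this could be wired roughly as (sketch):
# if self.args.external_launcher:  # hypothetical TRL flag
#     from vllm import LLM
#     self.llm = LLM(**make_vllm_kwargs(self.args.model_name, True))
```

With the flag off, the trainer would keep its current separate-inference behavior, so the change stays backward compatible.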