Commit 32b3559

hmellor authored and DamonFool committed
Add RLHF document (vllm-project#14482)
Signed-off-by: Harry Mellor <[email protected]>
1 parent b49ea45 commit 32b3559

3 files changed: 14 additions, 1 deletion

docs/source/generate_examples.py

Lines changed: 2 additions & 1 deletion
@@ -14,13 +14,14 @@
 def fix_case(text: str) -> str:
     subs = {
         "api": "API",
-        "Cli": "CLI",
+        "cli": "CLI",
         "cpu": "CPU",
         "llm": "LLM",
         "tpu": "TPU",
         "aqlm": "AQLM",
         "gguf": "GGUF",
         "lora": "LoRA",
+        "rlhf": "RLHF",
         "vllm": "vLLM",
         "openai": "OpenAI",
         "multilora": "MultiLoRA",

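Note on the `fix_case` hunk: only the substitution table is visible here, not the code that applies it. The following is a minimal sketch, assuming the table is applied as whole-word regex substitutions. The function name `fix_case_sketch`, the `ignore_case` parameter, and the reduced `SUBS` table are hypothetical and not taken from the commit. It illustrates why key casing can matter: with case-sensitive matching, a `"Cli"` key never matches lowercase `cli`, while with case-insensitive matching the rename only makes the table consistent.

```python
import re

# Hypothetical subset of the substitution table shown in the hunk.
SUBS = {
    "api": "API",
    "cli": "CLI",
    "rlhf": "RLHF",
    "vllm": "vLLM",
}


def fix_case_sketch(text: str, subs: dict[str, str] = SUBS,
                    ignore_case: bool = True) -> str:
    """Replace each whole-word key in `subs` with its canonical spelling.

    Assumption: whole-word regex matching; the real fix_case body is not
    visible in this diff and may differ.
    """
    flags = re.IGNORECASE if ignore_case else 0
    for pattern, repl in subs.items():
        # \b anchors keep "api" from matching inside a longer word.
        text = re.sub(rf"\b{re.escape(pattern)}\b", repl, text, flags=flags)
    return text


print(fix_case_sketch("Rlhf with vllm"))
```

Passing `ignore_case=False` with a `{"Cli": "CLI"}` table leaves the input `cli tool` unchanged, which is the failure mode a lowercase key avoids.
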
docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -105,6 +105,7 @@ features/compatibility_matrix
 :maxdepth: 1
 
 training/trl.md
+training/rlhf.md
 
 :::
 

docs/source/training/rlhf.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Reinforcement Learning from Human Feedback
+
+Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviours.
+
+vLLM can be used to generate the completions for RLHF. The best way to do this is with libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [verl](https://github.com/volcengine/verl).
+
+See the following basic examples to get started if you don't want to use an existing library:
+
+- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
+- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
+- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
