Commit a95d12a

JenZhao authored and ywang96 committed

[Doc] Update benchmarks README (vllm-project#14646)

Signed-off-by: Jennifer Zhao <[email protected]>
Co-authored-by: Jennifer Zhao <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>

1 parent 6b6da6c commit a95d12a

1 file changed: benchmarks/README.md (+165 −13 lines)
# Benchmarking vLLM

This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It’s a living document, updated as new features and datasets
become available.

## Dataset Overview
<table style="width:100%; border-collapse: collapse;">
  <thead>
    <tr>
      <th style="width:15%; text-align: left;">Dataset</th>
      <th style="width:10%; text-align: center;">Online</th>
      <th style="width:10%; text-align: center;">Offline</th>
      <th style="width:65%; text-align: left;">Data Path</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ShareGPT</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json</code></td>
    </tr>
    <tr>
      <td><strong>BurstGPT</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv</code></td>
    </tr>
    <tr>
      <td><strong>Sonnet</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td>Local file: <code>benchmarks/sonnet.txt</code></td>
    </tr>
    <tr>
      <td><strong>Random</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>synthetic</code></td>
    </tr>
    <tr>
      <td><strong>HuggingFace</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">🚧</td>
      <td>Specify your dataset path on HuggingFace</td>
    </tr>
    <tr>
      <td><strong>VisionArena</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">🚧</td>
      <td><code>lmarena-ai/vision-arena-bench-v0.1</code> (a HuggingFace dataset)</td>
    </tr>
  </tbody>
</table>

✅: supported
🚧: to be supported

**Note**: VisionArena’s `dataset-name` should be set to `hf`.
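
For convenience, the download commands from the table can be run together. A minimal sketch (the target directory is illustrative; any local path works):

```bash
# Fetch the ShareGPT and BurstGPT files referenced in the table above.
# The directory name here is illustrative, not required by the scripts.
mkdir -p ~/vllm_benchmark_datasets
cd ~/vllm_benchmark_datasets
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv
```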

---

## Example - Online Benchmark

First, start serving your model:

```bash
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
vllm serve ${MODEL_NAME} --disable-log-requests
```

Then run the benchmarking script:

```bash
# download the dataset first if needed:
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
NUM_PROMPTS=10
BACKEND="openai-chat"
DATASET_NAME="sharegpt"
DATASET_PATH="<your data path>/ShareGPT_V3_unfiltered_cleaned_split.json"

python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}"
```

If successful, you will see the following output:

```
============ Serving Benchmark Result ============
Successful requests:                     10
Benchmark duration (s):                  5.78
Total input tokens:                      1369
Total generated tokens:                  2212
Request throughput (req/s):              1.73
Output token throughput (tok/s):         382.89
Total Token throughput (tok/s):          619.85
---------------Time to First Token----------------
Mean TTFT (ms):                          71.54
Median TTFT (ms):                        73.88
P99 TTFT (ms):                           79.49
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          7.91
Median TPOT (ms):                        7.96
P99 TPOT (ms):                           8.03
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.74
Median ITL (ms):                         7.70
P99 ITL (ms):                            8.39
==================================================
```
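
Beyond printing this summary, the serving benchmark can also persist the metrics for later comparison. A minimal sketch, assuming `benchmark_serving.py` supports the `--save-result` and `--result-dir` flags (verify with `--help`):

```bash
# Re-run the benchmark and save the metrics as a JSON file
# (--save-result / --result-dir are assumed flags; confirm via --help).
python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}" \
  --save-result \
  --result-dir "./benchmark_results"
```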

### VisionArena Benchmark for Vision Language Models

```bash
# need a model with vision capability here
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
```

```bash
MODEL_NAME="Qwen/Qwen2-VL-7B-Instruct"
NUM_PROMPTS=10
BACKEND="openai-chat"
DATASET_NAME="hf"
DATASET_PATH="lmarena-ai/vision-arena-bench-v0.1"
DATASET_SPLIT='train'

python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --hf-split "${DATASET_SPLIT}" \
  --num-prompts "${NUM_PROMPTS}"
```
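
Some HuggingFace datasets define multiple configurations in addition to splits. A minimal sketch, assuming the script accepts an `--hf-subset` flag alongside `--hf-split` (the dataset and subset names here are placeholders, not real examples):

```bash
# Illustrative only: select a dataset configuration via --hf-subset
# (flag assumed from benchmark_serving.py; confirm via --help).
python3 benchmarks/benchmark_serving.py \
  --backend "openai-chat" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "hf" \
  --dataset-path "<hf org>/<hf dataset>" \
  --hf-subset "<config name>" \
  --hf-split "train" \
  --num-prompts "${NUM_PROMPTS}"
```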
24135

25-
You can download the BurstGPT v1.1 dataset by running:
136+
---
137+
## Example - Offline Throughput Benchmark
26138

27139
```bash
28-
wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv
140+
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
141+
NUM_PROMPTS=10
142+
DATASET_NAME="sonnet"
143+
DATASET_PATH="benchmarks/sonnet.txt"
144+
145+
python3 benchmarks/benchmark_throughput.py \
146+
--model "${MODEL_NAME}" \
147+
--dataset-name "${DATASET_NAME}" \
148+
--dataset-path "${DATASET_PATH}" \
149+
--num-prompts "${NUM_PROMPTS}"
150+
```
151+
152+
If successful, you will see the following output
153+
154+
```
155+
Throughput: 7.35 requests/s, 4789.20 total tokens/s, 1102.83 output tokens/s
29156
```
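
To keep a record of these numbers across runs, the metrics can be written to a file. A minimal sketch, assuming `benchmark_throughput.py` accepts an `--output-json` flag (verify with `--help`):

```bash
# Persist the throughput metrics as JSON
# (--output-json is an assumed flag; confirm via --help).
python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}" \
  --output-json "sonnet_throughput.json"
```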

### Benchmark with LoRA Adapters

```bash
MODEL_NAME="meta-llama/Llama-2-7b-hf"
BACKEND="vllm"
DATASET_NAME="sharegpt"
DATASET_PATH="<your data path>/ShareGPT_V3_unfiltered_cleaned_split.json"
NUM_PROMPTS=10
MAX_LORAS=2
MAX_LORA_RANK=8
ENABLE_LORA="--enable-lora"
LORA_PATH="yard1/llama-2-7b-sql-lora-test"

python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --backend "${BACKEND}" \
  --dataset-path "${DATASET_PATH}" \
  --dataset-name "${DATASET_NAME}" \
  --num-prompts "${NUM_PROMPTS}" \
  --max-loras "${MAX_LORAS}" \
  --max-lora-rank "${MAX_LORA_RANK}" \
  ${ENABLE_LORA} \
  --lora-path "${LORA_PATH}"
```
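
The same adapter can also be exercised against the online server. A minimal sketch, assuming vLLM's `--enable-lora` and `--lora-modules` serve flags (the adapter name `sql-lora` is illustrative):

```bash
# Serve the base model with one LoRA adapter registered as "sql-lora"
# (adapter name is illustrative; flags assumed from vLLM's serve CLI).
vllm serve meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules sql-lora=yard1/llama-2-7b-sql-lora-test \
  --disable-log-requests
```

Requests that should hit the adapter then pass `sql-lora` as the model name in the request payload.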
