# Benchmarking vLLM

This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It’s a living document, updated as new features and datasets
become available.

## Dataset Overview

<table style="width:100%; border-collapse: collapse;">
  <thead>
    <tr>
      <th style="width:15%; text-align: left;">Dataset</th>
      <th style="width:10%; text-align: center;">Online</th>
      <th style="width:10%; text-align: center;">Offline</th>
      <th style="width:65%; text-align: left;">Data Path</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ShareGPT</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json</code></td>
    </tr>
    <tr>
      <td><strong>BurstGPT</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv</code></td>
    </tr>
    <tr>
      <td><strong>Sonnet</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td>Local file: <code>benchmarks/sonnet.txt</code></td>
    </tr>
    <tr>
      <td><strong>Random</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>synthetic</code></td>
    </tr>
    <tr>
      <td><strong>HuggingFace</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">🚧</td>
      <td>Specify your dataset path on HuggingFace</td>
    </tr>
    <tr>
      <td><strong>VisionArena</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">🚧</td>
      <td><code>lmarena-ai/vision-arena-bench-v0.1</code> (a HuggingFace dataset)</td>
    </tr>
  </tbody>
</table>

✅: supported
🚧: to be supported

**Note**: VisionArena’s `dataset-name` should be set to `hf`.

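The download commands from the Data Path column above can be combined into a short setup script. This is a minimal sketch; the `benchmark_data` directory name is an arbitrary choice, not something the benchmark scripts require:

```shell
# Fetch the ShareGPT and BurstGPT benchmark datasets into a local directory.
# The directory name "benchmark_data" is an arbitrary example.
mkdir -p benchmark_data
cd benchmark_data

# ShareGPT: JSON file of cleaned conversations
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

# BurstGPT v1.1: CSV request trace (failed requests removed)
wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv
```

Sonnet needs no download (it ships with the repo at `benchmarks/sonnet.txt`), and Random is generated synthetically at run time.
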
---
## Example - Online Benchmark

First start serving your model:

```bash
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
vllm serve ${MODEL_NAME} --disable-log-requests
```

Then run the benchmarking script:

```bash
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
NUM_PROMPTS=10
BACKEND="openai-chat"
DATASET_NAME="sharegpt"
DATASET_PATH="<your data path>/ShareGPT_V3_unfiltered_cleaned_split.json"

python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}"
```

If successful, you will see the following output:

```
============ Serving Benchmark Result ============
Successful requests:                     10
Benchmark duration (s):                  5.78
Total input tokens:                      1369
Total generated tokens:                  2212
Request throughput (req/s):              1.73
Output token throughput (tok/s):         382.89
Total Token throughput (tok/s):          619.85
---------------Time to First Token----------------
Mean TTFT (ms):                          71.54
Median TTFT (ms):                        73.88
P99 TTFT (ms):                           79.49
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          7.91
Median TPOT (ms):                        7.96
P99 TPOT (ms):                           8.03
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.74
Median ITL (ms):                         7.70
P99 ITL (ms):                            8.39
==================================================
```
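To see how TTFT and TPOT behave under load, the same command can be swept over several request rates. This sketch assumes the script's `--request-rate` flag (target requests per second; when unset, all prompts are sent at once):

```shell
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
DATASET_PATH="<your data path>/ShareGPT_V3_unfiltered_cleaned_split.json"

# Sweep a few request rates (requests/s) and compare the latency sections
# of each run's output.
for RATE in 1 4 16; do
  python3 benchmarks/benchmark_serving.py \
    --backend openai-chat \
    --model "${MODEL_NAME}" \
    --endpoint "/v1/chat/completions" \
    --dataset-name sharegpt \
    --dataset-path "${DATASET_PATH}" \
    --num-prompts 100 \
    --request-rate "${RATE}"
done
```
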

### VisionArena Benchmark for Vision Language Models

```bash
# need a model with vision capability here
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
```

```bash
MODEL_NAME="Qwen/Qwen2-VL-7B-Instruct"
NUM_PROMPTS=10
BACKEND="openai-chat"
DATASET_NAME="hf"
DATASET_PATH="lmarena-ai/vision-arena-bench-v0.1"
DATASET_SPLIT="train"

python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --hf-split "${DATASET_SPLIT}" \
  --num-prompts "${NUM_PROMPTS}"
```

---
## Example - Offline Throughput Benchmark

```bash
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
NUM_PROMPTS=10
DATASET_NAME="sonnet"
DATASET_PATH="benchmarks/sonnet.txt"

python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}"
```

If successful, you will see the following output:

```
Throughput: 7.35 requests/s, 4789.20 total tokens/s, 1102.83 output tokens/s
```
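Throughput numbers stabilize as the prompt count grows, so a single small run can be misleading. A simple sweep over `--num-prompts`, using only the flags shown above, makes the trend visible:

```shell
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"

# Re-run the offline throughput benchmark at increasing prompt counts
# and compare the final Throughput line of each run.
for N in 10 100 1000; do
  echo "=== num-prompts=${N} ==="
  python3 benchmarks/benchmark_throughput.py \
    --model "${MODEL_NAME}" \
    --dataset-name sonnet \
    --dataset-path benchmarks/sonnet.txt \
    --num-prompts "${N}"
done
```
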

### Benchmark with LoRA Adapters

```bash
MODEL_NAME="meta-llama/Llama-2-7b-hf"
BACKEND="vllm"
DATASET_NAME="sharegpt"
DATASET_PATH="/home/jovyan/data/vllm_benchmark_datasets/ShareGPT_V3_unfiltered_cleaned_split.json"
NUM_PROMPTS=10
MAX_LORAS=2
MAX_LORA_RANK=8
ENABLE_LORA="--enable-lora"
LORA_PATH="yard1/llama-2-7b-sql-lora-test"

python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --backend "${BACKEND}" \
  --dataset-path "${DATASET_PATH}" \
  --dataset-name "${DATASET_NAME}" \
  --num-prompts "${NUM_PROMPTS}" \
  --max-loras "${MAX_LORAS}" \
  --max-lora-rank "${MAX_LORA_RANK}" \
  ${ENABLE_LORA} \
  --lora-path "${LORA_PATH}"
```