See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performance benchmark results.

## Performance benchmark quick overview

**Benchmarking Coverage**: latency, throughput and fixed-QPS serving on A100 (support for FP8 benchmarks on H100 is coming!) and Intel® Xeon® Processors, with different models.
**Benchmarking Duration**: about 1hr.

Performance benchmark will be triggered when:

- A PR being merged into vllm.
- Every commit for those PRs with `perf-benchmarks` label AND `ready` label.

Runtime environment variables (see the usage sketch after this list):

- `ON_CPU`: set the value to '1' on Intel® Xeon® Processors. Default value is 0.
- `SERVING_JSON`: assign a json file to use instead of the default one for serving tests. Default value is empty string.
- `LATENCY_JSON`: assign a json file to use instead of the default one for latency tests. Default value is empty string.
- `THROUGHPUT_JSON`: assign a json file to use instead of the default one for throughput tests. Default value is empty string.

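For example, a run on an Intel® Xeon® machine that swaps in a custom serving test file might export the following (a minimal sketch; the custom file name and the runner path are illustrative assumptions, not prescribed by this README):

```bash
# Benchmark on an Intel® Xeon® Processor with a custom serving test file.
export ON_CPU=1                                      # CPU mode (default: 0)
export SERVING_JSON=tests/my-serving-tests-cpu.json  # hypothetical custom file
export LATENCY_JSON=                                 # empty: use the default latency tests
export THROUGHPUT_JSON=                              # empty: use the default throughput tests

# Then trigger the benchmark runner (path assumed from this suite's layout):
bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
```
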
Nightly benchmark will be triggered when:
- Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.
## Performance benchmark details
See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests-gpu.json`, `tests/throughput-tests-gpu.json`, `tests/serving-tests-gpu.json` to configure the test cases.
> NOTE: For Intel® Xeon® Processors, use `tests/latency-tests-cpu.json`, `tests/throughput-tests-cpu.json`, `tests/serving-tests-cpu.json` instead.
### Latency test
Here is an example of one test inside `latency-tests-gpu.json`:
```json
[
    {
        "test_name": "latency_llama8B_tp1",
        "parameters": {
            "model": "meta-llama/Meta-Llama-3-8B",
            "tensor_parallel_size": 1,
            "load_format": "dummy",
            "num_iters_warmup": 5,
            "num_iters": 15
        }
    }
]
```

In this example:

- The `test_name` attribute is a unique identifier for the test. In `latency-tests-gpu.json`, it must start with `latency_`.
- The `parameters` attribute controls the command-line arguments used for `benchmark_latency.py`. Note that you should use an underscore `_` instead of a dash `-` when specifying the command-line arguments; `run-performance-benchmarks.sh` will convert the underscores to dashes when feeding the arguments to `benchmark_latency.py`. For example, the corresponding command-line arguments for `benchmark_latency.py` will be `--model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15`.

Note that the performance numbers are highly sensitive to the parameter values. Please make sure the parameters are set correctly.

WARNING: The benchmarking script will save json results by itself, so please do not configure the `--output-json` parameter in the json file.

### Throughput test

The tests are specified in `throughput-tests-gpu.json`. The syntax is similar to `latency-tests-gpu.json`, except that the parameters are fed forward to `benchmark_throughput.py`.
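
For orientation, a throughput entry could mirror the latency format above (a sketch under that assumption; the `throughput_` prefix and the exact parameter set are inferred, so treat the real `throughput-tests-gpu.json` as authoritative):

```json
[
    {
        "test_name": "throughput_llama8B_tp1",
        "parameters": {
            "model": "meta-llama/Meta-Llama-3-8B",
            "tensor_parallel_size": 1,
            "load_format": "dummy",
            "num_prompts": 200,
            "backend": "vllm"
        }
    }
]
```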

The performance numbers of this test are also stable across runs, but a slight change in the parameter values might vary the numbers by a lot.
### Serving test
We test the throughput by using `benchmark_serving.py` with request rate = inf to cover the online serving overhead. The corresponding parameters are in `serving-tests-gpu.json`, and here is an example:
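
(The entry below is an illustrative sketch rather than the verbatim file contents: the split into `server_parameters` and `client_parameters` and the `qps_list` field follow the serving benchmark's structure, but the exact names and values are assumptions; consult `serving-tests-gpu.json` for the authoritative schema.)

```json
[
    {
        "test_name": "serving_llama8B_tp1_sharegpt",
        "qps_list": [1, 4, 16, "inf"],
        "server_parameters": {
            "model": "meta-llama/Meta-Llama-3-8B",
            "tensor_parallel_size": 1,
            "swap_space": 16,
            "load_format": "dummy"
        },
        "client_parameters": {
            "model": "meta-llama/Meta-Llama-3-8B",
            "backend": "vllm",
            "dataset_name": "sharegpt",
            "dataset_path": "./ShareGPT_V3_unfiltered_cleaned_split.json",
            "num_prompts": 200
        }
    }
]
```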