Commit 1299794

Update README.md for CPU
1 parent c54a099 commit 1299794

1 file changed: `.buildkite/nightly-benchmarks/README.md` (17 additions, 7 deletions)
````diff
@@ -11,7 +11,7 @@ See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performance
 
 ## Performance benchmark quick overview
 
-**Benchmarking Coverage**: latency, throughput and fix-qps serving on A100 (the support for FP8 benchmark on H100 is coming!), with different models.
+**Benchmarking Coverage**: latency, throughput and fix-qps serving on A100 (the support for FP8 benchmark on H100 is coming!) and Intel® Xeon® Processors, with different models.
 
 **Benchmarking Duration**: about 1hr.
 
````
````diff
@@ -31,16 +31,26 @@ Performance benchmark will be triggered when:
 - A PR being merged into vllm.
 - Every commit for those PRs with `perf-benchmarks` label AND `ready` label.
 
+Manually trigger the benchmark:
+```bash
+bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
+```
+Runtime environment variables:
+- ON_CPU: set the value to '1' on Intel® Xeon® Processors. Default value is 0.
+- SERVING_JSON: assign a json file instead of the default one for serving tests. Default value is an empty string.
+- LATENCY_JSON: assign a json file instead of the default one for latency tests. Default value is an empty string.
+- THROUGHPUT_JSON: assign a json file instead of the default one for throughput tests. Default value is an empty string.
+
 Nightly benchmark will be triggered when:
 - Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.
 
 ## Performance benchmark details
 
-See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.
-
+See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests-gpu.json`, `tests/throughput-tests-gpu.json`, `tests/serving-tests-gpu.json` to configure the test cases.
+> NOTE: For Intel® Xeon® Processors, use `tests/latency-tests-cpu.json`, `tests/throughput-tests-cpu.json`, `tests/serving-tests-cpu.json` instead.
 ### Latency test
 
-Here is an example of one test inside `latency-tests.json`:
+Here is an example of one test inside `latency-tests-gpu.json`:
 
 ```json
 [
````
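The added environment variables compose on a single invocation in the usual shell way. A minimal sketch, assuming the script is run from the repository root; the override file path is hypothetical:

```bash
# Run the benchmarks on an Intel® Xeon® host with a custom latency
# test definition; unset variables keep the defaults listed above.
# (tests/latency-tests-cpu-custom.json is a hypothetical path.)
ON_CPU=1 \
LATENCY_JSON=tests/latency-tests-cpu-custom.json \
bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
```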
````diff
@@ -59,7 +69,7 @@ Here is an example of one test inside `latency-tests.json`:
 
 In this example:
 
-- The `test_name` attributes is a unique identifier for the test. In `latency-tests.json`, it must start with `latency_`.
+- The `test_name` attribute is a unique identifier for the test. In `latency-tests-gpu.json`, it must start with `latency_`.
 - The `parameters` attribute controls the command line arguments to be used for `benchmark_latency.py`. Note that you should use the underscore `_` instead of the dash `-` when specifying the command line arguments, and `run-performance-benchmarks.sh` will convert the underscores to dashes when feeding the arguments to `benchmark_latency.py`. For example, the corresponding command line arguments for `benchmark_latency.py` will be `--model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15`.
 
 Note that the performance numbers are highly sensitive to the value of the parameters. Please make sure the parameters are set correctly.
````
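The JSON example above is cut off in the diff view. Putting the two bullets together, a latency test entry would plausibly look like the sketch below; the `test_name` value is illustrative, and the parameter keys mirror the command line arguments with underscores in place of dashes:

```json
[
  {
    "test_name": "latency_llama8B_tp1",
    "parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "tensor_parallel_size": 1,
      "load_format": "dummy",
      "num_iters_warmup": 5,
      "num_iters": 15
    }
  }
]
```

On such an entry, `run-performance-benchmarks.sh` would invoke `benchmark_latency.py --model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15`.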
````diff
@@ -68,13 +78,13 @@ WARNING: The benchmarking script will save json results by itself
 
 ### Throughput test
 
-The tests are specified in `throughput-tests.json`. The syntax is similar to `latency-tests.json`, except for that the parameters will be fed forward to `benchmark_throughput.py`.
+The tests are specified in `throughput-tests-gpu.json`. The syntax is similar to `latency-tests-gpu.json`, except that the parameters are forwarded to `benchmark_throughput.py`.
 
 The numbers from this test are also stable -- a slight change in the parameter values can vary the performance numbers by a lot.
 
 ### Serving test
 
-We test the throughput by using `benchmark_serving.py` with request rate = inf to cover the online serving overhead. The corresponding parameters are in `serving-tests.json`, and here is an example:
+We test the throughput by using `benchmark_serving.py` with request rate = inf to cover the online serving overhead. The corresponding parameters are in `serving-tests-gpu.json`, and here is an example:
 
 ```json
 [
````
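The serving example is likewise truncated in the diff. A hedged sketch of one plausible entry, by analogy with the latency format: the `qps_list`, `server_parameters`, and `client_parameters` field names are assumptions (serving needs a set of request rates to sweep plus separate server and client argument sets), and every value is illustrative:

```json
[
  {
    "test_name": "serving_llama8B_tp1",
    "qps_list": [1, 4, 16, "inf"],
    "server_parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "tensor_parallel_size": 1
    },
    "client_parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "num_prompts": 200
    }
  }
]
```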
