Commit 51ede3a

Update README.md for CPU

Signed-off-by: Tsai, Louie <[email protected]>
1 parent c54a099

File tree

1 file changed (+19 lines, -7 lines)

.buildkite/nightly-benchmarks/README.md

Lines changed: 19 additions & 7 deletions
@@ -11,7 +11,7 @@ See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performanc
 
 ## Performance benchmark quick overview
 
-**Benchmarking Coverage**: latency, throughput and fixed-qps serving on A100 (the support for FP8 benchmark on H100 is coming!), with different models.
+**Benchmarking Coverage**: latency, throughput and fixed-qps serving on A100 (the support for FP8 benchmark on H100 is coming!) and Intel® Xeon® Processors, with different models.
 
 **Benchmarking Duration**: about 1hr.
 
@@ -31,16 +31,28 @@ Performance benchmark will be triggered when:
 - A PR being merged into vllm.
 - Every commit for those PRs with `perf-benchmarks` label AND `ready` label.
 
+Manually trigger the benchmark:
+
+```bash
+bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
+```
+
+Runtime environment variables:
+- ON_CPU: set to '1' to run on Intel® Xeon® Processors. The default value is 0.
+- SERVING_JSON: assign a JSON file instead of the default one for serving tests. The default value is an empty string.
+- LATENCY_JSON: assign a JSON file instead of the default one for latency tests. The default value is an empty string.
+- THROUGHPUT_JSON: assign a JSON file instead of the default one for throughput tests. The default value is an empty string.
+
 Nightly benchmark will be triggered when:
 - Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.
 
 ## Performance benchmark details
 
-See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.
-
+See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests-gpu.json`, `tests/throughput-tests-gpu.json`, `tests/serving-tests-gpu.json` to configure the test cases.
+> NOTE: For Intel® Xeon® Processors, use `tests/latency-tests-cpu.json`, `tests/throughput-tests-cpu.json`, `tests/serving-tests-cpu.json` instead.
 ### Latency test
 
-Here is an example of one test inside `latency-tests.json`:
+Here is an example of one test inside `latency-tests-gpu.json`:
 
 ```json
 [
@@ -59,7 +71,7 @@ Here is an example of one test inside `latency-tests.json`:
 
 In this example:
 
-- The `test_name` attribute is a unique identifier for the test. In `latency-tests.json`, it must start with `latency_`.
+- The `test_name` attribute is a unique identifier for the test. In `latency-tests-gpu.json`, it must start with `latency_`.
 - The `parameters` attribute controls the command-line arguments used for `benchmark_latency.py`. Use an underscore `_` instead of a dash `-` when specifying the arguments; `run-performance-benchmarks.sh` converts the underscores to dashes when feeding the arguments to `benchmark_latency.py`. For example, the corresponding command-line arguments for `benchmark_latency.py` will be `--model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15`
 
 Note that the performance numbers are highly sensitive to the value of the parameters. Please make sure the parameters are set correctly.
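The underscore-to-dash rule in the `parameters` bullet above can be sketched in Python. This is a hypothetical helper for illustration only; the actual conversion happens inside `run-performance-benchmarks.sh`.

```python
# Hypothetical sketch of the underscore-to-dash conversion described above;
# the real logic lives in run-performance-benchmarks.sh (bash).
def params_to_cli_args(parameters: dict) -> str:
    parts = []
    for key, value in parameters.items():
        # JSON keys use underscores; the benchmark scripts expect dashed flags.
        parts.append(f"--{key.replace('_', '-')} {value}")
    return " ".join(parts)

# The example parameters from the latency test above.
params = {
    "model": "meta-llama/Meta-Llama-3-8B",
    "tensor_parallel_size": 1,
    "load_format": "dummy",
    "num_iters_warmup": 5,
    "num_iters": 15,
}
print(params_to_cli_args(params))
# --model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15
```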
@@ -68,13 +80,13 @@ WARNING: The benchmarking script will save json results by itself, so please do
 
 ### Throughput test
 
-The tests are specified in `throughput-tests.json`. The syntax is similar to `latency-tests.json`, except that the parameters are fed to `benchmark_throughput.py`.
+The tests are specified in `throughput-tests-gpu.json`. The syntax is similar to `latency-tests-gpu.json`, except that the parameters are fed to `benchmark_throughput.py`.
 
 The numbers from this test are also sensitive: a slight change in a parameter value might vary the performance numbers by a lot.
 
 ### Serving test
 
-We test the throughput by using `benchmark_serving.py` with request rate = inf to cover the online serving overhead. The corresponding parameters are in `serving-tests.json`, and here is an example:
+We test the throughput by using `benchmark_serving.py` with request rate = inf to cover the online serving overhead. The corresponding parameters are in `serving-tests-gpu.json`, and here is an example:
 
 ```json
 [
