Commit 0e714fe

update benchmarking guide with latest results with vllm v1
1 parent afab4b7 commit 0e714fe

2 files changed (+2 -2 lines changed)


site-src/performance/benchmark/index.md

+2 -2
@@ -45,7 +45,7 @@ The LPG benchmark tool works by sending traffic to the specified target IP and p
 # Get gateway IP
 GW_IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
 # Get LoadBalancer k8s service IP
-SVC_IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
+SVC_IP=$(kubectl get service/vllm-llama2-7b -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
 
 echo $GW_IP
 echo $SVC_IP
@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
 ```
 
 1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
-see a bar chart like below:
+see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB), llama2-7b and the ShareGPT dataset.
 
 ![alt text](example-bar-chart.png)
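
Note on the first hunk: before this change, `SVC_IP` was populated by the same `kubectl get gateway/...` command as `GW_IP`, so both variables held the gateway address. A guard along the following lines could catch that kind of slip right after the two lookups. This is only a minimal sketch: the helper name `require_distinct_ips` is hypothetical and not part of the repository. It also catches an empty `SVC_IP`, which can happen when a load balancer reports a `hostname` rather than an `ip` in its status.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): fail fast if either address
# is empty, or if both variables resolve to the same address.
require_distinct_ips() {
  local gw_ip="$1" svc_ip="$2"
  if [ -z "$gw_ip" ] || [ -z "$svc_ip" ]; then
    echo "error: GW_IP or SVC_IP is empty" >&2
    return 1
  fi
  if [ "$gw_ip" = "$svc_ip" ]; then
    echo "error: GW_IP and SVC_IP are identical ($gw_ip)" >&2
    return 1
  fi
  echo "gateway=$gw_ip service=$svc_ip"
}
```

It would be called as `require_distinct_ips "$GW_IP" "$SVC_IP"` in place of the two `echo` lines, assuming the variables are set as in the hunk above.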
