update benchmarking guide with latest results with vllm v1 #559

Merged · 3 commits · Mar 28, 2025
Binary file modified site-src/performance/benchmark/example-bar-chart.png
4 changes: 2 additions & 2 deletions site-src/performance/benchmark/index.md
```diff
@@ -45,7 +45,7 @@ The LPG benchmark tool works by sending traffic to the specified target IP and p
 # Get gateway IP
 GW_IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
 # Get LoadBalancer k8s service IP
-SVC_IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
+SVC_IP=$(kubectl get service/vllm-llama2-7b -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

 echo $GW_IP
 echo $SVC_IP
```
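The fix above changes both the resource (`service/vllm-llama2-7b` instead of `gateway/inference-gateway`) and the jsonpath expression, since a Service exposes its external IP under `.status.loadBalancer.ingress` rather than `.status.addresses`. A minimal sketch of what the new jsonpath query extracts, run against a hypothetical Service status (the JSON below is illustrative, not taken from the repo):

```python
import json

# Hypothetical Service object, shaped like the output of
# `kubectl get service/vllm-llama2-7b -o json` (values made up).
service = json.loads("""
{
  "kind": "Service",
  "metadata": {"name": "vllm-llama2-7b"},
  "status": {
    "loadBalancer": {
      "ingress": [{"ip": "203.0.113.10"}]
    }
  }
}
""")

# Equivalent of jsonpath '{.status.loadBalancer.ingress[0].ip}':
# walk the nested keys and take the first ingress entry's IP.
svc_ip = service["status"]["loadBalancer"]["ingress"][0]["ip"]
print(svc_ip)  # 203.0.113.10
```

The original (pre-fix) command would have returned the gateway address twice, so benchmarks intended for the raw LoadBalancer service would instead have gone through the gateway.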
````diff
@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
 ```

 1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
-see a bar chart like below:
+see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB), llama2-7b and the ShareGPT dataset.
````
**Contributor**: Link to the source of the ShareGPT dataset.

**@smarterclayton** (Contributor), Mar 24, 2025:

Link to the reference page https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

EDIT: never mind; I thought this was a file download page, not the HF details page. The link is fine.

**@smarterclayton** (Contributor), Mar 24, 2025:

Also, discuss why we chose the cleaned dataset over the raw one, since it's not obvious at a casual glance, and note that using "ShareGPT" as a shorthand description is not entirely accurate.

**Contributor**: @kaushikmitr please address this.

**@kaushikmitr** (Contributor, Author), Mar 28, 2025:

I linked to ShareGPT. Frankly speaking, I used the cleaned one since we have been using it for other benchmarking, such as https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/benchmarks/benchmark/tools/profile-generator/container/Dockerfile. I have also seen the cleaned version used in the vLLM benchmarking suite (https://github.com/vllm-project/vllm/tree/main/benchmarks). I can update the text accordingly.


![Example benchmark bar chart](example-bar-chart.png)