Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update benchmarking guide with latest results with vllm v1 #559

Merged
merged 3 commits into from
Mar 28, 2025

Conversation

kaushikmitr
Copy link
Contributor

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 21, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 21, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @kaushikmitr. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 21, 2025
Copy link

netlify bot commented Mar 21, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit f252457
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67e639b18d2eeb00087e29a9
😎 Deploy Preview https://deploy-preview-559--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@ahg-g
Copy link
Contributor

ahg-g commented Mar 21, 2025

@danehans fyi

@liu-cong
Copy link
Contributor

liu-cong commented Mar 21, 2025

Can you add the vllm deployment yaml used in this benchmark as well?

And also the output json files as an example people should expect from the benchmark

@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
```

1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
see a bar chart like below:
see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB) and the ShareGPT dataset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"inference extension" is the name of the project and not a specific extension. I assume the test was conducted with the Endpoint Selector Extension (ESE)? If so, s/represents inference extension/represents the endpoint selector inference extension/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is correct

@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
```

1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
see a bar chart like below:
see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB), llama2-7b and the ShareGPT dataset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link to the source of the share GPT dataset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@smarterclayton smarterclayton Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link to the reference page https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

EDIT: nm, I thought this was a file download page, not the HF details page. Link is fine.

Copy link
Contributor

@smarterclayton smarterclayton Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, discuss why we chose the cleaned one vs the raw since it's not obvious to a casual glance, and if we use "ShareGPT" as a shortcut description that is not entirely accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaushikmitr pls address this

Copy link
Contributor Author

@kaushikmitr kaushikmitr Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I linked to ShareGPT, frankly speaking I used the cleaned one since we have been using that for other benchmarking like https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/benchmarks/benchmark/tools/profile-generator/container/Dockerfile I have also seen the cleaned version being used in vllm benchmarking suite (https://github.com/vllm-project/vllm/tree/main/benchmarks). I can update the text accordingly

@ahg-g
Copy link
Contributor

ahg-g commented Mar 28, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 28, 2025
@ahg-g
Copy link
Contributor

ahg-g commented Mar 28, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kaushikmitr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit 61125a8 into kubernetes-sigs:main Mar 28, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants