Fixed the file path that points to the GPU-based model server deployment in a few places #451

Merged 1 commit on Mar 5, 2025
4 changes: 2 additions & 2 deletions hack/release-quickstart.sh
@@ -51,9 +51,9 @@ sed -i.bak '/us-central1-docker.pkg.dev\/k8s-staging-images\/gateway-api-inferen
sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$EXT_PROC"

# -----------------------------------------------------------------------------
-# Update config/manifests/vllm/deployment.yaml
+# Update config/manifests/vllm/gpu-deployment.yaml
# -----------------------------------------------------------------------------
-VLLM_DEPLOY="config/manifests/vllm/deployment.yaml"
+VLLM_DEPLOY="config/manifests/vllm/gpu-deployment.yaml"
echo "Updating ${VLLM_DEPLOY} ..."

# Update the vLLM image version
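Since the rename in `hack/release-quickstart.sh` has to match every other reference in the repo, a repo-wide grep is a quick way to confirm nothing stale is left behind. A minimal sketch (the path literal comes from this diff; the check itself is a suggestion, not part of this PR):

```bash
# Hypothetical post-rename check, run from the repo root:
# fail if anything still references the old manifest path.
# (Matches "vllm/deployment.yaml" but not "vllm/gpu-deployment.yaml".)
if grep -RIn --exclude-dir=.git "vllm/deployment.yaml" .; then
  echo "Stale references to the old path found above." >&2
  exit 1
fi
echo "All references use vllm/gpu-deployment.yaml."
```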
2 changes: 1 addition & 1 deletion site-src/guides/index.md
@@ -24,7 +24,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv

#### GPU-Based Model Server

-For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/deployment.yaml` as needed.
+For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/gpu-deployment.yaml` as needed.
Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
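The fenced block under the changed line is truncated in this view. For context, deploying the renamed manifest would look roughly like the following sketch; the secret name `hf-token` and the local-checkout path are assumptions for illustration, not taken from this diff:

```bash
# Create a Hugging Face token secret for pulling the Llama-2 model.
# (The secret name "hf-token" is an assumption, not from this diff.)
kubectl create secret generic hf-token --from-literal=token="$HF_TOKEN"

# Apply the GPU-based model server manifest at its new path.
kubectl apply -f ./config/manifests/vllm/gpu-deployment.yaml
```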
2 changes: 1 addition & 1 deletion test/e2e/e2e_suite_test.go
@@ -69,7 +69,7 @@ const (
// clientManifest is the manifest for the client test resources.
clientManifest = "../testdata/client.yaml"
// modelServerManifest is the manifest for the model server test resources.
-	modelServerManifest = "../../config/manifests/vllm/deployment.yaml"
+	modelServerManifest = "../../config/manifests/vllm/gpu-deployment.yaml"
// modelServerSecretManifest is the manifest for the model server secret resource.
modelServerSecretManifest = "../testdata/model-secret.yaml"
// inferPoolManifest is the manifest for the inference pool CRD.
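Because `modelServerManifest` is resolved relative to the test file, the e2e suite picks up the renamed manifest with no further changes. A hedged sketch of exercising it, assuming the standard Go test tooling rather than any repo-specific make target:

```bash
# Run the e2e suite, which deploys the manifest at its new path.
# Requires a cluster with enough GPU capacity for the sample server.
go test ./test/e2e/... -v
```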