Commit ff9cbe7

Updating kubectl apply to ref direct files and locking EPP to a specific SHA (#233)
1 parent 1ef4b45 commit ff9cbe7

2 files changed: +10 -10 lines changed

pkg/README.md

+9 -9
@@ -15,35 +15,35 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
 ```bash
 kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-kubectl apply -f ./manifests/vllm/vllm-lora-deployment.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/vllm-lora-deployment.yaml
 ```

 1. **Install the CRDs into the cluster:**

 ```sh
-kubectl apply -f config/crd/bases
+kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd
 ```

 1. **Deploy InferenceModel and InferencePool**

 Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
 ```bash
-kubectl apply -f ./manifests/inferencepool-with-model.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencepool-with-model.yaml
 ```

 1. **Update Envoy Gateway Config to enable Patch Policy**

 Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
 ```bash
-kubectl apply -f ./manifests/gateway/enable_patch_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
 kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
 ```
 Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.

 1. **Deploy Gateway**

 ```bash
-kubectl apply -f ./manifests/gateway/gateway.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
 ```
 > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***

@@ -53,14 +53,14 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 1. **Deploy Ext-Proc**

 ```bash
-kubectl apply -f ./manifests/ext_proc.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
 ```

 1. **Deploy Envoy Gateway Custom Policies**

 ```bash
-kubectl apply -f ./manifests/gateway/extension_policy.yaml
-kubectl apply -f ./manifests/gateway/patch_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
 ```
 > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.

@@ -69,7 +69,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.

 ```bash
-kubectl apply -f ./manifests/gateway/traffic_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
 ```

 1. **Try it out**
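
Reviewer sketch (not part of the commit): every rewritten command repeats the same `raw/main/pkg/manifests/` prefix, and `main` is a moving branch. Assuming one wanted to try the same commands against a tag or another ref, a small helper could build the URL. The function name `manifest_url` and its optional ref argument are hypothetical; only the repository URL and path layout come from the diff above.

```bash
# Hypothetical helper: build the raw GitHub URL for a file under pkg/manifests.
# REPO is taken from the URLs in the diff; the ref (second argument) is an
# assumption and defaults to "main", matching the commands in the README.
REPO="https://github.com/kubernetes-sigs/gateway-api-inference-extension"

manifest_url() {
  # $1 = path relative to pkg/manifests, $2 = optional git ref
  printf '%s/raw/%s/pkg/manifests/%s\n' "$REPO" "${2:-main}" "$1"
}

# Possible usage (requires cluster access, so it is left commented out):
# kubectl apply -f "$(manifest_url gateway/gateway.yaml)"
```

With no second argument the helper reproduces the URLs used in the README; whether a given tag or ref actually contains these files would need to be checked separately.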

pkg/manifests/ext_proc.yaml

+1 -1
@@ -60,7 +60,7 @@ spec:
 containers:
 - name: inference-gateway-ext-proc
 # TODO(https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/34) Update the image and args.
-image: us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp:main
+image: us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp@sha256:1ef39ee79c55db6436f9d4cb14cc957dea647f57b8a7ff0363f60b0711695103
 args:
 - -poolName
 - "vllm-llama2-7b-pool"
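
Reviewer sketch (not part of the commit): the change above swaps the mutable `:main` tag for an `@sha256:` digest, so the EPP image no longer changes when the tag is re-pushed. A minimal way to check that an image reference is digest-pinned, assuming a POSIX shell with `grep -E`, might look like this. The function name `is_digest_pinned` is hypothetical.

```bash
# Hypothetical check: succeed only when the image reference ends in a
# sha256 digest (64 lowercase hex characters), as in the new ext_proc.yaml line.
is_digest_pinned() {
  printf '%s' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Possible usage:
# is_digest_pinned "$IMAGE" || echo "image is referenced by tag, not digest" >&2
```

This only validates the shape of the reference; it does not confirm that the digest exists in the registry.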
