@@ -29,7 +29,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/vllm/gpu-deployment.yaml
```

#### CPU-Based Model Server
@@ -38,37 +38,37 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Qwen
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/vllm/cpu-deployment.yaml
```
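
Before installing the rest of the stack, it helps to confirm that whichever model server variant you deployed is actually up. A minimal sketch, with the deployment name left as a placeholder since it is whatever the manifest you applied created:
```bash
# List deployments to find the name created by the vLLM manifest you applied.
kubectl get deployments
# Replace <vllm-deployment> with the name listed above. Model download can take
# a while, hence the generous timeout.
kubectl wait --for=condition=Available deployment/<vllm-deployment> --timeout=600s
```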

### Install the Inference Extension CRDs

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
```
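
As a quick sanity check that both CRDs registered, you can wait for them to be established. The CRD names below are inferred from the manifest file names above:
```bash
# The API server marks a CRD Established once it is ready to serve the new types.
kubectl wait --for condition=Established crd/inferencepools.inference.networking.x-k8s.io
kubectl wait --for condition=Established crd/inferencemodels.inference.networking.x-k8s.io
```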

### Deploy InferenceModel

Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
[LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/inferencemodel.yaml
```
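
After applying it, you can inspect the InferenceModel object to see how traffic is split between the two `tweet-summary` adapters. This is just an inspection sketch; dump the full object rather than relying on particular field names, which may differ between releases:
```bash
# List the sample InferenceModel and print it in full to see the target
# models/weights it load balances across.
kubectl get inferencemodels
kubectl get inferencemodels -o yaml
```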

### Update Envoy Gateway Config to enable Patch Policy

Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/enable_patch_policy.yaml
kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
```
Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
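
If you want to confirm the feature flag took effect, you can inspect the Envoy Gateway configuration after the restart. The ConfigMap name and namespace below are assumptions based on a default Envoy Gateway install; adjust them to match yours:
```bash
# Assumed defaults: ConfigMap "envoy-gateway-config" in namespace "envoy-gateway-system".
kubectl get configmap envoy-gateway-config -n envoy-gateway-system -o yaml | grep -i -A 2 patchpolicy
# The controller deployment should come back healthy after the rollout restart.
kubectl rollout status deployment envoy-gateway -n envoy-gateway-system
```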

### Deploy Gateway

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/gateway.yaml
```
> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***
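
Because this single manifest stamps out both the Gateway and the HTTPRoute, a quick way to see what was created (no resource names assumed) is:
```bash
# The Gateway should eventually report an address and a Programmed condition;
# the HTTPRoute should show as Accepted in its status.
kubectl get gateways.gateway.networking.k8s.io,httproutes.gateway.networking.k8s.io -A
```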
@@ -81,13 +81,13 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
### Deploy the Inference Extension and InferencePool

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/ext_proc.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/ext_proc.yaml
```
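
This manifest creates both the inference extension (ext-proc) workload and the InferencePool it serves. To check that both came up, list rather than assume names, since they are whatever the manifest defines:
```bash
# The InferencePool custom resource should now exist...
kubectl get inferencepools
# ...along with the ext-proc pods; look for the inference extension deployment.
kubectl get pods
```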

### Deploy Envoy Gateway Custom Policies

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/extension_policy.yaml
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/extension_policy.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/patch_policy.yaml
```
> **_NOTE:_** These policies are also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
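
To confirm the two policies were picked up by the gateway controller, you can list them. The resource names below assume the standard Envoy Gateway API group; the manifests themselves define the actual objects:
```bash
# EnvoyExtensionPolicy wires the ext-proc into the gateway; EnvoyPatchPolicy
# carries the xDS patch. Both should appear with a populated status once processed.
kubectl get envoyextensionpolicies.gateway.envoyproxy.io -A
kubectl get envoypatchpolicies.gateway.envoyproxy.io -A
```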
@@ -96,7 +96,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
For high-traffic benchmarking, you can apply this manifest to avoid default settings that can cause timeouts/errors.

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/traffic_policy.yaml
```
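
If you want to see exactly which defaults it overrides before applying it, you can fetch and read the manifest first; this is purely read-only:
```bash
# Inspect the traffic policy manifest without modifying the cluster.
curl -s https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/traffic_policy.yaml
```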

### Try it out
@@ -120,16 +120,16 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
The following cleanup assumes you would like to remove ALL resources that were created in this quickstart guide.
Please be careful not to delete resources you'd like to keep.
```bash
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/extension_policy.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/ext_proc.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/traffic_policy.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/extension_policy.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/patch_policy.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/ext_proc.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/gateway.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/gateway/enable_patch_policy.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/inferencemodel.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/release-0.2/config/manifests/vllm/gpu-deployment.yaml --ignore-not-found
kubectl delete secret hf-token --ignore-not-found
```