   kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
   ```

1. **Install the Inference Extension CRDs:**

   ```sh
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml
   ```

1. **Deploy InferenceModel**

   Deploy the sample InferenceModel, which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
   [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
   ```
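
   For context, the applied manifest looks roughly like the sketch below: a single model name is split across the two LoRA adapters by weight. Field names are based on the v0.1.0 `InferenceModel` CRD and the pool name here is illustrative; the applied file above is authoritative.

   ```yaml
   # Illustrative sketch of the sample InferenceModel (see the applied
   # manifest above for the authoritative version). Requests for the
   # "tweet-summary" model are split 50/50 across the two LoRA adapters.
   apiVersion: inference.networking.x-k8s.io/v1alpha1
   kind: InferenceModel
   metadata:
     name: inferencemodel-sample
   spec:
     modelName: tweet-summary
     criticality: Critical
     poolRef:
       name: vllm-llama2-7b-pool   # assumed pool name, for illustration only
     targetModels:
     - name: tweet-summary-0
       weight: 50
     - name: tweet-summary-1
       weight: 50
   ```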

1. **Update Envoy Gateway Config to enable Patch Policy**

   Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map by running:

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
   kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
   ```

   Additionally, if you would like to enable the admin interface, uncomment the admin lines and run the above again.
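
   For reference, the manifest above extends the `EnvoyGateway` configuration embedded in the `envoy-gateway-config` ConfigMap roughly as follows. This is an illustrative sketch of the Envoy Gateway extension-APIs setting; the applied file is authoritative.

   ```yaml
   # Illustrative sketch: the EnvoyGateway configuration with the
   # EnvoyPatchPolicy extension API switched on. See the applied
   # manifest for the authoritative version.
   apiVersion: gateway.envoyproxy.io/v1alpha1
   kind: EnvoyGateway
   extensionApis:
     enableEnvoyPatchPolicy: true
   ```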

1. **Deploy Gateway**

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
   ```

   Confirm that the Gateway was assigned an address and is programmed:

   ```bash
   $ kubectl get gateway inference-gateway
   NAME                CLASS               ADDRESS        PROGRAMMED   AGE
   inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
   ```

1. **Deploy the Inference Extension and InferencePool**

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
   ```

1. **Deploy Envoy Gateway Custom Policies**

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
   ```

   > **_NOTE:_** These policies are also per-InferencePool and will need to be reconfigured to support a new pool should you wish to experiment further.

1. **OPTIONALLY**: Apply Traffic Policy

   For high-traffic benchmarking, you can apply this manifest to avoid defaults that can cause timeouts and errors.

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
   ```

1. **Try it out**

   Wait until the gateway is ready.
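
   Once the gateway reports `Programmed=True`, you can send a completion request to its address. The sketch below only assembles the request body; the port (8081), the `/v1/completions` path, and the `tweet-summary` model name are assumptions based on the sample manifests above, so uncomment the `curl` once they match your setup.

   ```shell
   # Build a request body for the gateway's OpenAI-compatible completions API.
   # GATEWAY_IP is the ADDRESS reported by `kubectl get gateway`; the port and
   # model name below are assumptions based on the sample setup above.
   GATEWAY_IP="${GATEWAY_IP:-<MY_ADDRESS>}"
   BODY='{
     "model": "tweet-summary",
     "prompt": "Write as if you were a critic: San Francisco",
     "max_tokens": 100,
     "temperature": 0
   }'
   echo "$BODY"
   # Uncomment once GATEWAY_IP points at a reachable gateway:
   # curl -i "http://${GATEWAY_IP}:8081/v1/completions" \
   #   -H 'Content-Type: application/json' -d "$BODY"
   ```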