@@ -17,7 +17,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
 ```bash
 kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/deployment.yaml
 ```
 
 ### Install the Inference Extension CRDs
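(Not part of the PR's diff.) After applying the deployment manifest, a quick readiness check avoids debugging later steps against pods that never came up. This is only a sketch: it assumes `kubectl` is pointed at the target cluster and checks all pods in the current namespace, since the pod names come from whatever the manifest defines.

```shell
# Sketch of a post-apply readiness check; guarded so it degrades gracefully
# when kubectl or a reachable cluster is unavailable.
check_vllm_ready() {
  if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
    # Wait for every pod in the current namespace to report Ready.
    kubectl wait --for=condition=Ready pod --all --timeout=300s
  else
    echo "no cluster reachable; skipping readiness check"
  fi
}
check_vllm_ready
```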
@@ -31,22 +31,22 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
 [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
 ```
 
 ### Update Envoy Gateway Config to enable Patch Policy
 
 Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
 kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
 ```
 Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
 
 ### Deploy Gateway
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
 ```
 > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***
 
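(Not part of the PR's diff.) Once the Gateway manifest is applied, it is worth confirming that the Gateway was accepted and received an address before wiring anything else to it. A guarded sketch, assuming `kubectl` access to the cluster; the exact column names come from the Gateway API status conditions:

```shell
# Sketch: confirm the Gateway was programmed and received an address.
# Guarded so it is a no-op without a reachable cluster.
check_gateway() {
  if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
    # PROGRAMMED should be True and ADDRESS populated once Envoy is ready.
    kubectl get gateway -A
  else
    echo "no cluster reachable; skipping gateway check"
  fi
}
check_gateway
```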
@@ -59,13 +59,13 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 ### Deploy the Inference Extension and InferencePool
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/ext_proc.yaml
 ```
 ### Deploy Envoy Gateway Custom Policies
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/extension_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
 ```
 > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
 
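(Not part of the PR's diff.) After applying the extension and patch policies, a quick status check confirms the Envoy Gateway controller is healthy and the policies were accepted. A sketch under assumptions: the namespace matches the `rollout restart` command earlier in this guide, and `envoypatchpolicy` is the Envoy Gateway CRD kind being queried.

```shell
# Sketch: verify the Envoy Gateway controller and the applied policies.
# Guarded so it degrades gracefully without a reachable cluster.
check_envoy_gateway() {
  if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
    kubectl get pods -n envoy-gateway-system    # controller pods should be Running
    kubectl get envoypatchpolicy -A             # policies should report Accepted
  else
    echo "no cluster reachable; skipping policy check"
  fi
}
check_envoy_gateway
```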
@@ -74,7 +74,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
 ```
 
 ### Try it out
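(The body of this section falls outside the diff's hunks.) As an illustrative sketch only, a completion request routed through the gateway to one of the `tweet-summary` LoRA adapters configured above might look like the following. `GW_IP` and `GW_PORT` are placeholder variables for the gateway address, and the prompt is arbitrary.

```shell
# Illustrative request; GW_IP/GW_PORT are placeholders, and "tweet-summary-1"
# is one of the LoRA adapters referenced by the InferenceModel above.
BODY='{"model": "tweet-summary-1", "prompt": "Write a short tweet summary about Kubernetes.", "max_tokens": 100, "temperature": 0}'
if [ -n "${GW_IP:-}" ] && [ -n "${GW_PORT:-}" ]; then
  curl -i "http://${GW_IP}:${GW_PORT}/v1/completions" \
    -H 'Content-Type: application/json' -d "$BODY"
else
  echo "set GW_IP and GW_PORT to the gateway address, then re-run"
fi
```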