This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!

## **Prerequisites**

- A cluster with:
    - Support for services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running.)
      For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).

### Install the Inference Extension CRDs

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
```
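
If you want to confirm the CRDs registered before moving on, you can list them by name (the names below are taken from the manifests above; output columns may differ by kubectl version):

```bash
# Both CRDs should be listed once the apply commands above succeed.
kubectl get crd inferencepools.inference.networking.x-k8s.io inferencemodels.inference.networking.x-k8s.io
```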

### Deploy InferenceModel

Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
[LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
```
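
Once the manifest is applied, the new resource should show up. A minimal check, assuming the sample manifest creates the InferenceModel in the `default` namespace:

```bash
# The sample InferenceModel should appear shortly after the apply.
kubectl get inferencemodels
```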

### Deploy Inference Gateway

Choose one of the following options to deploy an Inference Gateway.

=== "Envoy Gateway"

    1. Requirements

        - Envoy Gateway [v1.3.0](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.

    1. Update Envoy Gateway Config to enable Patch Policy

        Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, apply the following manifest and restart Envoy Gateway:

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
        kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
        ```

        Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
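
        If you want to make sure the restart has finished before continuing, one option is to wait on the rollout (this uses the same deployment name and namespace as the restart command above):

        ```bash
        # Blocks until the restarted Envoy Gateway deployment is fully rolled out.
        kubectl rollout status deployment/envoy-gateway -n envoy-gateway-system
        ```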

    1. Deploy GatewayClass, Gateway, Backend, and HTTPRoute resources

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
        ```

        > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***

        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:

        ```bash
        $ kubectl get gateway inference-gateway
        NAME                CLASS               ADDRESS         PROGRAMMED   AGE
        inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
        ```
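
        If you would like to capture the gateway address for later requests, one option is to read it from the Gateway status (the `jsonpath` below assumes the address you want is the first entry in `.status.addresses`):

        ```bash
        # Store the gateway address in a shell variable for use in later curl commands.
        GATEWAY_IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
        echo "${GATEWAY_IP}"
        ```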

    1. Deploy Envoy Gateway Custom Policies

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
        ```

        > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.

    1. Apply Traffic Policy (Optional)

        For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
        ```

=== "Kgateway"

    [Kgateway](https://kgateway.dev/) v2.0.0 adds support for the inference extension as a **technical preview**, so do not run Kgateway with the inference extension in production environments. Refer to [Issue 10411](https://github.com/kgateway-dev/kgateway/issues/10411) for the list of caveats, supported features, etc.

    1. Requirements

        - [Helm](https://helm.sh/docs/intro/install/) installed.
        - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

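        If the Gateway API CRDs are not installed yet, one way to install them is from an upstream release. This example assumes the v1.2.1 standard channel; adjust the version to whatever your cluster should track:

        ```bash
        # Installs the standard-channel Gateway API CRDs (GatewayClass, Gateway, HTTPRoute, ...).
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/standard-install.yaml
        ```
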
    2. Install Kgateway CRDs

        ```bash
        helm upgrade -i --create-namespace --namespace kgateway-system --version v2.0.0-main kgateway-crds https://github.com/danehans/toolbox/raw/refs/heads/main/charts/338661f3be-kgateway-crds-1.0.1-dev.tgz
        ```

    3. Install Kgateway

        ```bash
        helm upgrade --install kgateway "https://github.com/danehans/toolbox/raw/refs/heads/main/charts/338661f3be-kgateway-1.0.1-dev.tgz" \
          -n kgateway-system \
          --set image.registry=danehans \
          --set image.pullPolicy=Always \
          --set inferenceExtension.enabled="true" \
          --version 1.0.1-dev
        ```
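
        After the chart installs, you can sanity-check that the Kgateway control plane is running (this assumes everything landed in the `kgateway-system` namespace used above):

        ```bash
        # All pods in the namespace should reach the Running state.
        kubectl get pods -n kgateway-system
        ```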

    4. Deploy Gateway and HTTPRoute resources

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/resources.yaml
        ```

        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:

        ```bash
        $ kubectl get gateway inference-gateway
        NAME                CLASS      ADDRESS         PROGRAMMED   AGE
        inference-gateway   kgateway   <MY_ADDRESS>    True         22s
        ```

### Deploy the InferencePool and Extension

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml
```
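
Once applied, you can check that the InferencePool was created and that its extension pod comes up alongside the model server (resource and pod names come from the sample manifest, so they may differ if you customized it):

```bash
# The sample InferencePool should be listed, and its extension pod should reach Running.
kubectl get inferencepools
kubectl get pods
```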

### Try it out