
Commit 5d7deea

Docs: Updates getting started guide for kgateway
Signed-off-by: Daneyon Hansen <[email protected]>
1 parent 731f244 commit 5d7deea

File tree

2 files changed: +131 −33 lines changed
@@ -0,0 +1,40 @@

```yaml
# Requires Kgateway 2.0.0 or greater.
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: kgateway
  listeners:
  - name: http
    protocol: HTTP
    port: 8080
  - name: llm-gw
    protocol: HTTP
    port: 8081
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
    sectionName: llm-gw
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama2-7b
      port: 8000
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      backendRequest: 24h
      request: 24h
```
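The guide notes that each additional InferencePool on the same gateway needs its own route. Following the pattern of the HTTPRoute above, a second pool attached to the same `llm-gw` listener might be exposed like this (the route name, pool name, and `/other` path prefix here are hypothetical, for illustration only):

```yaml
# Hypothetical route for a second InferencePool on the same llm-gw listener.
# All names and the /other path prefix are illustrative assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-other-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
    sectionName: llm-gw
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: my-other-pool
      port: 8000
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /other
```

Because both routes attach to the same listener, they must be disambiguated by their match rules, hence the distinct path prefix.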

site-src/guides/index.md (+91 −33)
````diff
@@ -3,7 +3,6 @@
 This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!
 
 ## **Prerequisites**
-- Envoy Gateway [v1.3.0](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
 - A cluster with:
   - Support for services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running).
     For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
````
````diff
@@ -56,55 +55,114 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
 ```
-
+
 ### Deploy InferenceModel
 
 Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
 [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+
 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
 ```
 
-### Update Envoy Gateway Config to enable Patch Policy**
+### Deploy Inference Gateway
 
-Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
-kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
-```
-Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
+Choose one of the following options to deploy an Inference Gateway.
 
-### Deploy Gateway
+=== "Envoy Gateway"
 
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
-```
-> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
+    1. Requirements
 
-Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
-```bash
-$ kubectl get gateway inference-gateway
-NAME                CLASS               ADDRESS        PROGRAMMED   AGE
-inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
-```
-### Deploy the InferencePool and Extension
+       - Envoy Gateway [v1.3.0](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
 
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml
-```
-### Deploy Envoy Gateway Custom Policies
+    1. Update Envoy Gateway Config to enable Patch Policy
 
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
-```
-> **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
-
-### **OPTIONALLY**: Apply Traffic Policy
+       Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the
+       Envoy Gateway config map. To do this, apply the following manifest and restart Envoy Gateway:
+
+       ```bash
+       kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
+       kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
+       ```
+
+       Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
+
+    1. Deploy GatewayClass, Gateway, Backend, and HTTPRoute resources
+
+       ```bash
+       kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
+       ```
+
+       > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
+
+       Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+
+       ```bash
+       $ kubectl get gateway inference-gateway
+       NAME                CLASS               ADDRESS        PROGRAMMED   AGE
+       inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
+       ```
 
-For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+    1. Deploy Envoy Gateway Custom Policies
+
+       ```bash
+       kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
+       ```
+
+       > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
+
+    1. Apply Traffic Policy (Optional)
+
+       For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+
+       ```bash
+       kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
+       ```
+
+=== "Kgateway"
+
+    [Kgateway](https://kgateway.dev/) v2.0.0 adds support for inference extension as a **technical preview**, so do not
+    run Kgateway with inference extension in production environments. Refer to [Issue 10411](https://github.com/kgateway-dev/kgateway/issues/10411)
+    for the list of caveats, supported features, etc.
+
+    1. Requirements
+
+       - [Helm](https://helm.sh/docs/intro/install/) installed.
+       - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.
+
+    2. Install Kgateway CRDs
+
+       ```bash
+       helm upgrade -i --create-namespace --namespace kgateway-system --version v2.0.0-main kgateway-crds https://github.com/danehans/toolbox/raw/refs/heads/main/charts/338661f3be-kgateway-crds-1.0.1-dev.tgz
+       ```
+
+    3. Install Kgateway
+
+       ```bash
+       helm upgrade --install kgateway "https://github.com/danehans/toolbox/raw/refs/heads/main/charts/338661f3be-kgateway-1.0.1-dev.tgz" \
+         -n kgateway-system \
+         --set image.registry=danehans \
+         --set image.pullPolicy=Always \
+         --set inferenceExtension.enabled="true" \
+         --version 1.0.1-dev
+       ```
+
+    4. Deploy Gateway and HTTPRoute resources
+
+       ```bash
+       kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/resources.yaml
+       ```
+
+       Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+
+       ```bash
+       $ kubectl get gateway inference-gateway
+       NAME                CLASS      ADDRESS        PROGRAMMED   AGE
+       inference-gateway   kgateway   <MY_ADDRESS>   True         22s
+       ```
+
+### Deploy the InferencePool and Extension
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml
```
 
 ### Try it out
````
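The "Try it out" step that follows (its body is not part of this diff) sends a request through the gateway. A minimal smoke-test sketch is shown below; the `llm-gw` listener port (8081) comes from the Gateway manifest in this commit, while the model name follows the `tweet-summary` LoRA adapters mentioned in the guide, and the exact payload is an assumption, not a command taken from this diff:

```shell
# Hedged sketch of a completion request through the gateway's llm-gw listener.
# "tweet-summary" follows the sample InferenceModel's LoRA adapters; the payload
# shape is an assumption. Compose the body and sanity-check it is valid JSON.
BODY='{"model": "tweet-summary", "prompt": "Summarize this tweet: ...", "max_tokens": 100}'
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"

# Against a live cluster, replace <MY_ADDRESS> with the ADDRESS reported by
# `kubectl get gateway inference-gateway`, then send the request:
# curl -i "http://<MY_ADDRESS>:8081/v1/completions" -H 'Content-Type: application/json' -d "$BODY"
```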
