Commit c74c610

Update extension-policy to match the new epp service name
1 parent a591cd0 commit c74c610

4 files changed, +37 -37 lines changed

config/manifests/gateway/extension_policy.yaml

-32 lines. This file was deleted.

config/manifests/inferencemodel.yaml

+3 -3
@@ -6,7 +6,7 @@ spec:
   modelName: tweet-summary
   criticality: Critical
   poolRef:
-    name: my-pool
+    name: vllm-llama2-7b
   targetModels:
   - name: tweet-summary-1
     weight: 100
@@ -20,7 +20,7 @@ spec:
   modelName: meta-llama/Llama-2-7b-hf
   criticality: Critical
   poolRef:
-    name: my-pool
+    name: vllm-llama2-7b
 
 ---
 apiVersion: inference.networking.x-k8s.io/v1alpha2
@@ -31,4 +31,4 @@ spec:
   modelName: Qwen/Qwen2.5-1.5B-Instruct
   criticality: Critical
   poolRef:
-    name: my-pool
+    name: vllm-llama2-7b

config/manifests/inferencepool.yaml

+33
@@ -75,6 +75,39 @@ spec:
           initialDelaySeconds: 5
           periodSeconds: 10
 ---
+apiVersion: gateway.envoyproxy.io/v1alpha1
+kind: EnvoyExtensionPolicy
+metadata:
+  name: ext-proc-policy
+  namespace: default
+spec:
+  extProc:
+    - backendRefs:
+      - group: ""
+        kind: Service
+        name: vllm-llama2-7b-epp
+        port: 9002
+      processingMode:
+        allowModeOverride: true
+        request:
+          body: Buffered
+        response:
+      # The timeouts are likely not needed here. We can experiment with removing/tuning them slowly.
+      # The connection limits are more important and will cause the opaque: ext_proc_gRPC_error_14 error in Envoy GW if not configured correctly.
+      messageTimeout: 1000s
+      backendSettings:
+        circuitBreaker:
+          maxConnections: 40000
+          maxPendingRequests: 40000
+          maxParallelRequests: 40000
+        timeout:
+          tcp:
+            connectTimeout: 24h
+  targetRef:
+    group: gateway.networking.k8s.io
+    kind: HTTPRoute
+    name: llm-route
+---
 kind: ClusterRole
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:

site-src/guides/index.md

+1 -2
@@ -88,7 +88,6 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 ### Deploy Envoy Gateway Custom Policies
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/extension_policy.yaml
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
 ```
 > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
@@ -125,7 +124,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml --ignore-not-found
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/extension_policy.yaml --ignore-not-found
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml --ignore-not-found
-kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/ext_proc.yaml --ignore-not-found
+kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml --ignore-not-found
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml --ignore-not-found
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml --ignore-not-found
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml --ignore-not-found
