Commit e9264f2

Kuromesi and ahg-g authored
add helm template (#416)
* initialize helm template
  Signed-off-by: Kuromesi <[email protected]>
* tidy template
  Signed-off-by: Kuromesi <[email protected]>
* nit and add inference pool
  Signed-off-by: Kuromesi <[email protected]>
* relocate
  Signed-off-by: Kuromesi <[email protected]>
* fix
  Signed-off-by: Kuromesi <[email protected]>
* fix
* add readme
  Signed-off-by: Kuromesi <[email protected]>
* nit
  Signed-off-by: Kuromesi <[email protected]>
* Apply suggestions from code review

---------

Signed-off-by: Kuromesi <[email protected]>
Co-authored-by: Abdullah Gharaibeh <[email protected]>
1 parent 64ba0c6 commit e9264f2

8 files changed

+250
-0
lines changed
+23
@@ -0,0 +1,23 @@
1+
# Patterns to ignore when building packages.
2+
# This supports shell glob matching, relative path matching, and
3+
# negation (prefixed with !). Only one pattern per line.
4+
.DS_Store
5+
# Common VCS dirs
6+
.git/
7+
.gitignore
8+
.bzr/
9+
.bzrignore
10+
.hg/
11+
.hgignore
12+
.svn/
13+
# Common backup files
14+
*.swp
15+
*.bak
16+
*.tmp
17+
*.orig
18+
*~
19+
# Various IDEs
20+
.project
21+
.idea/
22+
*.tmproj
23+
.vscode/
+9
@@ -0,0 +1,9 @@
1+
apiVersion: v2
2+
name: InferencePool
3+
description: A Helm chart for InferencePool
4+
5+
type: application
6+
7+
version: 0.1.0
8+
9+
appVersion: "0.2.0"

config/charts/inferencepool/README.md

+45
@@ -0,0 +1,45 @@
1+
# InferencePool
2+
3+
A chart to deploy an InferencePool and a corresponding EndpointPicker (epp) deployment.
4+
5+
6+
## Install
7+
8+
To install an InferencePool named `pool-1` that selects endpoints labeled `app: vllm-llama2-7b` and listening on port `8000`, run the following command:
9+
10+
```txt
11+
$ helm install pool-1 ./config/charts/inferencepool \
12+
--set inferencePool.name=pool-1 \
13+
--set inferencePool.selector.app=vllm-llama2-7b \
14+
--set inferencePool.targetPortNumber=8000
15+
```
16+
17+
Here, `inferencePool.targetPortNumber` is the port that the vllm backends serve on, and `inferencePool.selector` is the label selector that matches the vllm backends.
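The same settings can also be kept in a values file instead of repeated `--set` flags. The sketch below assumes a file named `pool-1-values.yaml` (a hypothetical name, not part of this commit); its keys mirror the chart's `values.yaml`.

```yaml
# pool-1-values.yaml (hypothetical file name)
inferencePool:
  # Name of the InferencePool; the extension is named "<name>-epp".
  name: pool-1
  # Port that the vllm backend pods serve on.
  targetPortNumber: 8000
  # Labels that select the vllm backend pods.
  selector:
    app: vllm-llama2-7b
```

The file would then be passed with Helm's standard `-f` flag, for example `helm install pool-1 ./config/charts/inferencepool -f pool-1-values.yaml`.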
18+
19+
## Uninstall
20+
21+
Run the following command to uninstall the chart:
22+
23+
```txt
24+
$ helm uninstall pool-1
25+
```
26+
27+
## Configuration
28+
29+
The following table lists the configurable parameters of the chart.
30+
31+
| **Parameter Name** | **Description** |
32+
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
33+
| `inferencePool.name` | Name of the InferencePool. The inference extension is named `${inferencePool.name}-epp`. |
34+
| `inferencePool.targetPortNumber` | Target port number of the vllm backends. The inference extension uses this port to scrape metrics. |
35+
| `inferencePool.selector` | Label selector to match vllm backends managed by the inference pool. |
36+
| `inferenceExtension.replicas` | Number of replicas for the inference extension service. Defaults to `1`. |
37+
| `inferenceExtension.image.name` | Name of the container image used for the inference extension. |
38+
| `inferenceExtension.image.hub` | Registry URL where the inference extension image is hosted. |
39+
| `inferenceExtension.image.tag` | Image tag of the inference extension. |
40+
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
41+
| `inferenceExtension.extProcPort` | Port where the inference extension service is served for external processing. Defaults to `9002`. |
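As an illustration of the `inferenceExtension` parameters above, a values override might look like the sketch below. The replica count and pull policy are example choices, not recommendations from the chart; the image coordinates are the defaults from the chart's `values.yaml`.

```yaml
# Example override for the inference extension (illustrative values only).
inferenceExtension:
  replicas: 2
  image:
    name: epp
    hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension
    tag: main
    pullPolicy: IfNotPresent
  # Kept at the default: in this commit the Deployment template passes a fixed
  # "-grpcPort 9002" to the container, and the Service defines no targetPort,
  # so a different value here would likely not reach the extension.
  extProcPort: 9002
```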
42+
43+
## Notes
44+
45+
This chart deploys only an InferencePool and its corresponding EndpointPicker extension. Before installing the chart, make sure that the inference extension CRDs are installed in the cluster. For more details, refer to the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
@@ -0,0 +1 @@
1+
InferencePool {{ .Values.inferencePool.name }} deployed.
@@ -0,0 +1,24 @@
1+
{{/*
2+
Common labels
3+
*/}}
4+
{{- define "gateway-api-inference-extension.labels" -}}
5+
app.kubernetes.io/name: {{ include "gateway-api-inference-extension.name" . }}
6+
{{- if .Chart.AppVersion }}
7+
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
8+
{{- end }}
9+
{{- end }}
10+
11+
{{/*
12+
Inference extension name
13+
*/}}
14+
{{- define "gateway-api-inference-extension.name" -}}
15+
{{- $base := .Values.inferencePool.name | default "default-pool" | lower | trim | trunc 40 -}}
16+
{{ $base }}-epp
17+
{{- end -}}
18+
19+
{{/*
20+
Selector labels
21+
*/}}
22+
{{- define "gateway-api-inference-extension.selectorLabels" -}}
23+
app: {{ include "gateway-api-inference-extension.name" . }}
24+
{{- end -}}
@@ -0,0 +1,89 @@
1+
apiVersion: inference.networking.x-k8s.io/v1alpha2
2+
kind: InferencePool
3+
metadata:
4+
name: {{ .Values.inferencePool.name }}
5+
namespace: {{ .Release.Namespace }}
6+
labels:
7+
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
8+
spec:
9+
targetPortNumber: {{ .Values.inferencePool.targetPortNumber }}
10+
selector:
11+
{{- range $key, $value := .Values.inferencePool.selector }}
12+
{{ $key }}: {{ quote $value }}
13+
{{- end }}
14+
extensionRef:
15+
name: {{ include "gateway-api-inference-extension.name" . }}
16+
---
17+
apiVersion: apps/v1
18+
kind: Deployment
19+
metadata:
20+
name: {{ include "gateway-api-inference-extension.name" . }}
21+
namespace: {{ .Release.Namespace }}
22+
labels:
23+
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
24+
spec:
25+
replicas: {{ .Values.inferenceExtension.replicas | default 1 }}
26+
selector:
27+
matchLabels:
28+
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 6 }}
29+
template:
30+
metadata:
31+
labels:
32+
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}
33+
spec:
34+
serviceAccountName: {{ include "gateway-api-inference-extension.name" . }}
35+
containers:
36+
- name: epp
37+
image: {{ .Values.inferenceExtension.image.hub }}/{{ .Values.inferenceExtension.image.name }}:{{ .Values.inferenceExtension.image.tag }}
38+
imagePullPolicy: {{ .Values.inferenceExtension.image.pullPolicy | default "Always" }}
39+
args:
40+
- -poolName
41+
- {{ .Values.inferencePool.name }}
42+
- -poolNamespace
43+
- {{ .Release.Namespace }}
44+
- -v
45+
- "3"
46+
- -grpcPort
47+
- "9002"
48+
- -grpcHealthPort
49+
- "9003"
50+
- -metricsPort
51+
- "9090"
52+
ports:
53+
- name: grpc
54+
containerPort: 9002
55+
- name: grpc-health
56+
containerPort: 9003
57+
- name: metrics
58+
containerPort: 9090
59+
livenessProbe:
60+
grpc:
61+
port: 9003
62+
service: inference-extension
63+
initialDelaySeconds: 5
64+
periodSeconds: 10
65+
readinessProbe:
66+
grpc:
67+
port: 9003
68+
service: inference-extension
69+
initialDelaySeconds: 5
70+
periodSeconds: 10
71+
---
72+
apiVersion: v1
73+
kind: Service
74+
metadata:
75+
name: {{ include "gateway-api-inference-extension.name" . }}
76+
namespace: {{ .Release.Namespace }}
77+
labels:
78+
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
79+
spec:
80+
selector:
81+
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 4 }}
82+
ports:
83+
- name: grpc-ext-proc
84+
protocol: TCP
85+
port: {{ .Values.inferenceExtension.extProcPort | default 9002 }}
86+
- name: http-metrics
87+
protocol: TCP
88+
port: {{ .Values.inferenceExtension.metricsPort | default 9090 }}
89+
type: ClusterIP
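For orientation, the InferencePool document of this template should render roughly as follows with the chart's default values. This is a hand-derived sketch, not captured `helm template` output: it assumes the release is installed into the `default` namespace, and the indentation is assumed because the diff view above does not preserve it.

```yaml
# Approximate rendering with default values (namespace "default" is an assumption).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: pool-1
  namespace: default
  labels:
    # From the "gateway-api-inference-extension.labels" helper:
    app.kubernetes.io/name: pool-1-epp
    app.kubernetes.io/version: "0.2.0"
spec:
  targetPortNumber: 8000
  selector:
    # Each selector value is passed through `quote`.
    app: "vllm-llama2-7b"
  extensionRef:
    # From the "gateway-api-inference-extension.name" helper: "<pool name>-epp".
    name: pool-1-epp
```

The Deployment, Service, ServiceAccount, and RBAC objects all take the same `pool-1-epp` name from that helper.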
@@ -0,0 +1,45 @@
1+
kind: ClusterRole
2+
apiVersion: rbac.authorization.k8s.io/v1
3+
metadata:
4+
name: {{ include "gateway-api-inference-extension.name" . }}
5+
labels:
6+
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
7+
rules:
8+
- apiGroups: ["inference.networking.x-k8s.io"]
9+
resources: ["inferencemodels", "inferencepools"]
10+
verbs: ["get", "watch", "list"]
11+
- apiGroups: [""]
12+
resources: ["pods"]
13+
verbs: ["get", "watch", "list"]
14+
- apiGroups:
15+
- authentication.k8s.io
16+
resources:
17+
- tokenreviews
18+
verbs:
19+
- create
20+
- apiGroups:
21+
- authorization.k8s.io
22+
resources:
23+
- subjectaccessreviews
24+
verbs:
25+
- create
26+
---
27+
kind: ClusterRoleBinding
28+
apiVersion: rbac.authorization.k8s.io/v1
29+
metadata:
30+
name: {{ include "gateway-api-inference-extension.name" . }}
31+
subjects:
32+
- kind: ServiceAccount
33+
name: {{ include "gateway-api-inference-extension.name" . }}
34+
namespace: {{ .Release.Namespace }}
35+
roleRef:
36+
kind: ClusterRole
37+
name: {{ include "gateway-api-inference-extension.name" . }}
38+
---
39+
apiVersion: v1
40+
kind: ServiceAccount
41+
metadata:
42+
name: {{ include "gateway-api-inference-extension.name" . }}
43+
namespace: {{ .Release.Namespace }}
44+
labels:
45+
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+14
@@ -0,0 +1,14 @@
1+
inferenceExtension:
2+
replicas: 1
3+
image:
4+
name: epp
5+
hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension
6+
tag: main
7+
pullPolicy: Always
8+
extProcPort: 9002
9+
10+
inferencePool:
11+
name: pool-1
12+
targetPortNumber: 8000
13+
selector:
14+
app: vllm-llama2-7b
