add helm template #416
Changes from 7 commits
4931640
2366460
dcd3bd5
154f670
2490c28
814bec3
6712198
a885ea9
bf51f9a

@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/

@@ -0,0 +1,9 @@
apiVersion: v2
name: InferencePool
description: A Helm chart for InferencePool

type: application

version: 0.1.0

appVersion: "0.2.0"

@@ -0,0 +1,61 @@
# Gateway API Inference Extension

A chart to deploy the inference extension and an InferencePool managed by the extension.
## Install

Suppose a vLLM service with the label `app: vllm-llama2-7b`, serving on port `8000`, is deployed in the `default` namespace of the cluster.

To deploy the inference extension, run the following command:

```txt
$ helm install my-release . -n default \
  --set inferencePool.targetPortNumber=8000 \
  --set inferencePool.selector.app=vllm-llama2-7b
```
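
For reference, with the flags above and the chart defaults (`inferencePool.name: pool-1` from `values.yaml`, `appVersion: "0.2.0"` from `Chart.yaml`), the InferencePool rendered by the chart should look roughly like this. It is worked out by reading the templates, not captured output, so treat it as a sketch:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: pool-1                    # from inferencePool.name (values.yaml default)
  namespace: default              # from -n default
  labels:
    app.kubernetes.io/name: pool-1-epp
    app.kubernetes.io/version: "0.2.0"
spec:
  targetPortNumber: 8000          # from --set inferencePool.targetPortNumber
  selector:
    app: "vllm-llama2-7b"         # from --set inferencePool.selector.app
  extensionRef:
    name: pool-1-epp              # the inference extension Service created by this chart
```

Running `helm template my-release . -n default` with the same `--set` flags prints the actual rendered manifests, which is a quick way to check them before installing.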
Alternatively, you can set the same values in `values.yaml`:

Review comment: I suggest removing this and just documenting how to set them via command-line flags.

```yaml
inferencePool:
  name: pool-1
  targetPortNumber: 8000
  selector:
    app: vllm-llama2-7b
```

Here, `inferencePool.targetPortNumber` is the port the vLLM backends serve on, and `inferencePool.selector` is the label selector that matches the vLLM backends. Then run:

```txt
$ helm install my-release .
```
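
`inferencePool.selector` and `inferencePool.targetPortNumber` have to line up with the backend pods. A minimal sketch of a matching vLLM Deployment, assuming a hypothetical Deployment name and a placeholder image (only the pod label and the container port matter to the pool):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama2-7b            # hypothetical name
  namespace: default              # same namespace as the release
spec:
  replicas: 3                     # illustrative only
  selector:
    matchLabels:
      app: vllm-llama2-7b
  template:
    metadata:
      labels:
        app: vllm-llama2-7b       # matched by inferencePool.selector
    spec:
      containers:
      - name: vllm
        image: <your-vllm-image>  # placeholder, not a real image reference
        ports:
        - containerPort: 8000     # must equal inferencePool.targetPortNumber
```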

## Uninstall

Run the following command to uninstall the chart:

```txt
$ helm uninstall my-release
```

## Configuration

The following table lists the configurable parameters of the chart.

| **Parameter Name**                          | **Description**                                                                                                   |
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `inferenceExtension.replicas`               | Number of replicas for the inference extension service. Defaults to `1`.                                          |
| `inferenceExtension.image.name`             | Name of the container image used for the inference extension.                                                     |
| `inferenceExtension.image.hub`              | Registry URL where the inference extension image is hosted.                                                       |
| `inferenceExtension.image.tag`              | Image tag of the inference extension.                                                                             |
| `inferenceExtension.image.pullPolicy`       | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
| `inferenceExtension.extProcPort`            | Port on which the inference extension Service exposes the external processing (ext-proc) endpoint. Defaults to `9002`. |
| `inferencePool.name`                        | Name of the InferencePool. The inference extension resources are named `${inferencePool.name}-epp`.              |
| `inferencePool.targetPortNumber`            | Port the vLLM backends serve on; the inference extension also uses it to scrape their metrics.                    |
| `inferencePool.selector`                    | Label selector that matches the vLLM backends managed by the InferencePool.                                       |

Review comment: I recommend listing the pool options first, since they will be used more often than the EPP options.
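
Any of these parameters can also be overridden from a separate values file instead of editing `values.yaml`. A hedged sketch, assuming a hypothetical file `my-values.yaml` and a hypothetical second pool whose backends are labeled `app: my-other-model`:

```yaml
# my-values.yaml (hypothetical): anything not listed keeps the chart default
inferencePool:
  name: pool-2                    # extension resources become pool-2-epp
  targetPortNumber: 8000
  selector:
    app: my-other-model           # hypothetical backend label
inferenceExtension:
  replicas: 2
  image:
    pullPolicy: IfNotPresent      # Always, IfNotPresent, or Never
```

Install it with `helm install my-other-release . -f my-values.yaml`. Helm merges this file over the chart's `values.yaml`, so `image.hub`, `image.name`, `image.tag` and `extProcPort` keep their defaults. Because maps are merged rather than replaced, a selector using a different label key would be combined with the default `app: vllm-llama2-7b`; setting `app: null` removes the default key.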

## Notes

This chart deploys only the inference extension and the InferencePool. Before installing the chart, make sure the inference extension CRDs are already installed in the cluster. After the inference extension is deployed, you also need to apply traffic policies so that the gateway routes traffic to it.
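
What these traffic policies look like depends on the gateway implementation. As one hedged example, a gateway that accepts an InferencePool as an HTTPRoute backend could route to the pool roughly as follows; the route name and the Gateway name `inference-gateway` are assumptions, and some implementations need additional, implementation-specific resources:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route                 # hypothetical name
  namespace: default
spec:
  parentRefs:
  - name: inference-gateway       # hypothetical Gateway name
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: pool-1                # inferencePool.name
```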

For more details, please refer to the [website](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

@@ -0,0 +1 @@
Gateway API Inference Extension deployed.
Review comment: Is this file needed at all?
Reply: Not really, no, but we can use it to add more useful information in the future. Whatever we put here will print out after someone runs

@@ -0,0 +1,24 @@
{{/*
Common labels
*/}}
{{- define "gateway-api-inference-extension.labels" -}}
app.kubernetes.io/name: {{ include "gateway-api-inference-extension.name" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
{{- end }}

{{/*
Inference extension name
*/}}
{{- define "gateway-api-inference-extension.name" -}}
{{- $base := .Values.inferencePool.name | default "default-pool" | lower | trim | trunc 40 -}}
{{ $base }}-epp
{{- end -}}

{{/*
Selector labels
*/}}
{{- define "gateway-api-inference-extension.selectorLabels" -}}
app: {{ include "gateway-api-inference-extension.name" . }}
{{- end -}}
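
To make the helpers concrete: with `inferencePool.name: pool-1` and `appVersion: "0.2.0"`, they should render as below. This is worked out by hand from the template functions, so treat it as a sketch:

```yaml
# include "gateway-api-inference-extension.name" .  ->  pool-1-epp
# include "gateway-api-inference-extension.labels" .
app.kubernetes.io/name: pool-1-epp
app.kubernetes.io/version: "0.2.0"
# include "gateway-api-inference-extension.selectorLabels" .
app: pool-1-epp
```

Because the base name is lowercased, trimmed and cut to 40 characters before `-epp` is appended, generated names are at most 44 characters, within the 63-character limit for label values and Service names. Note that the InferencePool's own `metadata.name` and the `-poolName` argument use `inferencePool.name` verbatim, without the `default`, `lower` or `trunc` steps.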

@@ -0,0 +1,89 @@
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: {{ .Values.inferencePool.name }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
  targetPortNumber: {{ .Values.inferencePool.targetPortNumber }}
  selector:
    {{- range $key, $value := .Values.inferencePool.selector }}
    {{ $key }}: {{ quote $value }}
    {{- end }}
  extensionRef:
    name: {{ include "gateway-api-inference-extension.name" . }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.inferenceExtension.replicas | default 1 }}
  selector:
    matchLabels:
      {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "gateway-api-inference-extension.name" . }}
      containers:
      - name: epp
        image: {{ .Values.inferenceExtension.image.hub }}/{{ .Values.inferenceExtension.image.name }}:{{ .Values.inferenceExtension.image.tag }}
        imagePullPolicy: {{ .Values.inferenceExtension.image.pullPolicy | default "Always" }}
        args:
        - -poolName
        - {{ .Values.inferencePool.name }}
        - -poolNamespace
        - {{ .Release.Namespace }}
        - -v
        - "3"
        - -grpcPort
        - "9002"
        - -grpcHealthPort
        - "9003"
        - -metricsPort
        - "9090"
        ports:
        - name: grpc
          containerPort: 9002
        - name: grpc-health
          containerPort: 9003
        - name: metrics
          containerPort: 9090
        livenessProbe:
          grpc:
            port: 9003
            service: inference-extension
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          grpc:
            port: 9003
            service: inference-extension
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
  selector:
    {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 4 }}
  ports:
    - name: grpc-ext-proc
      protocol: TCP
      port: {{ .Values.inferenceExtension.extProcPort | default 9002 }}
    - name: http-metrics
      protocol: TCP
      port: {{ .Values.inferenceExtension.metricsPort | default 9090 }}
  type: ClusterIP

@@ -0,0 +1,46 @@
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
rules:
- apiGroups: ["inference.networking.x-k8s.io"]
  resources: ["inferencemodels", "inferencepools"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}
subjects:
- kind: ServiceAccount
  name: {{ include "gateway-api-inference-extension.name" . }}
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "gateway-api-inference-extension.name" . }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}

@@ -0,0 +1,14 @@
inferenceExtension:
  replicas: 1
  image:
    name: epp
    hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension
    tag: main
    pullPolicy: Always
  extProcPort: 9002

inferencePool:
  name: pool-1
  targetPortNumber: 8000
  selector:
    app: vllm-llama2-7b

Review comment: Don't forget the trailing newline here.