
add helm template #416

Merged — 9 commits merged into kubernetes-sigs:main on Mar 19, 2025
Conversation

@Kuromesi (Contributor)
Resolves #381: support deploying via Helm.

A generated file is shown in config/manifests/gateway-api-inference-extension/generated.yaml.

To avoid conflicts with other releases, I prefix the resource names with the Helm release name, as shown in config/manifests/gateway-api-inference-extension/templates/_helpers.tpl.
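For context, a release-name prefix of this kind is usually implemented as a named template in `_helpers.tpl`. A minimal sketch (the helper name here is illustrative, not necessarily what this PR uses):

```
{{/*
Expand resource names with the release name, truncated to the 63-character
Kubernetes name limit and stripped of any trailing hyphen.
*/}}
{{- define "gateway-api-inference-extension.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
```

Resources then reference it as `name: {{ include "gateway-api-inference-extension.fullname" . }}`.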

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 27, 2025
@k8s-ci-robot (Contributor)

Hi @Kuromesi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 27, 2025

netlify bot commented Feb 27, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c91ec3c
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67bfd39b64b78d0008de3d29
😎 Deploy Preview https://deploy-preview-416--gateway-api-inference-extension.netlify.app


netlify bot commented Feb 27, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit bf51f9a
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67da436206882b0008d9bce1
😎 Deploy Preview https://deploy-preview-416--gateway-api-inference-extension.netlify.app

@robscott (Member)

Thanks @Kuromesi! I'll try to take a look at this today

/assign

@ahg-g (Contributor)

ahg-g commented Feb 27, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 27, 2025
@Kuromesi (Contributor, Author)

> Thanks @Kuromesi! I'll try to take a look at this today
>
> /assign

Thanks! I have some questions I'm not quite certain about:

  1. I made an effort to extend the resource names to avoid conflicts and make it possible to deploy multiple releases in a single namespace. I'm not sure whether that is needed or whether the naming is appropriate.
  2. Do we need to support configuring all of the ext_proc setup parameters in the Helm values?

@robscott (Member)

robscott commented Feb 28, 2025

Hey @Kuromesi, thanks for the work on this!

I think it would be helpful to think about how we expect users to use this project.

Initial Setup

Day to Day

  • Deploy InferencePool(s), each of which will be bundled with an Endpoint Picker
  • Configure InferenceModel(s) that will be served by an InferencePool
  • Configure HTTPRoute(s) to point to InferencePool(s)

While your PR seems to do a great job at capturing the config required for our current quickstart guide, it's not where we want to be long term. In the next ~month, I'm hopeful that we'll have built in support for this pattern from kgateway, Istio, and GKE Gateway implementations. That will mean that instead of manually patching Envoy Gateway like our current quickstart guide (and this Helm chart) do, users will be able to just use these APIs directly.

With that background, I think the original issue was specifically asking for a chart that "simplifies creating an InferencePool with an associated EPP deployment".

I think the ideal for this would be a chart that took parameters for InferencePool name, and then had defaults for all the rest, including the EPP configuration (Deployment, Service, HPA, RBAC). It looks like you have a lot of this in the chart already, but ideally the chart could be restructured to be focused exclusively on InferencePool and deploying a corresponding extension.

In the future we could expand this chart to include InferenceModels pointing at the InferencePool.

I'd recommend leaving all CRD, Gateway, and HTTPRoute configuration out of this chart. Hopefully that approach makes sense. I'm also happy to chat about this in the #gateway-api-inference-extension channel on Kubernetes Slack if that would be easier.
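The chart structure proposed here would reduce `values.yaml` to roughly the following shape (a sketch assembled from values discussed later in this review — `inferencePool.name`, `targetPortNumber`, `selector`, `inferenceExtension.replicas`, `extProcPort` — not the final file):

```
inferencePool:
  name: pool-1
  targetPortNumber: 8000
  selector:
    app: vllm-llama2-7b

inferenceExtension:
  replicas: 1
  image:
    tag: main
    pullPolicy: Always
  extProcPort: 9002
```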

@Kuromesi (Contributor, Author)

> [quoting @robscott's full comment above]

Got it, thanks!

Signed-off-by: Kuromesi <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 28, 2025
@robscott (Member) left a comment:

Thanks for all the work on this @Kuromesi! Left some more nits but otherwise LGTM

@@ -0,0 +1 @@
Gateway api inference extension deployed.
Comment (Member):

Suggested change
Gateway api inference extension deployed.
InferencePool deployed.

@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
Comment (Member):

Nit: I'd expect this chart to live at config/charts/inferencepool

tag: main
pullPolicy: Always

name: inference-gateway-ext-proc
Comment (Member):

This should probably have the name of the InferencePool in it by default. So if the InferencePool is called base, maybe this is called base-epp.


inferencePool:
namespace: default
name: vllm-llama2-7b-pool
Comment (Member):

I'm not really sure what we want our default pool name to be, but this seems too specific. Maybe base or default?

cc @ahg-g @danehans @kfswain

Comment (Contributor):

pool-1 :)

Comment on lines 62 to 65
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
selector:
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 4 }}
Comment (Member):

Both of these feel like they should be included in values.yaml

Comment (Contributor, Author):

This is generated in _helpers.tpl; should we provide customization in values.yaml?

{{/*
Selector labels
*/}}
{{- define "gateway-api-inference-extension.selectorLabels" -}}
app: {{ .Values.inferenceExtension.name }}
{{- end -}}

Comment (Member):

I'd missed that, thanks! While I don't think we need the InferencePool labels to be configurable, I think it's important to make the selector configurable.
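One way to make the selector configurable is to have the helper iterate over a map from `values.yaml` instead of hard-coding a single `app` label (a sketch, assuming an `inferencePool.selector` map exists in values — not necessarily how the PR resolves this):

```
{{/*
Selector labels sourced from values.yaml rather than a fixed app label.
*/}}
{{- define "gateway-api-inference-extension.selectorLabels" -}}
{{- range $key, $value := .Values.inferencePool.selector }}
{{ $key }}: {{ $value }}
{{- end }}
{{- end -}}
```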

Comment (Contributor, Author):

You mean we should also create an InferencePool in the Helm chart? (Which I have not done yet.)

Comment on lines 69 to 70
port: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
targetPort: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
Comment (Member):

Nit: I think I'd call this extProcPort. I also don't think you need to specify targetPort unless it's different from port.

Suggested change
port: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
targetPort: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
port: {{ .Values.inferenceExtension.extProcPort | default 9002 }}

selector:
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 4 }}
ports:
- name: grpc
Comment (Member):

Suggested change
- name: grpc
- name: ext_proc

@Kuromesi (Contributor, Author)

> Thanks for all the work on this @Kuromesi! Left some more nits but otherwise LGTM

Thanks for your patient review! I will fix these issues.

@ahg-g (Contributor)

ahg-g commented Mar 17, 2025

Thanks @Kuromesi for doing this, anything blocking this PR now?

Signed-off-by: Kuromesi <[email protected]>
@Kuromesi (Contributor, Author)

> Thanks @Kuromesi for doing this, anything blocking this PR now?

Sorry, I had not rendered the InferencePool in the Helm template; I have added it now.

Signed-off-by: Kuromesi <[email protected]>
@ahg-g (Contributor) left a comment:

Can you please add a README.md to config/charts showing how to use the chart and documenting its parameters?

Signed-off-by: Kuromesi <[email protected]>
@ahg-g (Contributor) left a comment:

Thanks, this is great.

@@ -0,0 +1,145 @@
---
Comment (Contributor):

can you please remove this for now, it doesn't match the config we have in patch_policy.yaml

Comment (Member):

I could be wrong, but I don't think there's any way to remove this from Helm's generated output. We probably should have a follow up issue here that adds make commands to auto generate this + verify that this file matches the helm chart as part of a presubmit.

Comment (Contributor, Author):

Ah yes, I temporarily rendered the template into generated.yaml so that you could review it more easily; I will remove it. And I agree with @robscott, we should add make commands later.

--set inferencePool.selector.app=vllm-llama2-7b
```

Or you can change the `values.yaml` to:
Comment (Contributor):

I suggest removing this and just documenting how we set these values via command-line flags.


| **Parameter Name** | **Description** |
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `inferenceExtension.replicas` | Number of replicas for the inference extension service. Defaults to `1`. |
Comment (Contributor):

I recommend listing the pool options first since they will be used more often than the epp.

@@ -0,0 +1,61 @@
# Gateway Api Inference Extension
Comment (Contributor):

Suggested change
# Gateway Api Inference Extension
# InferencePool

@@ -0,0 +1,61 @@
# Gateway Api Inference Extension

A chart to deploy the inference extension and a InferencePool managed by the extension.
Comment (Contributor):

Suggested change
A chart to deploy the inference extension and a InferencePool managed by the extension.
A chart to deploy an InferencePool and a corresponding EndpointPicker (epp) deployment.

Comment on lines 7 to 15
Suppose now a vllm service with label `app: vllm-llama2-7b` and served on port `8000` is deployed in `default` namespace in the cluster.

To deploy the inference extension, you can run the following command:

```txt
$ helm install my-release . -n default \
--set inferencePool.targetPortNumber=8000 \
--set inferencePool.selector.app=vllm-llama2-7b
```
Comment (Contributor):

Suggested change
Suppose now a vllm service with label `app: vllm-llama2-7b` and served on port `8000` is deployed in `default` namespace in the cluster.
To deploy the inference extension, you can run the following command:
```txt
$ helm install my-release . -n default \
--set inferencePool.targetPortNumber=8000 \
--set inferencePool.selector.app=vllm-llama2-7b
```
To install an InferencePool named `pool-1` that selects from endpoints with label `app: vllm-llama2-7b` and listening on port `8000`, you can run the following command:
```txt
$ helm install ./config/charts/inferencepool \
  --set inferencePool.name=pool-1 \
  --set inferencePool.selector.app=vllm-llama2-7b \
  --set inferencePool.targetPortNumber=8000
```

Comment on lines 59 to 61
This chart will only deploy the inference extension and InferencePool, before install the chart, please make sure that the inference extension CRDs have already been installed in the cluster. And You need to apply traffic policies to route traffic to the inference extension from the gateway after the inference extension is deployed.

For more details, please refer to the [website](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
Comment (Contributor):

Suggested change
This chart will only deploy the inference extension and InferencePool, before install the chart, please make sure that the inference extension CRDs have already been installed in the cluster. And You need to apply traffic policies to route traffic to the inference extension from the gateway after the inference extension is deployed.
For more details, please refer to the [website](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
This chart will only deploy an InferencePool and its corresponding EndpointPicker extension. Before install the chart, please make sure that the inference extension CRDs are installed in the cluster. For more details, please refer to the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

@@ -0,0 +1 @@
Gateway api inference extension deployed.
Comment (Contributor):

Suggested change
Gateway api inference extension deployed.
InferencePool deployed.

Comment (Contributor):

Is this file needed at all?

Comment (Member):

Not really no, but we can use it to add more useful information in the future. Whatever we put here will print out after someone runs helm upgrade or helm install.
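For instance, NOTES.txt could eventually grow into something like this (illustrative only, echoing the day-to-day steps listed earlier in this review):

```
InferencePool {{ .Values.inferencePool.name }} deployed.

Next steps:
  - Configure InferenceModel(s) that will be served by this InferencePool.
  - Configure HTTPRoute(s) to point to the InferencePool.
```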

@robscott (Member) left a comment:

This is great, thanks @Kuromesi! Agree with @ahg-g's feedback, and left a couple tiny comments myself, but otherwise LGTM.

name: pool-1
targetPortNumber: 8000
selector:
app: vllm-llama2-7b
Comment (Member):

Don't forget trailing new line here.

@@ -0,0 +1,145 @@
---
Comment (Member):

I could be wrong, but I don't think there's any way to remove this from Helm's generated output. We probably should have a follow up issue here that adds make commands to auto generate this + verify that this file matches the helm chart as part of a presubmit.

Signed-off-by: Kuromesi <[email protected]>
To install an InferencePool named `pool-1` that selects from endpoints with label `app: vllm-llama2-7b` and listening on port `8000`, you can run the following command:

```txt
$ helm install my-release ./config/charts/inferencepool \
Comment (Contributor):

Can we call it pool-1 instead of my-release, please?

@ahg-g (Contributor)

ahg-g commented Mar 19, 2025

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, Kuromesi, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2025
@ahg-g (Contributor)

ahg-g commented Mar 19, 2025

/retest

@ahg-g (Contributor)

ahg-g commented Mar 19, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit e9264f2 into kubernetes-sigs:main Mar 19, 2025
7 of 8 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Add helm chart to simplify creating an InferencePool + EPP deployment
4 participants