
add helm template #416

Merged — 9 commits merged into kubernetes-sigs:main on Mar 19, 2025
Conversation

@Kuromesi (Contributor)
Resolves #381: support deploying via Helm.

A generated file is shown in config/manifests/gateway-api-inference-extension/generated.yaml.

To avoid conflicts with other releases, I prefix the resource names with the Helm release name, as shown in config/manifests/gateway-api-inference-extension/templates/_helpers.tpl.
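For context, a release-name prefix of this kind is usually implemented as a named template in `_helpers.tpl`. A minimal sketch (the helper name here is illustrative, not necessarily what this PR uses):

```
{{/*
Expand resource names with the release name, truncated to the 63-character
Kubernetes name limit and stripped of any trailing hyphen.
*/}}
{{- define "gateway-api-inference-extension.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
```

Resources then reference it as `name: {{ include "gateway-api-inference-extension.fullname" . }}`.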

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 27, 2025
@k8s-ci-robot (Contributor)

Hi @Kuromesi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 27, 2025

netlify bot commented Feb 27, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c91ec3c
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67bfd39b64b78d0008de3d29
😎 Deploy Preview https://deploy-preview-416--gateway-api-inference-extension.netlify.app


netlify bot commented Feb 27, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit bf51f9a
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67da436206882b0008d9bce1
😎 Deploy Preview https://deploy-preview-416--gateway-api-inference-extension.netlify.app

@robscott (Member)

Thanks @Kuromesi! I'll try to take a look at this today

/assign

@ahg-g (Contributor)

ahg-g commented Feb 27, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 27, 2025
@Kuromesi (Contributor, Author)

> Thanks @Kuromesi! I'll try to take a look at this today
>
> /assign

Thanks! I have some questions I'm not quite certain about:

  1. I made an effort to extend the resource names to avoid conflicts and make it possible to deploy multiple releases in a single namespace. I'm not sure whether that is needed or whether the naming is appropriate.
  2. Do we need to support configuring all of the ext_proc setup parameters in the Helm values?

@robscott (Member)

robscott commented Feb 28, 2025

Hey @Kuromesi, thanks for the work on this!

I think it would be helpful to think about how we expect users to use this project.

Initial Setup

Day to Day

  • Deploy InferencePool(s), each of which will be bundled with an Endpoint Picker
  • Configure InferenceModel(s) that will be served by an InferencePool
  • Configure HTTPRoute(s) to point to InferencePool(s)

While your PR seems to do a great job at capturing the config required for our current quickstart guide, it's not where we want to be long term. In the next ~month, I'm hopeful that we'll have built in support for this pattern from kgateway, Istio, and GKE Gateway implementations. That will mean that instead of manually patching Envoy Gateway like our current quickstart guide (and this Helm chart) do, users will be able to just use these APIs directly.

With that background, I think the original issue was specifically asking for a chart that "simplifies creating an InferencePool with an associated EPP deployment".

I think the ideal for this would be a chart that took parameters for InferencePool name, and then had defaults for all the rest, including the EPP configuration (Deployment, Service, HPA, RBAC). It looks like you have a lot of this in the chart already, but ideally the chart could be restructured to be focused exclusively on InferencePool and deploying a corresponding extension.

In the future we could expand this chart to include InferenceModels pointing at the InferencePool.

I'd recommend leaving all CRD, Gateway, and HTTPRoute configuration out of this chart. Hopefully that approach makes sense. I'm also happy to chat about this in the #gateway-api-inference-extension channel on Kubernetes Slack if that would be easier.
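The chart structure proposed here would reduce `values.yaml` to roughly the following shape (a sketch assembled from values discussed later in this review — `inferencePool.name`, `targetPortNumber`, `selector`, `inferenceExtension.replicas`, `extProcPort` — not the final file):

```
inferencePool:
  name: pool-1
  targetPortNumber: 8000
  selector:
    app: vllm-llama2-7b

inferenceExtension:
  replicas: 1
  image:
    tag: main
    pullPolicy: Always
  extProcPort: 9002
```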

@Kuromesi (Contributor, Author)

> [quoting @robscott's full comment above]

Got it, thanks!

Signed-off-by: Kuromesi <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 28, 2025
@robscott (Member) left a comment:

Thanks for all the work on this @Kuromesi! Left some more nits but otherwise LGTM

@@ -0,0 +1 @@
Gateway api inference extension deployed.
Comment (Member):

Suggested change
Gateway api inference extension deployed.
InferencePool deployed.

@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
Comment (Member):

Nit: I'd expect this chart to live at config/charts/inferencepool

tag: main
pullPolicy: Always

name: inference-gateway-ext-proc
Comment (Member):

This should probably have the name of the InferencePool in it by default. So if the InferencePool is called base, maybe this is called base-epp.


inferencePool:
namespace: default
name: vllm-llama2-7b-pool
Comment (Member):

I'm not really sure what we want our default pool name to be, but this seems too specific. Maybe base or default?

cc @ahg-g @danehans @kfswain

Comment (Contributor):

pool-1 :)

Comment on lines 62 to 65
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
selector:
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 4 }}
Comment (Member):

Both of these feel like they should be included in values.yaml

Comment (Contributor, Author):

This is generated in _helpers.tpl; should we provide customization in values.yaml?

{{/*
Selector labels
*/}}
{{- define "gateway-api-inference-extension.selectorLabels" -}}
app: {{ .Values.inferenceExtension.name }}
{{- end -}}

Comment (Member):

I'd missed that, thanks! While I don't think we need the InferencePool labels to be configurable, I think it's important to make the selector configurable.
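One way to make the selector configurable is to have the helper iterate over a map from `values.yaml` instead of hard-coding a single `app` label (a sketch, assuming an `inferencePool.selector` map exists in values — not necessarily how the PR resolves this):

```
{{/*
Selector labels sourced from values.yaml rather than a fixed app label.
*/}}
{{- define "gateway-api-inference-extension.selectorLabels" -}}
{{- range $key, $value := .Values.inferencePool.selector }}
{{ $key }}: {{ $value }}
{{- end }}
{{- end -}}
```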

Comment (Contributor, Author):

You mean we should also create an InferencePool in the Helm chart? (Which I have not done yet.)

Comment on lines 69 to 70
port: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
targetPort: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
Comment (Member):

Nit: I think I'd call this extProcPort. I also don't think you need to specify targetPort unless it's different from port.

Suggested change
port: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
targetPort: {{ .Values.inferenceExtension.grpcPort | default 9002 }}
port: {{ .Values.inferenceExtension.extProcPort | default 9002 }}

selector:
{{- include "gateway-api-inference-extension.selectorLabels" . | nindent 4 }}
ports:
- name: grpc
Comment (Member):

Suggested change
- name: grpc
- name: ext_proc

@Kuromesi (Contributor, Author)

> Thanks for all the work on this @Kuromesi! Left some more nits but otherwise LGTM

Thanks for your patient review! I will fix these issues.

@ahg-g (Contributor)

ahg-g commented Mar 17, 2025

Thanks @Kuromesi for doing this, anything blocking this PR now?

Signed-off-by: Kuromesi <[email protected]>
@Kuromesi (Contributor, Author)

> Thanks @Kuromesi for doing this, anything blocking this PR now?

Sorry, I had not rendered the InferencePool in the Helm template; I have added it now.

Signed-off-by: Kuromesi <[email protected]>
@ahg-g (Contributor) left a comment:

Can you please add a README.md to config/charts showing how to use the chart and documenting its parameters?

Signed-off-by: Kuromesi <[email protected]>
@ahg-g (Contributor) left a comment:

Thanks, this is great.

@@ -0,0 +1,145 @@
---
Comment (Contributor):

can you please remove this for now, it doesn't match the config we have in patch_policy.yaml

Comment (Member):

I could be wrong, but I don't think there's any way to remove this from Helm's generated output. We probably should have a follow up issue here that adds make commands to auto generate this + verify that this file matches the helm chart as part of a presubmit.

Comment (Contributor, Author):

Ah yes, I temporarily rendered the template into generated.yaml so that you could review it more easily; I will remove it. And I agree with @robscott, we should add make commands later.

--set inferencePool.selector.app=vllm-llama2-7b
```

Or you can change the `values.yaml` to:
Comment (Contributor):

I suggest removing this and just documenting how we set these values via command-line flags.


| **Parameter Name** | **Description** |
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `inferenceExtension.replicas` | Number of replicas for the inference extension service. Defaults to `1`. |
Comment (Contributor):

I recommend listing the pool options first since they will be used more often than the epp.

@@ -0,0 +1,61 @@
# Gateway Api Inference Extension
Comment (Contributor):

Suggested change
# Gateway Api Inference Extension
# InferencePool

@@ -0,0 +1,61 @@
# Gateway Api Inference Extension

A chart to deploy the inference extension and a InferencePool managed by the extension.
Comment (Contributor):

Suggested change
A chart to deploy the inference extension and a InferencePool managed by the extension.
A chart to deploy an InferencePool and a corresponding EndpointPicker (epp) deployment.

Comment on lines 7 to 15
Suppose now a vllm service with label `app: vllm-llama2-7b` and served on port `8000` is deployed in `default` namespace in the cluster.

To deploy the inference extension, you can run the following command:

```txt
$ helm install my-release . -n default \
--set inferencePool.targetPortNumber=8000 \
--set inferencePool.selector.app=vllm-llama2-7b
```
Comment (Contributor):

Suggested change
Suppose now a vllm service with label `app: vllm-llama2-7b` and served on port `8000` is deployed in `default` namespace in the cluster.
To deploy the inference extension, you can run the following command:
```txt
$ helm install my-release . -n default \
--set inferencePool.targetPortNumber=8000 \
--set inferencePool.selector.app=vllm-llama2-7b
```
To install an InferencePool named `pool-1` that selects from endpoints with label `app: vllm-llama2-7b` and listening on port `8000`, you can run the following command:
```txt
$ helm install ./config/charts/inferencepool \
  --set inferencePool.name=pool-1 \
  --set inferencePool.selector.app=vllm-llama2-7b \
  --set inferencePool.targetPortNumber=8000
```

Comment on lines 59 to 61
This chart will only deploy the inference extension and InferencePool, before install the chart, please make sure that the inference extension CRDs have already been installed in the cluster. And You need to apply traffic policies to route traffic to the inference extension from the gateway after the inference extension is deployed.

For more details, please refer to the [website](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
Comment (Contributor):

Suggested change
This chart will only deploy the inference extension and InferencePool, before install the chart, please make sure that the inference extension CRDs have already been installed in the cluster. And You need to apply traffic policies to route traffic to the inference extension from the gateway after the inference extension is deployed.
For more details, please refer to the [website](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
This chart will only deploy an InferencePool and its corresponding EndpointPicker extension. Before install the chart, please make sure that the inference extension CRDs are installed in the cluster. For more details, please refer to the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

@@ -0,0 +1 @@
Gateway api inference extension deployed.
Comment (Contributor):

Suggested change
Gateway api inference extension deployed.
InferencePool deployed.

Comment (Contributor):

Is this file needed at all?

Comment (Member):

Not really no, but we can use it to add more useful information in the future. Whatever we put here will print out after someone runs helm upgrade or helm install.
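For instance, NOTES.txt could eventually grow into something like this (illustrative only, echoing the day-to-day steps listed earlier in this review):

```
InferencePool {{ .Values.inferencePool.name }} deployed.

Next steps:
  - Configure InferenceModel(s) that will be served by this InferencePool.
  - Configure HTTPRoute(s) to point to the InferencePool.
```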

@robscott (Member) left a comment:

This is great, thanks @Kuromesi! Agree with @ahg-g's feedback, and left a couple tiny comments myself, but otherwise LGTM.

name: pool-1
targetPortNumber: 8000
selector:
app: vllm-llama2-7b
Comment (Member):

Don't forget trailing new line here.

@@ -0,0 +1,145 @@
---
Comment (Member):

I could be wrong, but I don't think there's any way to remove this from Helm's generated output. We probably should have a follow up issue here that adds make commands to auto generate this + verify that this file matches the helm chart as part of a presubmit.

Signed-off-by: Kuromesi <[email protected]>
To install an InferencePool named `pool-1` that selects from endpoints with label `app: vllm-llama2-7b` and listening on port `8000`, you can run the following command:

```txt
$ helm install my-release ./config/charts/inferencepool \
Comment (Contributor):

Can we call it pool-1 instead of my-release, please?

@ahg-g (Contributor)

ahg-g commented Mar 19, 2025

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, Kuromesi, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2025
@ahg-g (Contributor)

ahg-g commented Mar 19, 2025

/retest

@ahg-g (Contributor)

ahg-g commented Mar 19, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit e9264f2 into kubernetes-sigs:main Mar 19, 2025
7 of 8 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Add helm chart to simplify creating an InferencePool + EPP deployment
4 participants