Skip to content

CRDs upgrade support #4001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
shahar-h opened this issue Aug 5, 2024 · 30 comments · Fixed by #5616
Closed

CRDs upgrade support #4001

shahar-h opened this issue Aug 5, 2024 · 30 comments · Fixed by #5616
Assignees
Labels
area/installation area/upgrade documentation Improvements or additions to documentation
Milestone

Comments

@shahar-h
Copy link
Contributor

shahar-h commented Aug 5, 2024

Description:
Currently Gateway API and EG CRDs are located in EG gateway-helm chart crds folder. This means that CRDs are not upgraded on chart upgrade, and need to be manually upgraded beforehand.

Possible solutions:

  1. Move crds inside templates.
    Real world example: cert-manager chart has CRDs as part of templates folder and exposes crds.enabled flag.
    It also allows to decide if we want to keep the CRDs on chart uninstallation with crds.keep flag, by leveraging "helm.sh/resource-policy": keep helm annotation.
    See: https://cert-manager.io/v1.13-docs/installation/helm/#crd-considerations.
    However, this is a breaking change for consumers that use EG chart as a sub chart and have custom resources as part of the main chart.
  2. Split the current chart into crds chart and application chart, and instruct to install the crds chart first.
    Real world example: Istio provides a base chart that installs CRDs: https://istio.io/latest/docs/setup/install/helm/#installation-steps.

Both 1 and 2 can also solve #3094 by providing a separate flags(1)/charts(2) for Gateway API and EG CRDs.

Also Related:

WDYT?

@shahar-h
Copy link
Contributor Author

shahar-h commented Sep 4, 2024

While trying to split the chart into 2 charts(app chart and CRDs chart) internally CRDs chart installation failed because helm release secret exceeded max 1MB size limit. In order to resolve that issue I split the CRDs chart into 2 more charts - eg CRDS and gateway api CRDs.

@arkodg
Copy link
Contributor

arkodg commented Sep 11, 2024

thinking out loud, we could add two additional templates (w/o deleting the /crds dir) for the CRDs - one for gateway-api and one for envoy-gateway and use knobs such as applyCrds.all applyCrds.gateway-api and applyCrds.envoy-gateway

This solves 2 use cases

  • For cases where the cloud provider like GKE installs the gateway-api resources, a user could install EG with helm install --skip-crds --set applyCrds.envoy-gateway=true to selectively install EG CRDs
  • For upgrade cases , users could run helm upgrade --set applyCrds.all=true to update all CRDs or with --set applyCrds.envoy-gateway=true to only update EG CRDs

the naming for the helm knobs can be improved, of course

@shahar-h
Copy link
Contributor Author

shahar-h commented Sep 17, 2024

@arkodg installing both EG and gateway-api CRDs from the same chart won't work since helm release secret will exceed 1MB limit as I mentioned here.
In addition, installing CRDs from EG chart templates would not work for umbrella charts that contain some CRD instance.

What about keeping the crds folder and create 2 additional CRDs charts?

@arkodg arkodg added the kind/decision A record of a decision made by the community. label Sep 17, 2024
@arkodg
Copy link
Contributor

arkodg commented Sep 17, 2024

ah thanks for trying it out @shahar-h .

Looks like we have a decision to make, neither being ideal

  1. Move CRDs from /crds to templates/ with an opt in / opt out approach similar to what cert-manager does (opt in)
    Pros
  • Users - Makes sure versions of CRD and controller are same, so there's implicit order and support during version upgrades
  • Maintainers - One less chart to maintain
  1. Add another gateway-crds-helm chart similar to what linked does https://linkerd.io/2-edge/tasks/install-helm/#linkerd-crds
    Pros
  • Explicit split up of responsibilities b/w cluster admin (installing CRDs) and platform admin (installing and running EG)
    Cons
  • Users need to ensure they are installing compatible/same versions

Common issues

  • Users may forget to opt in or opt out of installing CRDs during install and upgrade, which is probably also true for a gateway-crds-helm chart where users may forget to install it.

@arkodg
Copy link
Contributor

arkodg commented Sep 17, 2024

argo also puts crds in templates

@arkodg
Copy link
Contributor

arkodg commented Sep 20, 2024

@shahar-h can you help investigate who is the consuming most of the space ?

@shahar-h
Copy link
Contributor Author

@shahar-h can you help investigate who is the consuming most of the space ?

Sure, will do.

@arkodg
Copy link
Contributor

arkodg commented Sep 20, 2024

thanks ! I did a little digging, one way to reduce size could be to rm the API descriptions, but that would break kubectl explain

@shahar-h
Copy link
Contributor Author

shahar-h commented Sep 23, 2024

I computed eg CRDS & gwapi CRDs size by creating separate charts and installing each chart. Then I computed helm secret release size for each one of them with the following command:

kubectl -n envoy-gateway-system get secret <helm-release-secret-name> -o jsonpath='{.data.release}' | base64 -d | wc -c 

Results:

  • gwapi crds chart release secret size - 494132 Bytes (0.49MB)
  • eg crds chart release secret size - 685360 Bytes (0.68MB)

Total: 1.17MB > 1MB size limit

Note that the helm release secret size is lower that real CRDs size since helm uses gzip compression.

BTW there is an open PR to split helm releases into multiple k8s secrets.

@arkodg
Copy link
Contributor

arkodg commented Sep 23, 2024

thanks for analyzing this !
the linked PR and the approach of using multiple k8s secrets looks promising, we could wait it out until that feature goes in and then move this the CRDs into templates

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label Nov 15, 2024
@aukevanleeuwen
Copy link

I'm not sure if I should open a new issue or add something here, but let's start by doing it here.

A challenge I have with Envoy Gateway is also CRDs related. I'm using cert-manager with Gateway API support enabled to issue certificates for Envoy (https://gateway.envoyproxy.io/docs/tasks/security/tls-cert-manager/). Works beautifully, however today we ran into the issue that cert-manager with this Gateway API support enabled depends on the Kubernetes Gateway API CRDs. I can't recall the details (since it's been a few weeks ago that I looked at it for a different reason), but essentially I think it was trying to subscribe to any changes on the gateway.networking.k8s.io/v1:Gateway resources. I was installing cert-manager before envoy-gateway because that seemed the obvious order, but that didn't work because of this dependency.

A few weeks ago the obvious choice was to reverse that order, but today this ordering popped up again and we really want to reverse that order.

Okay, reason why I have this comment here is that it seems strange to me that the CRDs of the Kubernetes Gateway API are installed directly in this chart. It seems to me that the Kubernetes Gateway API CRDs are an 'interface' of which Envoy Gateway is an implementation, right? In that sense I would rather have this split as it would (probably?) allow me to install the Kubernetes Gateway API CRDs separately, then install cert-manager and then install the 'implementation': Envoy Gateway.

I hope that makes sense.

It would maybe also be a possibilty to have the Kubernets Gateway API as a conditional subchart of envoy-gateway-helm like kube-prometheus-stack is doing.

This could also be it's own issue, let me know.

@github-actions github-actions bot removed the stale label Feb 7, 2025
Copy link

github-actions bot commented Mar 9, 2025

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@bozho
Copy link

bozho commented Mar 25, 2025

Just to add a 👍 for @aukevanleeuwen's comment. Gateway API CRDs should be treated as an external dependency. While it's nice as a convenience to have them installed alongside Envoy Gateway, users should be able to install Gateway API CRDs separately.

This would be useful in, for example, GitOps, where it's nice to be able to enforce the general deployment order: CRDs -> controllers -> configs.

@github-actions github-actions bot removed the stale label Mar 25, 2025
@NissesSenap
Copy link
Contributor

Got the same issue, but In my case I'm trying to migrate from another Gateway API vendor and since I already have Gateway API installed in my cluster I'm not able to install Envoy Gateway in the same environment.

I agree with @aukevanleeuwen and I think we should just remove Gateway API from the helm chart all together.

I will create a PR tomorrow to remove this dependency (yes, it will be a breaking change) and let's see what the maintainers say.
Sadly, this won't solve the update issue, that this actually is about.

In the grafana-operator we publish our own CRDs as part of our releases in github as a separate file + in the helm chart. So if you want, you can update the EG CRDs separately based on the same version number as the helm chart, which is a simple solution.

@arkodg
Copy link
Contributor

arkodg commented Mar 25, 2025

we cannot delete the gateway-api CRDs from the Helm Chart, that negatively impacts the install experience for most cases, but we can account for the cases you've mentioned

  • we can start by introducing a new chart gateway-crds-helm that installs the Envoy Gateway CRDs by default
  • we can also additionally generate a k8s install release artifact - envoy-gateway-crds.yaml similar to the one in the PR
  • once the Helm issue is resolved, we can also add Gateway API CRDs to gateway-crds-helm but disable it by default

wdyt @shahar-h @envoyproxy/gateway-maintainers @envoyproxy/gateway-reviewers ?

@zirain
Copy link
Member

zirain commented Mar 26, 2025

+1 on adding a new chart named gateway-crds-helm

@shahar-h
Copy link
Contributor Author

+1

@NissesSenap
Copy link
Contributor

I don't understand what issues would get solved by creating a new helm chart called gateway-crds-helm

And I don't understand what you mean with

  • once the Helm issue is resolved, we can also add Gateway API CRDs to gateway-crds-helm but disable it by default

The funny thing about helm is that if you use subcharts you can't set skip-crds.
So the only way it would help is to move the CRDs in to the template folder any way. So then we can do it in the current chart just the same. Adding another helm chart will just create complexity.

I created a PR to move them from the CRDs folder in to template folder instead. This way we do it the same way as cert-manager.
If this is the way forward you like, I can make that PR also include the Gateway CRDs as a separate artifact so you also have the option to not install the CRDs trough helm at all.
That PR also needs fixes when it comes to docs and other stuff.

@shahar-h
Copy link
Contributor Author

@NissesSenap moving CRDs from crds folder to templates folder would be a breaking change for consumers that use EG chart as a sub chart and have EG/GW-API custom resources as part of the main chart, because they rely on EG crds to be installed before EG/GW-API custom resources.

@NissesSenap
Copy link
Contributor

This is true. But if you got a helm chart that only contains the CRDs in a subchart in the template folder, it will also be a breaking change. And to my knowledge you can't set the order of when to install subcharts.

So I don't think you can't guarantee that the CRDs will be installed before the chart anyway, but I might be wrong here.

Or do you want to have 3 helm charts? One for the Gateway API CRDs and one for EG CRDs?
Since adding the Both the Gateway API CRD and the EG CRD is a big part of the problem, at least for me.

To be honest, I don't care how this gets solved, I just want to solve it.
Please explain exactly how you want it to be done, and I can create a PR for it assuming it solves both the use-case of upgrading the CRDs and not forcing me to install the GW-API CRDs.

@guydc
Copy link
Contributor

guydc commented Mar 26, 2025

wdyt @shahar-h @envoyproxy/gateway-maintainers @envoyproxy/gateway-reviewers ?

+1

@arkodg arkodg removed the kind/decision A record of a decision made by the community. label Mar 26, 2025
@arkodg
Copy link
Contributor

arkodg commented Mar 26, 2025

Now that we have consensus, @NissesSenap assigning this to you since you mentioned you're interested in picking this up
Outlined the steps in #4001 (comment), the first one being creating a new 'gateway-crds-helm'' chart ( the current one gateway-halm chart is unaffected)

@NissesSenap
Copy link
Contributor

@arkodg what I don't understand in your comment is

  • once the Helm issue is resolved, we can also add Gateway API CRDs to gateway-crds-helm but disable it by default

After I create a separate helm chart for the eg CRDs then what?

What helm issue is it that you are waiting to get resolved? If you are talking about the upstream helm issues that cause all this pain around crd when it comes to helm, I don't think every will get solved.
Is it some other helm issue?

But I can create a PR that creates a new helm chart with EG CRDs in the template folder to make it possible to disable.
Or do you want them in the CRD folder?

@arkodg
Copy link
Contributor

arkodg commented Mar 26, 2025

yes, the new gateway-crds-helm chart https://github.com/envoyproxy/gateway/tree/main/charts will only contain the EG CRDs and will live under templates/

the helm issue i'm referring to is #4001 (comment)
as shahar mentioned, it may not be applicable if the data is compressed, so its worth trying to also add the Gateway API crds in the new gateway-crds-helm chart, but that can be done as a follow up

@NissesSenap
Copy link
Contributor

@arkodg I put all the fixes in the same PR, hopefully you all think it loos okay.
We could add some more documentation about using the new helm chart. But let's move the conversation over to the PR.

@arkodg arkodg modified the milestones: Backlog, v1.4.0-rc.1 Apr 17, 2025
@arkodg
Copy link
Contributor

arkodg commented Apr 20, 2025

keeping this open to track CI work to push charts to docker hub + docs work to highlight using these charts in the upgrade steps

@arkodg arkodg reopened this Apr 20, 2025
@arkodg
Copy link
Contributor

arkodg commented Apr 24, 2025

another decision point + extra TODOs here, is to also add the stable channel Gateway API CRDs in the chart, and decide which one should it default to

@arkodg arkodg modified the milestones: v1.4.0-rc.1, v1.4.0 Apr 30, 2025
@arkodg arkodg added the documentation Improvements or additions to documentation label May 3, 2025
@arkodg
Copy link
Contributor

arkodg commented May 3, 2025

keeping this open to track docs work in the installation / upgrade section

@arkodg
Copy link
Contributor

arkodg commented May 7, 2025

chart is available in https://hub.docker.com/r/envoyproxy/gateway-crds-helm, closing this issue for now, and creating a new one to track docs work

@arkodg arkodg closed this as completed May 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment