cmd/catalog: Migrate to using cobra for CLI flag management #2362
Conversation
/hold
/rerun-all
Force-pushed from ca58cc6 to c40c7a0
/hold cancel
Force-pushed from c40c7a0 to af61918
Force-pushed from af61918 to 1e9284d
/approve
Do you think we should close #2524 - I think this PR should solve the issue I was running into as well (flag being redefined)...
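For context on the "flag being redefined" issue, here is a minimal Go sketch (not taken from the OLM codebase; the `kubeconfig` flag name is purely illustrative) of how duplicate registrations on the global flag set panic, and how a cobra command's dedicated flag set avoids the collision:

```go
package main

import (
	"flag"
	"fmt"

	"github.com/spf13/cobra"
)

func main() {
	// With the standard library's global flag set, a second registration of
	// the same name panics at startup: "flag redefined: kubeconfig".
	flag.String("kubeconfig", "", "path to kubeconfig")
	// flag.String("kubeconfig", "", "path to kubeconfig") // uncommenting this panics

	// A cobra command owns its own flag set, so every flag is declared exactly
	// once, next to the command that uses it.
	cmd := &cobra.Command{
		Use: "example",
		RunE: func(cmd *cobra.Command, _ []string) error {
			path, _ := cmd.Flags().GetString("kubeconfig")
			fmt.Println("kubeconfig:", path)
			return nil
		},
	}
	cmd.Flags().String("kubeconfig", "", "path to the kubeconfig file")

	// Flags that other libraries register on the global set can be merged in
	// deliberately; pflag skips names that already exist, so this does not
	// collide with the declaration above.
	cmd.Flags().AddGoFlagSet(flag.CommandLine)

	if err := cmd.Execute(); err != nil {
		fmt.Println(err)
	}
}
```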
Force-pushed from 1e9284d to 09a9cf6
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: perdasilva, timflannagan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
Force-pushed from 09a9cf6 to 43aea8b
New changes are detected. LGTM label has been removed.
Force-pushed from 43aea8b to b4724b1
Update cmd/catalog/main.go to use cobra's CLI library for managing CLI executables and parsing flags. Introduce cmd/catalog/start.go, which is responsible for returning a populated cobra.Command structure that the main function can call and execute. Signed-off-by: timflannagan <[email protected]>
Force-pushed from b4724b1 to 3bb1665
Hello, can OLM be released? The deployment is not updated but the image is, and the catalog pod is in CrashLoopBackOff with:
$ kubectl logs -n operator-lifecycle-manager catalog-operator-76b49b4678-hxbd2
Error: unknown shorthand flag: 'n' in -namespace
unknown shorthand flag: 'n' in -namespace
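The error comes from pflag's parsing rules, which cobra uses: a single dash introduces a cluster of shorthand flags, so `-namespace` is read as the shorthand `-n` (which is not defined), while long flags require a double dash. A small, hedged sketch (the `demo` command and flag wiring below are illustrative, not OLM's actual code) that reproduces the message:

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

func main() {
	cmd := &cobra.Command{
		Use: "demo",
		RunE: func(cmd *cobra.Command, _ []string) error {
			ns, _ := cmd.Flags().GetString("namespace")
			fmt.Println("namespace:", ns)
			return nil
		},
	}
	// A long flag with no shorthand defined.
	cmd.Flags().String("namespace", "default", "namespace to operate in")

	// pflag treats a single dash as a bundle of shorthand flags, so
	// "-namespace" is parsed as the shorthand 'n' and fails with:
	//   unknown shorthand flag: 'n' in -namespace
	cmd.SetArgs([]string{"-namespace", "foo"})
	if err := cmd.Execute(); err != nil {
		fmt.Println("single-dash form fails:", err)
	}

	// The double-dash form is what pflag expects for long flags.
	cmd.SetArgs([]string{"--namespace", "foo"})
	_ = cmd.Execute()
}
```

This is why deployment manifests that still pass `-namespace`-style flags to the catalog operator need to be regenerated after this change.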
@sathieu Yup - we'll need to apply a new rendered deployment manifest. Let me check in with the team about cutting a new minor version release.
The single-hyphen flags do not work anymore due to operator-framework/operator-lifecycle-manager#2362
@sathieu It looks like the v0.21.0 release was just published (apologies for the delay).
@timflannagan Thanks for the release.
Update cmd/catalog/main.go to use cobra's CLI library for managing
CLI executables and parsing flags.
Introduce cmd/catalog/start.go, which is responsible for returning a
populated cobra.Command structure that the main function can call and
execute.
Signed-off-by: timflannagan [email protected]
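As a rough illustration of the main.go/start.go split described above, here is a minimal sketch; it is not the actual OLM code, and `newStartCmd` and the flags shown are hypothetical stand-ins:

```go
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/spf13/cobra"
)

// newStartCmd plays the role described for cmd/catalog/start.go: it returns a
// fully populated *cobra.Command that owns all flag definitions.
func newStartCmd() *cobra.Command {
	var (
		namespace      string
		wakeupInterval time.Duration
	)
	cmd := &cobra.Command{
		Use:   "catalog",
		Short: "Starts the catalog operator",
		RunE: func(cmd *cobra.Command, _ []string) error {
			// Real code would construct and run the operator here.
			fmt.Printf("starting in namespace %q (resync every %s)\n", namespace, wakeupInterval)
			return nil
		},
	}
	cmd.Flags().StringVar(&namespace, "namespace", "", "namespace to watch")
	cmd.Flags().DurationVar(&wakeupInterval, "interval", 15*time.Minute, "resync interval")
	return cmd
}

// main.go shrinks to building the command and executing it.
func main() {
	if err := newStartCmd().Execute(); err != nil {
		os.Exit(1)
	}
}
```

With this split, any additional executables under cmd/ can follow the same pattern: a small main that simply executes a command returned by a constructor, with all flag parsing owned by cobra.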
Description of the change:
Motivation for the change:
Reviewer Checklist
/doc