cmd/catalog: Migrate to using cobra for CLI flag management #2362

Conversation

timflannagan
Member

Update cmd/catalog/main.go to use cobra's CLI library for managing
CLI executables and parsing flags.

Introduce cmd/catalog/start.go, which is responsible for returning a
populated cobra.Command structure that the main function can call and
execute.
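
As a rough illustration of the structure this introduces, a minimal cobra-based entrypoint might look like the sketch below; the flag name and default shown here are placeholders rather than the catalog operator's real options.

```
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newRootCmd returns a populated *cobra.Command, analogous to what
// cmd/catalog/start.go provides for main to execute.
func newRootCmd() *cobra.Command {
	var namespace string // placeholder flag for illustration

	cmd := &cobra.Command{
		Use:   "catalog",
		Short: "Start the catalog operator",
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Printf("starting catalog operator, watching namespace %q\n", namespace)
			return nil // the real command would start the operator here
		},
	}
	// cobra uses pflag, so this registers the double-hyphen flag --namespace.
	cmd.Flags().StringVar(&namespace, "namespace", "", "namespace to watch")
	return cmd
}

func main() {
	if err := newRootCmd().Execute(); err != nil {
		os.Exit(1)
	}
}
```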

Signed-off-by: timflannagan [email protected]

Description of the change:

Motivation for the change:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive

@timflannagan
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 17, 2021
@timflannagan
Member Author

/rerun-all

@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from ca58cc6 to c40c7a0 on September 23, 2021 14:31
@timflannagan
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2021
@timflannagan timflannagan reopened this Sep 23, 2021
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2021
@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from c40c7a0 to af61918 on January 4, 2022 19:03
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 4, 2022
@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from af61918 to 1e9284d on January 4, 2022 19:04
@perdasilva
Collaborator

/approve

@perdasilva
Collaborator

Do you think we should close #2524? I think this PR should solve the issue I was running into as well (flag being redefined)...

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2022
@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from 1e9284d to 09a9cf6 on February 17, 2022 00:13
@openshift-ci

openshift-ci bot commented Feb 17, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva, timflannagan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [perdasilva,timflannagan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 17, 2022
@perdasilva
Collaborator

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 17, 2022
@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from 09a9cf6 to 43aea8b on February 17, 2022 16:16
@openshift-ci

openshift-ci bot commented Feb 17, 2022

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 17, 2022
@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from 43aea8b to b4724b1 on February 17, 2022 16:18
Update cmd/catalog/main.go to use cobra's CLI library for managing
CLI executables and parsing flags.

Introduce cmd/catalog/start.go, which is responsible for returning a
populated cobra.Command structure that the main function can call and
execute.

Signed-off-by: timflannagan <[email protected]>
@timflannagan timflannagan force-pushed the update-catalog-operator-cobra-cli branch from b4724b1 to 3bb1665 on February 17, 2022 17:32
@timflannagan timflannagan added the lgtm Indicates that a PR is ready to be merged. label Feb 17, 2022
@timflannagan timflannagan merged commit 8119718 into operator-framework:master Feb 17, 2022
@timflannagan timflannagan deleted the update-catalog-operator-cobra-cli branch February 17, 2022 18:16
@perdasilva perdasilva mentioned this pull request Feb 22, 2022
@sathieu
Contributor

sathieu commented Mar 8, 2022

Hello,

Can OLM be released?

The deployment manifest is not updated but the image is, so the catalog pod goes into CrashLoopBackOff with:

$ kubectl logs -n operator-lifecycle-manager catalog-operator-76b49b4678-hxbd2
Error: unknown shorthand flag: 'n' in -namespace
unknown shorthand flag: 'n' in -namespace
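
For context on the failure: the old main.go used the standard library's flag package, which treats -namespace and --namespace as equivalent, while cobra is backed by pflag, which follows GNU conventions and parses -namespace as a bundle of one-letter shorthands starting with 'n'. A minimal sketch of the difference (not OLM code):

```
package main

import (
	"flag"
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	args := []string{"-namespace", "olm"}

	// stdlib flag: single and double hyphens are interchangeable.
	stdFS := flag.NewFlagSet("catalog", flag.ContinueOnError)
	ns := stdFS.String("namespace", "", "namespace to watch")
	fmt.Println("stdlib:", stdFS.Parse(args), *ns) // <nil> olm

	// pflag: -namespace is read as a run of shorthand flags and parsing fails
	// with "unknown shorthand flag: 'n' in -namespace".
	pFS := pflag.NewFlagSet("catalog", pflag.ContinueOnError)
	pFS.String("namespace", "", "namespace to watch")
	fmt.Println("pflag:", pFS.Parse(args))
}
```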

@timflannagan
Member Author

@sathieu Yup - we'll need to apply a new rendered deployment manifest. Let me check in with the team about cutting a new minor version release.

alvaroaleman added a commit to alvaroaleman/hypershift that referenced this pull request Mar 29, 2022

The single-hyphen flags do not work anymore due to
operator-framework/operator-lifecycle-manager#2362
@timflannagan
Member Author

@sathieu It looks like the v0.21.0 release was just published (apologies for the delay).

@sathieu
Contributor

sathieu commented Apr 30, 2022

@timflannagan Thanks for the release.

kimetro pushed a commit to kimetro/hypershift that referenced this pull request Sep 23, 2022
Signed-off-by: David Vossel <[email protected]>

Kubevirt Platform: Make HostedControlPlane aware of the KubeVirt platform

This change handles the case where "hcp.Spec.Platform.Type" is "Kubevirt".
In that case, the Infrastructure is set to the None platform in order to prevent errors in the machine-config-server.
This change is done in order to allow special HostedControlPlane behavior for the KubeVirt platform,
for example adding support for "cloud-provider-kubevirt" and "kubevirt-csi-driver".

Default to using empty dir storage for registry with kubevirt platform

Signed-off-by: David Vossel <[email protected]>

Enable wildcard routes for OCP cluster when CNV tests are being run

Signed-off-by: David Vossel <[email protected]>

Enhance Kubevirt e2e test to ensure HostedCluster completes install

Signed-off-by: David Vossel <[email protected]>

Fix events message unit test flake

Events by reason are added to a map, so their resulting order is unpredictable. Sorting both the result and the expected events removes differences in order.

copy labels from HostedCluster to admin kubeconfig secret

This copies labels from the HostedCluster instance to the resources created in the hostedcluster namespace to allow for label queries on resources created in that namespace. The idea is to be able to use label selectors to get all resources in the centralized hostedcluster namespace associated with an individual hostedcluster. The admin kubeconfig secret is the one piece created in that namespace that the user cannot specify labels on directly, so this adds labels from the HostedCluster instance that can be used by the selector.

ensure token minter, socks proxy, and availability prober are versioned with the control plane operator to prevent large scale restarts on hypershift operator upgrades for IBM Cloud

This PR removes passing the socks proxy, availability prober, and token-minter images from the hypershift operator to all the downstream namespaces for IBM Cloud deployments. Instead, the version of the control-plane-operator is utilized. This prevents large scale deployment rollouts on any hypershift operator update (which then propagates across all hostedclusters it manages).

Resources: Improve runtime of test

The test is currently extremely slow and takes about 20 minutes with
`-race -count=25`. Improve that by pre-creating the secret that holds
the bcrypt hash of the kubeadmin password, which makes the reconciliation
skip the calculation of said hash and brings the runtime with the above
parameters down to ~20 seconds.

Add proxy support

This change adds proxy support. It adds a small controller to the
hypershift operator that makes it update its own deployment
based on the proxy resource in the cluster. It further extends the
hypershift operator to pass its proxy env vars on to the capi provider
and the CPO, and the CPO to pass them on to the KAS, KCM, HCCO and
ignition server.

Lastly, the konnektivity-socks5-proxy is extended to not route a
hardcoded list of cloudprovider domains into konnektivity but to use its
own proxy configuration for that (if any).

Ref https://issues.redhat.com/browse/HOSTEDCP-331

Dump: Dump guest cluster nodes

So the question "Did nodes join this cluster" can be answered easily,
rather than having to infer it from "Is there a running pod somewhere".

Run unit tests with the race detector and -count=25 to detect flakes

unify hypershift install make targets

dev: specify a more useful `ko` base image

Before this commit, images built with `ko` used a base image that included
no debugging/userland tools. This commit updates the base image to use the
"debug" variant of the base image which is nominally larger but includes
a busybox userland, which is far more useful by default for a low cost.

PKI: Use ECDSA keys by default

These are computationally cheaper to generate and use, more secure,
and smaller. They are also FIPS-compliant[0].
Compatibility with existing RSA CAs is retained.

We cannot use ed25519 because it lacks FIPS compliance; as of the time
of writing it is only proposed for inclusion[1].

For comparison:

Previously:

```
go test -race -count=5 ./control-plane-operator/controllers/hostedcontrolplane/pki/
ok github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane/pki 25.986s
```

With this change:

```
go test -race -count=5 ./control-plane-operator/controllers/hostedcontrolplane/pki/
ok  	github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane/pki	6.939s
```

[0]: https://csrc.nist.gov/publications/detail/fips/186/4/final
[1]: https://csrc.nist.gov/publications/detail/fips/186/5/draft
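
As a rough sketch of why this is so much faster (not the hypershift code itself), generating an ECDSA P-256 key is orders of magnitude cheaper than generating an RSA key:

```
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"fmt"
	"time"
)

func main() {
	start := time.Now()
	if _, err := rsa.GenerateKey(rand.Reader, 4096); err != nil {
		panic(err)
	}
	fmt.Println("rsa 4096 bit:", time.Since(start))

	start = time.Now()
	if _, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader); err != nil {
		panic(err)
	}
	fmt.Println("ecdsa P-256: ", time.Since(start))
}
```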

Ensure that all control plane pods use the cluster's pull secret

Productized images will require the cluster's pull secret to be pulled
from the registry.

This commit updates the reconciliation of service accounts to include
the pull secret if they don't already do so.

Fix typo in how to pause reconciliation doc

Remove CAPI-provider-agent ClusterRole from hypershift Agent platform

Allow the hypershift operator to grant RBAC permissions for clusterdeployments and agentClusterInstalls to the capi-provider-agent

Add ClusterID to HostedCluster API

Adds ClusterID field to HostedClusterSpec. The field value is generated
if left blank.
Modifies CVO so it can be initialized with the HostedCluster's ClusterID
before generating one.
The ClusterID is needed to add a replace label to ServiceMonitors and
PodMonitors so that metrics belonging to a particular cluster can be
identified.

Uses the clusterID in the HostedControlPlane to add a replace label to
every ServiceMonitor and PodMonitor created in a control plane namespace.

Hypershift-operator: Increase worker count

This increases the workers for the hostedcluster operator to 10 (from 1)
and for the nodepool controller to 30 (also from 1). Otherwise the
workers become a bottleneck if more than one hostedcluster is changed at
a time.

Both the 10 workers and the 30 workers are relatively arbitrarily chosen;
these numbers might need revisiting in the future or may even warrant a
config option.
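
For readers unfamiliar with the knob involved, this is controller-runtime's MaxConcurrentReconciles option; the sketch below uses a stand-in reconciler over Deployments, not the actual hostedcluster/nodepool controllers.

```
package demo

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// demoReconciler stands in for the hostedcluster/nodepool reconcilers.
type demoReconciler struct{}

func (r *demoReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

func setupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}).
		// With more than one worker, changes to different objects are
		// reconciled in parallel instead of queueing behind each other.
		WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
		Complete(&demoReconciler{})
}
```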

Make clusterID optional in HostedControlPlane

Previous versions of control plane operators have no knowledge of new
fields added to the HostedControlPlane resource. In order to maintain
backwards compatibility we cannot add any required fields to the
HostedControlPlane type.

This change makes the `clusterID` field optional to maintain
compatibility with previous control-plane-operators.
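
A minimal sketch of what an optional spec field looks like in a Kubernetes API type; the surrounding struct here is trimmed down and only the field shape is meant to be illustrative.

```
type HostedControlPlaneSpec struct {
	// ClusterID uniquely identifies the cluster. If left empty, the controller
	// generates a value, so older control-plane-operators may simply omit it.
	// +optional
	ClusterID string `json:"clusterID,omitempty"`

	// ... other fields elided ...
}
```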

set initialDelaySeconds to 60s on olm operators

add required-api to availability check in olm components

e2e: adjust budgets

Introduce `OIDCConfigurationInvalid` condition for OIDC setup validation

This commit introduces a new `OIDCConfigurationInvalid` condition for
HostedCluster. If OIDC documents cannot be reconciled for a cluster,
`OIDCConfigurationInvalid=True`, otherwise `OIDCConfigurationInvalid` is
removed. The message associated with the condition tries to provide some more
helpful context about what might be wrong to help the user correct the issue.
More error interpretation can be added in the future.
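
A minimal sketch of the set/remove pattern described above, using the standard apimachinery condition helpers; the real reconciler derives the message from the OIDC reconciliation error and uses hypershift's own condition plumbing.

```
package demo

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func setOIDCCondition(conditions *[]metav1.Condition, reconcileErr error, generation int64) {
	if reconcileErr == nil {
		// Healthy clusters have the condition removed rather than set to False.
		meta.RemoveStatusCondition(conditions, "OIDCConfigurationInvalid")
		return
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "OIDCConfigurationInvalid",
		Status:             metav1.ConditionTrue,
		Reason:             "OIDCConfigurationInvalid",
		Message:            reconcileErr.Error(),
		ObservedGeneration: generation,
	})
}
```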

Use patch instead of update when updating HCP status and finalizers

This makes it possible for the control plane operator to work with
future versions of the HostedControlPlane CRD that contain required
fields that it's not aware of.

Using MergeFromWithOptimisticLock option enables similar semantics to
the update operation, ensuring that we update a known resourceVersion
of the resource.
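
A minimal sketch of the patch-with-optimistic-lock pattern using controller-runtime's client (assumed here, since hypershift's controllers are built on it):

```
package demo

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// patchStatus applies mutate() to obj and patches only the resulting diff,
// so required fields this controller doesn't know about are left untouched.
func patchStatus(ctx context.Context, c client.Client, obj client.Object, mutate func()) error {
	original := obj.DeepCopyObject().(client.Object)
	mutate()
	// MergeFromWithOptimisticLock includes resourceVersion in the patch,
	// giving update-like conflict detection.
	return c.Status().Patch(ctx, obj,
		client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{}))
}
```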

Delete Cluster API Cluster Role binding (openshift#1143)

* Delete Cluster API Cluster Role binding

* Cluster API Cluster Role binding

* Updates

* fix: address PR feedback

Co-authored-by: Hidematsu Sueki <[email protected]>

fix AWS HostedCluster fixture to set hostname in alignment with endpointAccess

KAS: Never set proxy

This seems to make the apiserver unable to reach anything over
konnektivity. Not completely sure why this happens, probably because it
ends up having the proxy ip address for everything (that is
speculation).

Just skip setting any proxy env vars, the few requests the KAS does can
go through the guest cluster.

Converge helper binaries and ignition-server into CPO binary

This makes them part of the payload and thus avoids rotating most of the
controlplane when the hypershift operator gets updated.

Fix `ko` entries and update ignition-server dev docs

This commit removes outdated binary entries from `ko.yaml` for those
which have been collapsed into `control-plane-operator` and updates
the in-cluster ignition server developer documentation to reflect
those changes.

Trigger reconcile when paused time is up

add required-api to availability prober for OLM and HCCO

Revert "PKI: Use ECDSA keys by default"

This reverts commit 464b572.

HO: Don't report NotFound for hostedcluster as error

This is completely expected during deletion; logging it at error level
makes it look like an issue, which it is not.

KAS: Set proxy, but exempt pod and service CIDR

The KAS needs the proxy settings to communicate with the cloud provider.
However, the egress transport it uses wraps another transport that
respects proxy settings, which is why we need to exempt the pod and
service CIDRs of the guest cluster so as not to break Konnektivity.

I also tried to stop using the egress config and use the
konnektivity-socks5-proxy, but that breaks SPDY connections (exec,
port-forward).

Ref https://issues.redhat.com/browse/HOSTEDCP-333
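
A minimal sketch of the resulting env-var shape; proxyURL, podCIDR and serviceCIDR are hypothetical placeholders for values taken from the HostedCluster's proxy and network configuration.

```
package demo

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

func kasProxyEnv(proxyURL, podCIDR, serviceCIDR string) []corev1.EnvVar {
	return []corev1.EnvVar{
		{Name: "HTTPS_PROXY", Value: proxyURL},
		// Exempt the guest cluster's pod and service CIDRs so traffic to them
		// keeps flowing over Konnektivity instead of the proxy.
		{Name: "NO_PROXY", Value: strings.Join([]string{podCIDR, serviceCIDR}, ",")},
	}
}
```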

add external-dns flags to CI install make target

increase MaxConcurrentReconciles on AWS PrivateLink controllers

enable external-dns registry

Registry configuration: reconcile only what we need to change

In its current state, the hosted cluster config operator overwrites any
changes made by the guest cluster admin to the registry configuration.
This prevents changes like enabling a route or increasing the number of
replicas.

This commit limits what we change to things we need to change and leaves
everything else as is.

Add default vxlan port for kubevirt clusters

Signed-off-by: David Vossel <[email protected]>

Update staticcheck to a version that works with go 1.18

The version we currently use can not compile anything and fails with
errors like this:

could not load export data: cannot import "math/bits" (unknown iexport format version 2), export data is newer version - update tool (compile)

Note that this doesn't mean staticcheck supports generics, it just means
it can be compiled with go 1.18.

Dump: Always create an archive

Currently, dump just drops a lot of files. This is useful for browsing
them in the CI job output, but terrible for downloading them for local
inspection, as downloading a lot of files is extremely slow, even if the
files aren't big.

This change makes us always create an archive of the dump to not require
extending every CI job to do this manually.

docs: Upgrade mkdocs/material to fix Netlify breakages

This upgrades mkdocs/material to fix Netlify docs compilation breakages
resulting from mkdocs/mkdocs#2799.

docs for DNS indirection

Add additionalTrustBundle to HostedCluster API

Copy HostedCluster additionalTrustBundle to HostedControlPlane

HostedCluster can optionally reference a configmap, in which case we copy
the configmap to the HostedControlPlane namespace (similar to SSHKey and
other fields).

Create HostedCluster user-ca-bundle configmap

When AdditionalTrustBundle is defined we create this ConfigMap to
align with the behavior of regular OCP clusters and enable
consumption of user-defined CA certs by the guest cluster.

Add AdditionalTrustBundle to MCS bootstrap params

When AdditionalTrustBundle is specified, we serialize the configmap and
pass it to the MCO bootstrap command via the default user-ca-bundle-config.yaml
location. This means the MCO bootstrap will read the file when it is included
(the code already ignores the case where the file doesn't exist, since
openshift/installer only conditionally creates the manifest).

Add create cli option for additional-trust-bundle

This can be used to reference a ConfigMap that contains a user CA
bundle.

Add trust bundle volumes to hostedcluster_controller

The CPO and ignition server need the user CA so the registryclient
can access a local registry with a self-signed cert

Add trust bundle to hosted-cluster-config-operator

Add install additional-trust-bundle CLI

Adds a CLI option and corresponding volume to the operator pod,
this is needed so the operator can look up release image metadata
when the release image specified is locally mirrored.

Note the mount path/filename were chosen to align with the expected
defaults ref https://go.dev/src/crypto/x509/root_linux.go (and also
current OCP docs for cert injection using operators)

read apiserver-network-proxy image from ocp payload

Fix CPO to work with 4.11

The single-hyphen flags do not work anymore due to
operator-framework/operator-lifecycle-manager#2362

Retry EIP tagging failures during infra creation

Before this commit, EIP tagging failures resulting from the EIP not
being found after the EIP was successfully created led to infra creation
failing overall because the tagging operation was not retried.

This commit adds retry logic to EIP tagging to account for the case when
EIP creation succeeds but tagging fails because the AWS tagging API doesn't
yet see the new EIP.
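
A minimal sketch of that retry, assuming the AWS SDK for Go v1 EC2 client used elsewhere in the infra code; the helper name and intervals are illustrative only.

```
package demo

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/aws/aws-sdk-go/service/ec2/ec2iface"
	"k8s.io/apimachinery/pkg/util/wait"
)

func tagEIPWithRetry(client ec2iface.EC2API, allocationID, name string) error {
	// The allocation may not be visible to the tagging API immediately after
	// AllocateAddress returns, so poll instead of failing infra creation.
	return wait.PollImmediate(5*time.Second, 1*time.Minute, func() (bool, error) {
		_, err := client.CreateTags(&ec2.CreateTagsInput{
			Resources: []*string{aws.String(allocationID)},
			Tags:      []*ec2.Tag{{Key: aws.String("Name"), Value: aws.String(name)}},
		})
		if err != nil {
			return false, nil // treat as transient and keep polling
		}
		return true, nil
	})
}
```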

default AntiAffinity rules to spread KubeVirt VMs across mgmt nodes

Signed-off-by: David Vossel <[email protected]>

Update to referencing 4.10 disks and documentation for KV guide

Signed-off-by: David Vossel <[email protected]>

Document KubeVirt Platform Ingress/DNS options

Signed-off-by: David Vossel <[email protected]>

Get autoscaler/machine-approver images from the payload

These components watch both the management cluster (Machine scalable resources) and the guest cluster.
Originally we were pinning the images to a version that would cross any HostedCluster.
This PR lets us pick them from each particular payload, resulting in some benefits:

* Each hostedCluster runs the component version that was tested with that particular kube/ocp version.
* No additional work is needed to productise the images as they come from the payload.
* Since CAPI CRDs should be backward compatible, having different controller versions shouldn't cause an issue.

Once the CAPI image is in the payload we can do the same for it.

Hypershift operator: Give a priority that is higher than any controlplane component

e2e: Don't fail test on transient recoverable API lookup

Before this commit, calls to `WaitForConditionsOnHostedControlPlane()` could
fail a test if an API lookup fails even though that lookup is recoverable and
retried automatically. This made the test flaky.

This commit fixes the code so that these retriable errors are logged but do
not fail the test.

This commit also moves a log message which was intended to emit during retries
but was instead placed at the exit point.

Fix priority class for olm cronjob and verify priorityclasses in e2e

The olm cronjob had a priorityClass of openshift-user-critical, which has a
priority that is above all other controlplane components in the
management cluster. Downgrade it to the standard
hypershift-control-plane and add an e2e test that verifies that no pod
has a priority higher than the etcd priority.

e2e: Don't enable user workload monitoring on management clusters

Before this commit, UWM was enabled by the e2e `setup` command, which
was used in the past but is no longer used. The UWM stack is thus wasting
resources on management clusters used for e2e runs.

This commit removes UWM from the monitoring setup for e2e tests.

Remove `ValidAMI` AWS NodePool condition when AMI is user-defined

Before this commit, the `ValidAMI=True` condition was applied to AWS NodePools
which specified user-defined AMIs, which was misleading because those AMIs are
not actually validated by HyperShift.

This commit removes the `ValidAMI` condition entirely from AWS NodePools which
declare user-specified AMIs, since they can't be validated.

Add node troubleshooting documentation

Documents some tools that are available in our code base that can help
understanding why nodes are not able to join a cluster.

Also adds some fixes to the getting started documentation.

Introduce logic to accommodate in place upgrades

This is still gated by validateManagement which prevents this logic from ever running.
Upcoming PRs will drop the gating validation and will complete the in place upgrade logic to run pods in the guest cluster.

do not wait on capi clusterrolebinding delete

Add support to set up a http proxy for guest clusters

This change:
* Adds a new --enable-proxy flag to the create aws infra command
  which will create a http proxy instance if set
* It will also cause a proxy config pointing to said instance to be
  added to the HostedCluster
* Extends the globalconfig package to optionally be able to set the
  status of a proxy config, including additional no_proxy hosts
* Makes the nodepool controllers' userdata secret generation consider a
  proxy config if present
* Makes the ignition config rendering use a proxy config with status
  set, as otherwise the proxy config will be ignored
* Makes the aws infra destroy command also clean up instances to not
  get blocked by the proxy instance

Ref https://issues.redhat.com/browse/HOSTEDCP-333

Create valid route names with long namespace names

When using a long name for a cluster namespace, the routes that the
control plane operator creates are invalid because a segment of the
domain name has a length > 63 chars.

This commit sets the host name for routes to ensure that the resulting
host name is valid. Modification of the route hostname only happens when
the resulting hostname will be invalid. The name is shortened by
trimming the original name and appending a hash of the entire name,
resulting in repeatable and distinct names every time the name is
shortened.
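
A minimal sketch of that shortening scheme (the helper name is hypothetical): if a DNS label would exceed 63 characters, trim it and append a short hash of the full name so the result is valid, repeatable and distinct.

```
package demo

import (
	"crypto/sha256"
	"fmt"
)

func shortenRouteLabel(name string) string {
	const maxLabelLen = 63
	if len(name) <= maxLabelLen {
		return name
	}
	sum := fmt.Sprintf("%x", sha256.Sum256([]byte(name)))[:8]
	// Leave room for the separator and the 8-char hash suffix.
	return fmt.Sprintf("%s-%s", name[:maxLabelLen-len(sum)-1], sum)
}
```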

Enforce hostedcluster service route immutability constraints

Before this PR, the `route.spec.host` field associated with a given
`hostedcluster.spec.services[].route` value was being defaulted by OCP and then
later overwritten with an empty string on subsequent reconcile loops. This is
fundamentally a side effect of a violation of the immutability constraint on the
`hostname` field.

To fix the issue, this PR refactors the relevant Route reconciliation code to
compute and set the `route.spec.host` field only during the creation of the
Route resource, which enforces the immutability constraint and resolves the side
effect.

Signal parsing config failure in a condition

This PR signals invalid cluster config in a HC condition. It also does the same for NodePools and refactors to narrow down the reconcileUserdata signature so that it only takes what it needs.

Disable PodSecurity admission in 4.11 as it breaks conformance

This is enabled by default in 4.11/kube 1.23, which breaks a bunch of
networking tests that want to create privileged pods, which the default
`Restricted` policy doesn't allow.

Do the same as the kas-operator and disable it.

ensure imagePullPolicies are IfNotPresent for better tolerance of networking outages

AWS infra destroy, handle empty instanceIDs

In the case where the nodepool didn't create any instances this
list can be empty on destroy, which causes an error like:

missing required field, TerminateInstancesInput.InstanceIds

This causes the destroy to get stuck, so instead handle the
case where the list is empty.
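
A minimal sketch of the guard, again assuming the AWS SDK for Go v1 EC2 client:

```
package demo

import (
	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/aws/aws-sdk-go/service/ec2/ec2iface"
)

func terminateInstances(client ec2iface.EC2API, instanceIDs []*string) error {
	// TerminateInstances rejects an empty InstanceIds list, so if the nodepool
	// never created any instances there is simply nothing to do.
	if len(instanceIDs) == 0 {
		return nil
	}
	_, err := client.TerminateInstances(&ec2.TerminateInstancesInput{InstanceIds: instanceIDs})
	return err
}
```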

Make HyperShift operator compatible with previous CPOs without utilities

Adds symlinks in HyperShift operator image for previously standalone
utilities and enables the control plane operator to invoke the right
command based on the symlink it was invoked with.

Checks whether the CPO for a given cluster includes subcommands for
utilities by checking the labels on the CPO image, and only uses the
CPO image for utilities if it does include the labels.
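
A minimal sketch of dispatching on the symlink name; the subcommand names here are placeholders, not the CPO's actual command set.

```
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// When the binary is invoked through a symlink such as "token-minter",
	// argv[0] carries the symlink name, so the combined binary can behave
	// like the old standalone utility.
	switch filepath.Base(os.Args[0]) {
	case "token-minter":
		fmt.Println("running token-minter mode")
	case "availability-prober":
		fmt.Println("running availability-prober mode")
	default:
		fmt.Println("running control-plane-operator mode")
	}
}
```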

continue to allow support for specification of clusterAutoscalerImage and machineApprover image until no longer necessary

IBM's production release candidate still relies on this behavior to deploy the images specified in the BOM. We will need to have this support in place until we can confirm we have successfully upgraded all production clusters to a later BOM version that no longer needs this behavior and instead uses the release image

Fix kubernetes.default for public clusters with proxy

On public clusters that use an http proxy, the kubernetes.default service
currently doesn't work, because it is served by an HAProxy that tries to
connect directly to the KAS, which isn't possible.

This change:
* Adds a new `kubernetes-default-proxy` subcommand to the CPO, which is a
  simple tcp proxy that initiates a connection over http connect (see the
  sketch below), as it doesn't seem that HAProxy supports this
* Uses a static pod with said subcommand on clusters with an https proxy
  that are not private
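
A minimal sketch of that kind of proxy (addresses are placeholders; error handling and proxy auth are omitted): accept a TCP connection, open a tunnel through the HTTP proxy with a CONNECT request, then copy bytes in both directions.

```
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
)

func tunnel(client net.Conn, proxyAddr, target string) error {
	defer client.Close()
	upstream, err := net.Dial("tcp", proxyAddr)
	if err != nil {
		return err
	}
	defer upstream.Close()

	// Ask the HTTP proxy to open a raw TCP tunnel to the target (the KAS).
	fmt.Fprintf(upstream, "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n", target, target)
	resp, err := http.ReadResponse(bufio.NewReader(upstream), nil)
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("proxy refused CONNECT: %s", resp.Status)
	}

	// Shovel bytes both ways until either side closes.
	go io.Copy(upstream, client)
	_, err = io.Copy(client, upstream)
	return err
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:6443")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go tunnel(conn, "proxy.example.com:3128", "kas.example.com:443")
	}
}
```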

ensure release image annotation added to deployment template metadata to enable proper detection in IBM Cloud deployment process

The release image annotation should also be added to the pod template annotation so it is reflected on the pods that are scheduled from the deployment. This is necessary from a monitoring perspective and for detection in IBM Cloud's deployment process to ensure rollouts complete on a release image update

add validating webhook for HostedClusters

reconcile oauth serving cert rbac to allow oauth proxies in the openshift-monitoring namespace to work

This adds rbac that allows authenticated users to fetch the oauth-serving-cert configmap. This is necessary for the oauth proxies in the openshift-monitoring namespace to work and allow users access to the prometheus/grafana/alertmanager dashboards.

remove version check for api system that is not in 4.9

disable reconcile of registry config in IBMCloud deployments

Currently there is a race condition in IBMCloud deployments: if hypershift initializes the registry config before the managed service has a chance to properly initialize it, the result is a bad initial config that causes the cluster-image-registry operator to crash and leads to tickets. For IBMCloud deployments, the initialization of this config should be delegated to the managed service so this does not occur. From there, the user can edit it however they like, in line with upstream openshift.

Use forked processes instead of pods to generate ignition payload

Before this commit, ignition payloads for nodepools were generated by
orchestrating MCO components using pods. This was a performance problem because
ignition server restarts (e.g. during upgrades) resulted in thundering herds of
ignition payload generation pods, limiting overall cluster capacity.

This commit refactors ignition payload generation away from pods and to a
mechanism that extracts MCO binaries (and supporting files) from release
payloads so they can be forked as subprocesses of the ignition server
controller, removing all pod scheduling overhead. This change should be
completely transparent to the end user and is a drop-in replacement for the old
implementation.

The current implementation has the following limitations:

* Extracted payload contents are not cached across executions per release image.
  This is possible, but not worth doing unless subsequent performance analysis
  justifies it.

* All payload generation executions are serial. Concurrent executions are
  possible, but not worth doing unless subsequent performance analysis justifies
  it.

This commit also enables metrics collection for the ignition server component and
adds the following metrics:

* ign_server_token_rotation_total
* ign_server_payload_cache_miss_total
* ign_server_payload_generation_seconds
* ign_server_payload_cache_total

Ignition server: Actually use workdir

The defaulting was incorrect

configure cipher suites to prevent using medium strength ssl ciphers

We should only be using strong ciphers in hypershift components. By default Go includes medium strength ciphers in the default tls config. This adjusts the config to use only strong ciphers (TLS 1.2 and TLS 1.3).
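
A minimal sketch of restricting a Go TLS config to TLS 1.2+ with strong suites; the exact suite list used by hypershift may differ.

```
package demo

import "crypto/tls"

func strongTLSConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		// TLS 1.3 suites are not configurable in Go; this list only constrains
		// TLS 1.2 connections.
		CipherSuites: []uint16{
			tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
			tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
			tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,
			tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
		},
	}
}
```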

move to ga apis for all components now that management clusters at minimum release boundary

fix(cpo): Scope down secrets access for olm collect profiles cj

Update config.go

Added shutdown-send-retry-after and updated shutdown-delay-duration

feat(cpo): adhere to upgrade order from kube version skew policy

fix(cpo): set tls cipher suites flags on kcm and scheduler

Ensure cache is set during token rotation before reconciling

Add missing control plane prometheus rules

Ensure that everything uses imagePullPolicy IfNotPresent for resiliency

Ref https://issues.redhat.com/browse/HOSTEDCP-330

Updated secret permissions to conform to kubernetes CIS benchmark

Hypershift Image Pull Policy to IfNotPresent

feat(cpo): Support disable profiling annotation

Set Recommended Leader Election Values

Update control-plane-operator/controllers/hostedcontrolplane/scheduler/config_test.go

Co-authored-by: Hidematsu Sueki <[email protected]>

mend

Add fallback to set cache value from old token

When you run multiple pods, by design the first one doing a token rotation in the secret CR forces the other two pods' caches to generate a new payload for that token because they are not aware of it yet. This attempts to mitigate that scenario.

feat(oauth): allow challenge override for OpenID

cache registry files

adopt existing immutable selectors to prevent errors reconciling components from roks toolkit clusters

This fixes an error in the control plane operator where it fails to reconcile and properly adopt deployments based on roks toolkit clusters. This is due to the fact that the selectors have changed and that is an immutable field. Specifically, hypershift selectors add a duplicate selector, hypershift.component, that is not contained in roks toolkit clusters. Even with the single app selector, deployment rollout behavior is unchanged. This allows for zero downtime adoption of roks toolkit clusters. Associated issue: openshift#1042

Use non-strict mode when parsing global config

The HyperShift operator uses the latest version of the OpenShift API to
serialize global config in the HCP (for compatibility with older CPOs).
The issue is that if there's a new field in the latest API that the
older CPO does not understand, we currently produce an error decoding
the YAML because the YAML serializer we use is using strict mode.

This commit switches to a YAML serializer that does not use strict mode for
parsing of the global configuration, making it possible for older CPOs
to parse YAML from the latest HyperShift operator.
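
A minimal sketch of the strict vs. non-strict difference using sigs.k8s.io/yaml (assumed here as the serializer; the type and field are made up for illustration):

```
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

type proxyConfig struct {
	HTTPProxy string `json:"httpProxy,omitempty"`
}

func main() {
	// "newField" does not exist in this (older) type definition.
	data := []byte("httpProxy: http://proxy.example.com\nnewField: value\n")

	var strict proxyConfig
	fmt.Println("strict:", yaml.UnmarshalStrict(data, &strict)) // errors on the unknown field

	var lenient proxyConfig
	fmt.Println("non-strict:", yaml.Unmarshal(data, &lenient), lenient.HTTPProxy) // <nil> http://proxy.example.com
}
```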

change secret 420 to 416

updt secret to 416

chg scrts perm to 416

chg scrts perm to 416

chg scrts perm to 416

chg scrts perm to 416

add pointer for scrt416 P

add pointer for scrt416 P

add pointer for scrt416

defaultmode 416 mcs-tls

416 mode on scrt

416 scr mode

change scrt DefaultMode to 416

DefaultMode to 416

Scrt DefaultMode416

Scrt DefaultMode416

change secret Defaultmode 420 to 416
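
For context on the string of DefaultMode commits above: the value is a decimal int32 in the Kubernetes API, so 420 is octal 0644 (world-readable) and 416 is octal 0640 (no world read), which is the tightening the CIS-benchmark commit earlier in this list refers to. A minimal sketch of the changed field:

```
package demo

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/utils/pointer"
)

func secretVolume(name, secretName string) corev1.Volume {
	return corev1.Volume{
		Name: name,
		VolumeSource: corev1.VolumeSource{
			Secret: &corev1.SecretVolumeSource{
				SecretName: secretName,
				// 0640 octal == 416 decimal: owner and group can read, others cannot.
				DefaultMode: pointer.Int32(0640),
			},
		},
	}
}
```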