Skip to content

OCPBUGS-6016: UpdateStrategy RegistryPoll with nil Interval #468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dtfranz
Copy link
Contributor

@dtfranz dtfranz commented Mar 9, 2023

Adds protection against a nil-pointer panic when an UpdateStrategy with nil value for RegistryPoll Interval is supplied. Also adds a unit test to ensure that the code does not panic but instead returns an error when this situation is encountered.

Upstream-repository: operator-lifecycle-manager
Upstream-commit: 4bae06a5d2230e62328f5d82ce36c09a7522cea8

Bug: OCPBUGS-6016

Adds protection against a nil-pointer panic when an UpdateStrategy with nil value for RegistryPoll Interval is supplied. Also adds a unit test to ensure that the code does not panic but instead returns an error when this situation is encountered.

Upstream-repository: operator-lifecycle-manager
Upstream-commit: 4bae06a5d2230e62328f5d82ce36c09a7522cea8

Signed-off-by: dtfranz <[email protected]>
@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 9, 2023
@openshift-ci-robot
Copy link

@dtfranz: This pull request references Jira Issue OCPBUGS-6016, which is invalid:

  • expected the bug to target the "4.14.0" version, but it targets "4.13.0" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Adds protection against a nil-pointer panic when an UpdateStrategy with nil value for RegistryPoll Interval is supplied. Also adds a unit test to ensure that the code does not panic but instead returns an error when this situation is encountered.

Upstream-repository: operator-lifecycle-manager
Upstream-commit: 4bae06a5d2230e62328f5d82ce36c09a7522cea8

Bug: OCPBUGS-6016

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 9, 2023

/retest

1 similar comment
@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 9, 2023

/retest

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 10, 2023

/restest

@bandrade
Copy link
Contributor

/retest

1 similar comment
@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 13, 2023

/retest

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 13, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 13, 2023
@openshift-ci-robot
Copy link

@dtfranz: This pull request references Jira Issue OCPBUGS-6016, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 13, 2023

/retest

1 similar comment
@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 15, 2023

/retest

@dinhxuanvu
Copy link
Member

@dtfranz Please hold on the retest as this e2e-upgrade is failing for real due to some issues.

@dinhxuanvu
Copy link
Member

/hold
until e2e-upgrade is fixed (non-olm related)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 15, 2023
@bandrade
Copy link
Contributor

bandrade commented Mar 23, 2023

Hi @dtfranz ,

I'm still seeing the issue with the current commit. Would you mind to take a look?

oc exec olm-operator-567fb46d7d-dkg78 -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.0.0-4bec9bd968398f8b288ca9ca11dae5043debbd57
git commit: 4bec9bd968398f8b288ca9ca11dae5043debbd57

Create the catalog:

cat catalog2.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  generation: 1
  name: my-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "57394"
spec:
  displayName: My Operator Catalog
  image: nexus.appreg.prod.bi.com.gt:8443/redhat/redhat-operator-index:v4.10
  publisher: redhat
  sourceType: grpc
  updateStrategy: {}

oc create -f catalog2.yaml
catalogsource.operators.coreos.com/my-operator-catalog created

Package Server and Catalog Operator are still failing:

oc get pods -n openshift-operator-lifecycle-manager
NAME                                      READY   STATUS             RESTARTS     AGE
catalog-operator-56ff495cd7-hczh5         0/1     Error              1 (5s ago)   112m
collect-profiles-27993405-89jf2           0/1     Completed          0            38m
collect-profiles-27993420-n9qwm           0/1     Completed          0            23m
collect-profiles-27993435-86479           0/1     Completed          0            8m35s
olm-operator-567fb46d7d-dkg78             1/1     Running            0            112m
package-server-manager-75fcc4f8f6-pg2rq   1/1     Running            0            112m
packageserver-9ddf5bc4c-gn8zt             0/1     CrashLoopBackOff   1 (4s ago)   109m
packageserver-9ddf5bc4c-lcnsw             0/1     CrashLoopBackOff   1 (3s ago)   109m

oc logs catalog-operator-56ff495cd7-hczh5 -n openshift-operator-lifecycle-manager
time="2023-03-23T21:26:40Z" level=info msg="log level info"
time="2023-03-23T21:26:40Z" level=info msg="TLS keys set, using https for metrics"
W0323 21:26:40.034884       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2023-03-23T21:26:40Z" level=info msg="Using in-cluster kube client config"
time="2023-03-23T21:26:40Z" level=info msg="Using in-cluster kube client config"
W0323 21:26:40.035662       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0323 21:26:40.090420       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2023-03-23T21:26:40Z" level=info msg="connection established. cluster-version: v1.26.0-2849+dc93b13d1f6518-dirty"
time="2023-03-23T21:26:40Z" level=info msg="operator ready"
time="2023-03-23T21:26:40Z" level=info msg="starting informers..."
time="2023-03-23T21:26:40Z" level=info msg="informers started"
time="2023-03-23T21:26:40Z" level=info msg="waiting for caches to sync..."
E0323 21:26:40.113691       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 513 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1d8fc00?, 0x3632e00})
	/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x2540be400?})
	/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 24, 2023

Hi @bandrade , thanks for checking on this.
The test you are running has a CatalogSource with an updateStrategy that is nil, however the code change here is meant only to address an updateStrategy that is non-nil and contains a nil registryPoll.

I've also confirmed from both the v0.23.1 and v0.24.0 upstream OLM releases that updateStrategy: {} will cause the catalog-operator pod to crash-loop. These two releases are before and after this code change respectively. I also tested this on a 4.12 cluster and encountered the same issue. It's safe to say that this is an existing issue not introduced by this code change, and I feel as though it should be addressed through another PR.

If that's acceptable to you then I'd like to merge this change and start a follow-up to fix the problem arising from updateStrategy: {}. But if not it will take some time to fix the other issue and pull that fix into this PR as well.

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 27, 2023

Looks like the e2e-upgrade test is passing again, gonna retest here.
/retest

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 27, 2023

/retest

@dinhxuanvu
Copy link
Member

The CI has passed.
/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2023
@perdasilva
Copy link
Contributor

/approve

@perdasilva
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 28, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtfranz, perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2023
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD edd42aa and 2 for PR HEAD d945ded in total

@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 28, 2023

/retest

1 similar comment
@dtfranz
Copy link
Contributor Author

dtfranz commented Mar 28, 2023

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 28, 2023

@dtfranz: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit d02dc2a into openshift:master Mar 29, 2023
@openshift-ci-robot
Copy link

@dtfranz: Jira Issue OCPBUGS-6016: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-6016 has been moved to the MODIFIED state.

In response to this:

Adds protection against a nil-pointer panic when an UpdateStrategy with nil value for RegistryPoll Interval is supplied. Also adds a unit test to ensure that the code does not panic but instead returns an error when this situation is encountered.

Upstream-repository: operator-lifecycle-manager
Upstream-commit: 4bae06a5d2230e62328f5d82ce36c09a7522cea8

Bug: OCPBUGS-6016

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants