[FLAKY] cleanup csvs with bad owner operator groups #2644

Closed
perdasilva opened this issue Feb 16, 2022 · 3 comments · Fixed by #2820
Labels
kind/flake Categorizes issue or PR as related to a flaky test.

Comments

@perdasilva
Collaborator

Description

Unfortunately, I marked this as a flake without capturing the stack trace ._.
Once I get it I'll update the issue. Or if this test fails for you, please paste in the stack trace and log link if possible.

@perdasilva perdasilva added the kind/flake Categorizes issue or PR as related to a flaky test. label Feb 16, 2022
@anik120
Contributor

anik120 commented Jul 28, 2022

Not sure if this is the same error that shows up during a flake, but I'm seeing it consistently on one of my PRs, so pasting it here:

• [FAILED] [608.186 seconds]
Operator Group
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:35
[It] [FLAKE] cleanup csvs with bad owner operator groups
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1803

Begin Captured GinkgoWriter Output >>
[BeforeEach] Operator Group
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:41
[It] [FLAKE] cleanup csvs with bad owner operator groups
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:1803
Creating CRD
Getting default operator group 'global-operators' installed via operatorgroup-default.yaml operators
Waiting on operator group to have correct status
Creating CSV
wait for CSV to succeed
():
Installing (InstallWaiting): installing: waiting for deployment operator-deploymentggb69 to become ready: deployment "operator-deploymentggb69" not available: Deployment does not have minimum availability.
Succeeded (InstallSucceeded): install strategy completed with no errors
wait for roles to be promoted to clusterroles
ensure operator was granted namespace list permission
Waiting for operator namespace csv to have annotations
Found CSV count of 7
Create other namespace operators-mmhm4
Waiting to ensure copied CSV shows up in other namespace
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
.
.
.
.
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
map[string]string{"olm.operatorGroup":"global-operators", "olm.operatorNamespace":"operators"}
operators-mmhm4
[AfterEach] Operator Group
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:46
cleaning up ephemeral test resources...
deleting test subscriptions...
deleting test installplans...
deleting test catalogsources...
deleting test crds...
deleting test csvs...
test resources deleted
<< End Captured GinkgoWriter Output

Error Trace:	operator_groups_e2e_test.go:2041
            				suite.go:605
            				asm_amd64.s:1571
Error:      	Received unexpected error:
            	timed out waiting for the condition
Test:       	Operator Group [FLAKE] cleanup csvs with bad owner operator groups

In [It] at: /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/stretchr/testify/require/require.go:1261

Full Stack Trace
github.com/stretchr/testify/require.NoError({0x7f44305f1910, 0xc0011db3c0}, {0x3a5d9c0, 0xc0001cc080}, {0x0, 0x0, 0x0})
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/stretchr/testify/require/require.go:1261 +0x96
github.com/operator-framework/operator-lifecycle-manager/test/e2e.glob..func20.11()
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/operator_groups_e2e_test.go:2041 +0x41ec

@anik120
Contributor

anik120 commented Jul 28, 2022

Update: I performed the test manually against an OLM installation that includes my commit, and it looks like my commit introduces no deviation from expected behavior. Documenting the steps for manual testing, for future flake debugging:

Test was introduced as part of a fix in #1267

Step 1: kind create cluster; make run-local (from branch that includes new commit)

Step 2: Create namespace:

$ kubectl apply -f -<< EOF
apiVersion: v1
kind: Namespace
metadata:
  name: second-operators
EOF
namespace/second-operators created

Step 3: Create an OperatorGroup in the new namespace that targets all namespaces (an OperatorGroup with no target namespaces selects all namespaces):

$ kubectl apply -f -<< EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: olm-og
  namespace: second-operators
EOF
operatorgroup.operators.coreos.com/olm-og created

Step 4: Subscribe to the elastic-cloud-eck package in the new namespace and wait for the CSV to be copied to all namespaces:

$ kubectl apply -f -<< EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-sub
  namespace: second-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-catalog
  sourceNamespace: olm

EOF
subscription.operators.coreos.com/elastic-sub created

$ kubectl get csv -A
NAMESPACE            NAME                       DISPLAY                        VERSION   REPLACES                   PHASE
default              elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
kube-node-lease      elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
kube-public          elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
kube-system          elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
local-path-storage   elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
olm                  elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
olm                  packageserver              Package Server                 1.0.0                                Succeeded
operators            elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
second-operators     elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded

Step 5: Edit the elastic-cloud-eck.v2.3.0 CSV's annotations:

// (TODO): come up with a kubectl patch command for this
The olm.operatorNamespace annotation has a value of second-operators; change it to operators or any garbage value.
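One possible command for the TODO above (a sketch, not verified against the test environment; it assumes you are editing the copied CSV in the operators namespace, and "garbage-value" is an arbitrary placeholder):

```shell
# Overwrite the olm.operatorNamespace annotation on the copied CSV with a
# bogus value so OLM no longer recognizes the copy's owner namespace.
kubectl patch csv elastic-cloud-eck.v2.3.0 -n operators \
  --type merge \
  -p '{"metadata":{"annotations":{"olm.operatorNamespace":"garbage-value"}}}'
```

A merge patch is used here because it replaces only the one annotation without touching the rest of the CSV's metadata.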

Step 6: Check to make sure that the CSV was deleted from the operators namespace:

$ kubectl get csv -A                             
NAMESPACE            NAME                       DISPLAY                        VERSION   REPLACES                   PHASE
default              elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
kube-node-lease      elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
kube-public          elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
kube-system          elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
local-path-storage   elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
olm                  elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded
olm                  packageserver              Package Server                 1.0.0                                Succeeded
second-operators     elastic-cloud-eck.v2.3.0   Elasticsearch (ECK) Operator   2.3.0     elastic-cloud-eck.v2.2.0   Succeeded

The CSV is gone from the operators namespace

The reason the test is flaking is that the OLM controller copies the CSV back into the operators namespace. The test only passed when it happened to fetch the CSV (and find it gone) in the small window before the controller copied it back again 😵‍💫
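The race can be observed by hand with the manual setup above (a sketch; the column spec is just one way to format the output):

```shell
# Watch copied CSVs across all namespaces. After patching the annotation,
# the copy briefly disappears from the operators namespace and then
# reappears once the controller re-copies it.
kubectl get csv -A --watch-only \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase
```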

@anik120
Contributor

anik120 commented Jul 28, 2022

Fixed this flake in 8eb6517

Will close this when that PR is merged 🙇🏽
