OLM 0.16.1 More than one Head for channel when using opm --semver-skippatch #1771

Closed
cdjohnson opened this issue Sep 19, 2020 · 6 comments
Labels
triage/unresolved Indicates an issue that can not or will not be resolved.


@cdjohnson

Bug Report

The operator-registry project added the ability to add operator bundles to a catalog index image's upgrade graph (a DAG) using semantic versioning rather than the CSV replaces field.

Although this works and the graph looks correct, it appears that OLM is still looking for the replaces value.

What did you do?

  1. Installed the OCP 4.6 nightly build from 9/17/2020, which includes OLM 0.16.1.
  2. Enabled debug logging on catalog-operator.
  3. Created a catalog index image with the following operator bundle taxonomy, using the command:

opm registry add --mode semver-skippatch

Package        Version  Channel  replaces  olm.skipRange  olm.package
testoperatora  1.0.0    v1.0     none      none           none
testoperatora  1.0.1    v1.0     none      <1.0.1         none
testoperatorb  4.0.0    v4.0     none      none           testoperatora: <1.3.0
testoperatorb  4.0.1    v4.0     none      <4.0.1         testoperatora: <1.3.0

  4. Created the catalog source:
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: depcatalogs
  namespace: openshift-marketplace
spec:
  displayName: "depcatalogs-semver-dep-nowork Operators"
  image: docker.io/cdjohnson/depcatalogs@sha256:78d736b0d3c3d288d3000eea0c1640441031357f77ed4215f78e661aabc108ae
  publisher: IBM
  sourceType: grpc
EOF
  5. Created the namespace deptest.
  6. Created an own-namespace OperatorGroup:
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: operatorgroup
spec:
  targetNamespaces:
  - deptest
EOF
  7. Created a Subscription for operator B, which depends on operator A via its dependencies.yaml:
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: testoperatorb
spec:
  source: depcatalogs
  sourceNamespace: openshift-marketplace

  channel: v4.0
  installPlanApproval: Automatic
  name: testoperatorb
  startingCSV: testoperatorb.v4.0.0
EOF
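
To make the olm.skipRange and olm.package values from the bundle table in step 3 concrete, here is a minimal Python sketch of how a `<X.Y.Z` range can be evaluated. It is an illustration only: the helper names `parse_version` and `satisfies_less_than` are hypothetical, the sketch handles only the single `<` comparator used in this report, and it ignores pre-release and build metadata, which real semver range libraries do handle.

```python
def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    major, minor, patch = text.split(".")
    return (int(major), int(minor), int(patch))


def satisfies_less_than(version, constraint):
    """Check a version against a '<X.Y.Z' constraint (the only form used here)."""
    if not constraint.startswith("<"):
        raise ValueError("this sketch only understands '<X.Y.Z' ranges")
    return parse_version(version) < parse_version(constraint[1:])


# Versions of testoperatora taken from the bundle table in the report.
testoperatora_versions = ["1.0.0", "1.0.1"]

# testoperatorb declares the olm.package dependency 'testoperatora: <1.3.0'.
dependency_candidates = [
    v for v in testoperatora_versions if satisfies_less_than(v, "<1.3.0")
]

# testoperatora 1.0.1 carries olm.skipRange '<1.0.1'.
skipped_by_1_0_1 = [
    v for v in testoperatora_versions if satisfies_less_than(v, "<1.0.1")
]

print(dependency_candidates)
print(skipped_by_1_0_1)
```

Under these assumptions, both testoperatora versions satisfy operator B's dependency, while the skipRange on 1.0.1 covers only 1.0.0.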

What did you expect to see?
I expected both operators A and B to be installed.

What did you see instead? Under which circumstances?
Operator B failed to install.

Debug log shows:

time="2020-09-19T20:47:15Z" level=debug msg="resolving sources" id=fuCpC namespace=deptest
time="2020-09-19T20:47:15Z" level=debug msg="checking if subscriptions need update" id=fuCpC namespace=deptest
time="2020-09-19T20:47:15Z" level=debug msg="checking for existing installplan" channel=v4.0 id=fuCpC namespace=deptest pkg=testoperatorb source=depcatalogs sub=testoperatorb
time="2020-09-19T20:47:15Z" level=debug msg="resolving subscriptions in namespace" id=fuCpC namespace=deptest
E0919 20:47:15.544438       1 queueinformer_operator.go:290] sync "deptest" failed: found more than one head for channel
I0919 20:47:15.544560       1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"deptest", UID:"260c9530-4bf0-4c6c-af3c-02ec5c9df79d", APIVersion:"v1", ResourceVersion:"3609624", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' found more than one head for channel

The source code seems to indicate that replaces is needed:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/resolver/resolver.go#L549-L560
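
As an illustration of why missing replaces values could produce this error, here is a minimal Python sketch of head detection. It assumes, based only on the error message and the linked resolver code, that a channel head is a bundle that no other bundle in the channel names in replaces, and that olm.skipRange is not consulted. The function `find_heads` and the data layout are hypothetical and are not OLM's actual implementation.

```python
def find_heads(bundles):
    """Return the names of bundles that no other bundle in the channel replaces.

    `bundles` maps a bundle name to the name it replaces (or None).
    """
    replaced = {target for target in bundles.values() if target is not None}
    return sorted(name for name in bundles if name not in replaced)


# Channel v4.0 of testoperatorb as built with --mode semver-skippatch only:
# neither bundle declares replaces.
without_replaces = {
    "testoperatorb.v4.0.0": None,
    "testoperatorb.v4.0.1": None,
}

# The same channel when the CSVs also declare replaces
# (the workaround described under "Possible Solution").
with_replaces = {
    "testoperatorb.v4.0.0": None,
    "testoperatorb.v4.0.1": "testoperatorb.v4.0.0",
}

for label, channel in [("without replaces", without_replaces),
                       ("with replaces", with_replaces)]:
    heads = find_heads(channel)
    status = "ok" if len(heads) == 1 else "found more than one head for channel"
    print(f"{label}: {heads} -> {status}")
```

With no replaces edges, every bundle in the channel looks like a head under this assumption, which would match the failure seen in the log.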

Environment

  • operator-lifecycle-manager version:
    0.16.1

  • Kubernetes version information:
    Client Version: 4.6.0-0.nightly-2020-09-15-112431
    Server Version: 4.6.0-0.nightly-2020-09-17-073141
    Kubernetes Version: v1.19.0+b4ffb45

  • Kubernetes cluster kind:
    OCP on aws

Possible Solution
Continue to add replaces to the CSVs.

Here is a catalog index image that is identical to the previous one, except that the CSVs also specify replaces in addition to being added with opm registry add --mode semver-skippatch. This works fine: both Operator A and Operator B are installed.

Package        Version  Channel  replaces  olm.skipRange  olm.package
testoperatora  1.0.0    v1.0     none      none           none
testoperatora  1.0.1    v1.0     1.0.0     <1.0.1         none
testoperatorb  4.0.0    v4.0     none      none           testoperatora: <1.3.0
testoperatorb  4.0.1    v4.0     4.0.0     <4.0.1         testoperatora: <1.3.0

This is the catalog index image:
docker.io/cdjohnson/depcatalogs@sha256:7fbe51329cbaaee3882d72195f7471d964a2c61bd5d3fa164df6a64ee324cfa0


@cdjohnson
Author

OCP Bugzilla Cross-Reference
https://bugzilla.redhat.com/show_bug.cgi?id=1881220

@cdjohnson
Author

Looks like this is a bug in operator-registry:
operator-framework/operator-registry#483

@stale

stale bot commented Dec 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 27, 2020
@openshift-ci-robot openshift-ci-robot added triage/unresolved Indicates an issue that can not or will not be resolved. and removed wontfix labels Dec 28, 2020
@cdjohnson
Author

@benluddy I've been trying to test this out with opm 1.15.3 which should have this fixed in it.

However, I'm seeing a problem where the LAST entry in a channel doesn't show its replaces value, which results in the same error.

Here is the channel-entry table:

--channel-entry--
1|v1.1-eus|testoperatora|testoperatora.v1.1.1|2|1
2|v1.1-eus|testoperatora|testoperatora.v1.1.0||2
3|v1.0|testoperatora|testoperatora.v1.0.1|4|1
4|v1.0|testoperatora|testoperatora.v1.0.0||2
5|v1.2|testoperatora|testoperatora.v1.2.0||1
6|v4.1-eus|testoperatorb|testoperatorb.v4.1.1||1
7|v4.1-eus|testoperatorb|testoperatorb.v4.1.0||2
8|v4.1-eus|testoperatorb|testoperatorb.v4.1.1|7|2
9|v4.0|testoperatorb|testoperatorb.v4.0.1|10|1
10|v4.0|testoperatorb|testoperatorb.v4.0.0||2

It shows that 4.1.1 replaces 4.1.0 (entry 8 replaces entry 7).
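
The replaces column in this dump holds an entry id rather than a bundle name, which makes it easy to misread. The Python sketch below decodes the rows into named edges. The column order (entry id, channel, package, bundle, replaces, depth) is an assumption inferred from the shape of the data, and `decode_channel_entries` is a hypothetical helper, not part of opm.

```python
CHANNEL_ENTRY_DUMP = """\
1|v1.1-eus|testoperatora|testoperatora.v1.1.1|2|1
2|v1.1-eus|testoperatora|testoperatora.v1.1.0||2
3|v1.0|testoperatora|testoperatora.v1.0.1|4|1
4|v1.0|testoperatora|testoperatora.v1.0.0||2
5|v1.2|testoperatora|testoperatora.v1.2.0||1
6|v4.1-eus|testoperatorb|testoperatorb.v4.1.1||1
7|v4.1-eus|testoperatorb|testoperatorb.v4.1.0||2
8|v4.1-eus|testoperatorb|testoperatorb.v4.1.1|7|2
9|v4.0|testoperatorb|testoperatorb.v4.0.1|10|1
10|v4.0|testoperatorb|testoperatorb.v4.0.0||2
"""


def decode_channel_entries(dump):
    """Return (channel, bundle, replaced_bundle) for every row with a replaces id."""
    rows = {}
    for line in dump.strip().splitlines():
        # Assumed column order; package and depth are not needed for the edges.
        entry_id, channel, _package, bundle, replaces, _depth = line.split("|")
        rows[int(entry_id)] = (channel, bundle, int(replaces) if replaces else None)

    edges = []
    for channel, bundle, replaces in rows.values():
        if replaces is not None:
            # Resolve the referenced entry id back to its bundle name.
            edges.append((channel, bundle, rows[replaces][1]))
    return edges


for channel, bundle, replaced in decode_channel_entries(CHANNEL_ENTRY_DUMP):
    print(f"{channel}: {bundle} replaces {replaced}")
```

Read this way, each of the four multi-bundle channels has one edge from its newest bundle to the older one, including v4.1-eus.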

Yet ListBundles shows:

grpcurl -plaintext localhost:50051 api.Registry/ListBundles | jq '{name: .csvName, version: .version, replaces: .replaces, skipRange: .skipRange, channelName: .channelName}' | jq -s -c 'sort_by(.name)[]'
{"name":"testoperatora.v1.0.0","version":"1.0.0","replaces":null,"skipRange":null,"channelName":"v1.0"}
{"name":"testoperatora.v1.0.1","version":"1.0.1","replaces":"testoperatora.v1.0.0","skipRange":"<1.0.1","channelName":"v1.0"}
{"name":"testoperatora.v1.1.0","version":"1.1.0","replaces":null,"skipRange":"<1.1.0","channelName":"v1.1-eus"}
{"name":"testoperatora.v1.1.1","version":"1.1.1","replaces":"testoperatora.v1.1.0","skipRange":"<1.1.1","channelName":"v1.1-eus"}
{"name":"testoperatora.v1.2.0","version":"1.2.0","replaces":null,"skipRange":"<1.2.0","channelName":"v1.2"}
{"name":"testoperatorb.v4.0.0","version":"4.0.0","replaces":null,"skipRange":null,"channelName":"v4.0"}
{"name":"testoperatorb.v4.0.1","version":"4.0.1","replaces":"testoperatorb.v4.0.0","skipRange":"<4.0.1","channelName":"v4.0"}
{"name":"testoperatorb.v4.1.0","version":"4.1.0","replaces":null,"skipRange":"<4.1.0","channelName":"v4.1-eus"}
{"name":"testoperatorb.v4.1.1","version":"4.1.1","replaces":null,"skipRange":"<4.1.1","channelName":"v4.1-eus"}

replaces is null for v4.1.1

So, it looks like it's partially fixed. With this catalog, testoperatora works, but testoperatorb fails with the original error.

Here's an example catalogsource:
docker.io/cdjohnson/depcatalogs@sha256:cbda9451ab3021cde905c910b1a4df946c9414c75f8a21147d8d431b32325b9f

@benluddy
Contributor

benluddy commented Feb 3, 2021

In skippatch mode, the generated update edges are "skips" rather than "replaces" edges. Based on the contents of channel_entry, the ListBundles output looks correct if you include "skips" in your jq query:

{"packageName":"testoperatorb","csvName":"testoperatorb.v4.1.1","channelName":"v4.1-eus","replaces":null,"skips":["testoperatorb.v4.1.0"]}

I failed to reproduce the resolution error using your latest index image. Are there other CatalogSources or Subscriptions on your cluster that could explain it?
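
To illustrate the point about skips, here is a small Python sketch that treats both replaces and skips as update edges when reading ListBundles output. The two records are trimmed to fields quoted in this thread (the absence of skips on v4.1.0 is assumed), and `update_targets` is a hypothetical helper, not part of opm or OLM.

```python
import json

# One JSON object per line, trimmed to fields quoted in this thread.
LIST_BUNDLES_OUTPUT = """\
{"csvName":"testoperatorb.v4.1.0","channelName":"v4.1-eus","replaces":null}
{"packageName":"testoperatorb","csvName":"testoperatorb.v4.1.1","channelName":"v4.1-eus","replaces":null,"skips":["testoperatorb.v4.1.0"]}
"""


def update_targets(bundle):
    """Names of the bundles this one can upgrade from, via replaces or skips."""
    targets = set(bundle.get("skips") or [])
    if bundle.get("replaces"):
        targets.add(bundle["replaces"])
    return targets


bundles = [json.loads(line) for line in LIST_BUNDLES_OUTPUT.splitlines() if line]

# Collect every bundle that some other bundle supersedes.
superseded = set()
for bundle in bundles:
    superseded |= update_targets(bundle)

# A head, in this sketch, is a bundle that nothing else supersedes.
heads = [b["csvName"] for b in bundles if b["csvName"] not in superseded]
print(heads)
```

Once skips are counted as edges, v4.1.0 is superseded by v4.1.1 and the channel has a single head, even though replaces is null on both records.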

@cdjohnson
Author

I think this is resolved. It looks like semver-skippatch isn't going to work for us because it doesn't handle cross-channel upgrades via skipRange. I think the problem I had here was related to trying to use the OpenShift registry image, which may have been an old version. The upstream image worked as designed.
