[RFE] Add Channel to Operator Dependency Resolution #1557

Closed
cdjohnson opened this issue May 28, 2020 · 12 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. triage/unresolved Indicates an issue that can not or will not be resolved.


@cdjohnson

cdjohnson commented May 28, 2020

Feature Request

Problem

Today, all operators are resolved by GVK. If the API stays stable across many versions and channels, the operator installed by an automatic Subscription may not be the version that is functionally compatible with the dependent operator.

There are other factors that go into satisfying the dependency:

  • Interoperability Testing (what slices of versions have been tested)
  • LTSR (are all of the operators Long-Term Service Releases)
  • SLA (dev, fast, candidate, stable channel preference)

Example channel taxonomy for IAM:
All versions have the same GVK.

V1.0-fast: 1.0.1, 1.0.2, 1.0.3
V1.0-stable: 1.0.3
V1.1-fast: 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4
V1.1-stable (default): 1.1.1, 1.1.2
V1.1-lts (default): 1.1.2

The Operator Dependency Resolution enhancement extends the GVK option by allowing an operator version to declare a dependency on another operator by semver range.

This solves the problem when channels are divided properly into X.Y channels, since you will always get the head of the correct channel.

Example of Semver Dependency declaration:

dependencies:
  - type: olm.package
    value:
      name: ibm-operator-iam
      version: ">=1.1.2"

In the above example, three different channels satisfy the requirement.

Example of GVK Dependency declaration:

dependencies:
  - type: olm.gvk
    value:
      group: iam.operator.ibm.com
      kind: Auth
      version: v1alpha1

In the above example, ALL of the channels satisfy the requirement.

This does not solve the other use cases where the version and API are not the only consideration.

Suggested Implementation

To keep this generic, use the channel to describe the dependency. For example, if the dependent operator is in an LTSR channel, the IAM dependency could choose the compatible "default" LTSR channel.

In the MQ V9.5.0 bundle:

channels: V9.5, V9.5-ltsr
defaultChannel: V9.5
dependencies:
  - type: olm.package
    value:
      name: ibm-operator-iam
      channel: V1.1-ltsr

@kevinrizza @shawn-hurley @ecordell @dinhxuanvu

@kevinrizza
Member

Trying to follow the logic here: isn't the dependency somewhat orthogonal to the update channel you would choose if you subscribed to that operator directly? When we resolve a dependency, that dependency doesn't automatically pull updates itself. Yes, a version range like >=1.1.2 can define a set of operators contained in multiple update channels, but isn't what the operator author ultimately wants simply that the dependency resolves to a version in that set?

Or is the actual request that when you install an operator, its dependencies automatically receive upgrades within a specific upgrade channel, independent of semantic version or GVK entirely?

@cdjohnson
Author

It's the second: This is really about operator inception. When an Operator is initially installed and the dependent operator subscriptions are automatically created, the preferred default channel needs to be more explicit.

@ecordell
Member

ecordell commented Jun 5, 2020

@cdjohnson Thanks for writing this up here. Just been taking some time to think about the problem.

I'm concerned that dependencies that include specific channels will over-constrain the resolver more often than not.

In other words, we'll never be able to resolve two constraints like this successfully:

dependencies:
  - type: olm.package
    name: ibm-operator-iam
    channel: V1.0-lts
---
dependencies:
  - type: olm.package
    name: ibm-operator-iam
    channel: V1.0-fast

even if some underlying operator version present in both channels could satisfy the real dependency being expressed.

There seem to be two ways that people are using channels right now:

  • release frequency (stable, beta, preview, fast, etc), where specific versions from faster channels are eventually promoted to the stable channel.
  • version selection (v1.1, v1.2, v1, v2, or tied to the platform like 4.3, 4.4). This bounds automated upgrades to a specific version range and requires user input to switch between them once installed.

The ability to depend on version ranges of a particular package will make version channels less appealing (there is no way to automatically bump a dependency between channels). But even then, there will be a set of existing published packages/channels that still needs to be handled.

I would propose that we support both of the ways channels are being used explicitly, with well-defined behavior for each.

This looks something like:

  1. If there are semver-like channels, they will be parsed as semver and operator versions from every semver channel will be available for resolution, but in semver order. That means that if A in channel v1.1 and B in channel v1.0 both satisfy the dependency, A will be chosen, and the subscription will be generated for channel v1.1.
  2. If we see channels matching frequency names, operators will be available for resolution from each of those channels, but in a specific order. stable is preferred to preview, for example. (specific supported channel names and their relative order will be defined / documented)
  3. We allow specifying an arbitrary order for channels (syntax tbd). Each channel has an associated priority, which is used to order potential operators to satisfy dependencies.
  4. Otherwise, only packages in the default channel will be considered for resolution.

(This could be reduced to just the last 2 rules, with the first two being optimizations/enhancements on top).

The example above follows neither the semver-based ordering nor the frequency ordering (it's a mix of both). This will need to fall back to ordering channels manually:

0 - V1.1-lts: 1.1.2    
1 - V1.1-stable: 1.1.1, 1.1.2     
2 - V1.1-fast:  1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4   
3 - V1.0-stable: 1.0.3  
4 - V1.0-fast:  1.0.1, 1.0.2, 1.0.3

With that ordering, a dependency on version >=1.1.2 will pick 1.1.2 and subscribe to V1.1-lts.

Another example: a dependency on version >1.1.2 will pick 1.1.3 and subscribe to the V1.1-fast channel.
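The manual-priority walk can be sketched in Go. This is a hypothetical illustration rather than resolver code. It assumes that within a channel the versions are tried in the listed (ascending) order and the first match wins; that assumption is what makes >1.1.2 land on 1.1.3 rather than on the V1.1-fast head, 1.1.4, matching the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// channel is one entry in the manual priority list; versions are kept in listed order.
type channel struct {
	name     string
	versions []string
}

// priorityOrder mirrors the manual ordering above (index 0 = most preferred).
var priorityOrder = []channel{
	{"V1.1-lts", []string{"1.1.2"}},
	{"V1.1-stable", []string{"1.1.1", "1.1.2"}},
	{"V1.1-fast", []string{"1.1.0", "1.1.1", "1.1.2", "1.1.3", "1.1.4"}},
	{"V1.0-stable", []string{"1.0.3"}},
	{"V1.0-fast", []string{"1.0.1", "1.0.2", "1.0.3"}},
}

// mustParse turns "1.1.2" into [1 1 2]; it panics on malformed hard-coded input.
func mustParse(s string) [3]int {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		panic("invalid version " + s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			panic("invalid version " + s)
		}
		v[i] = n
	}
	return v
}

// meets checks a single comparison such as ">=1.1.2" or ">1.1.2".
func meets(v [3]int, constraint string) bool {
	op := strings.TrimRight(constraint, "0123456789.")
	b := mustParse(constraint[len(op):])
	c := 0
	for i := range v {
		if v[i] != b[i] {
			if v[i] < b[i] {
				c = -1
			} else {
				c = 1
			}
			break
		}
	}
	switch op {
	case ">=":
		return c >= 0
	case ">":
		return c > 0
	case "<=":
		return c <= 0
	case "<":
		return c < 0
	case "", "=":
		return c == 0
	default:
		panic("unsupported operator " + op)
	}
}

// resolve walks channels in priority order and returns the first listed version that
// meets the constraint, plus the channel the generated Subscription would use.
func resolve(constraint string) (string, string, bool) {
	for _, ch := range priorityOrder {
		for _, v := range ch.versions {
			if meets(mustParse(v), constraint) {
				return ch.name, v, true
			}
		}
	}
	return "", "", false
}

func main() {
	for _, c := range []string{">=1.1.2", ">1.1.2"} {
		ch, v, ok := resolve(c)
		fmt.Println(c, "->", v, ch, ok)
	}
}
```

The loop prints 1.1.2 from V1.1-lts for >=1.1.2 and 1.1.3 from V1.1-fast for >1.1.2.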

@s-rogers

s-rogers commented Jun 7, 2020

Just surfacing some internal discussions we've been having on this issue here to share how we are using dependencies and what the current behaviour means.

We have a set of operators which together form a "Cloud Pak". All of these operators have a common dependency on another operator - let's call that CS. They include this dependency in their CSV as a required CRD. Some operators have other dependencies - e.g. Couch.

These operators can be installed separately or via an "uber-operator", which does not reconcile a CR as such. It is purely a packaging exercise: it includes all the Cloud Pak operators as CSV dependencies so that customers have a single place to install everything.

Here are a couple of examples we call out in an internal issue where not being able to control the version of the dependency causes us problems:

Example:

One of our operators (let's call it AR) depends on the Couch operator. When we ship, Couch's default channel is v3.0.0. A week later they release v4.0.0 (without changing the CRD version) and change their default channel. All existing installs of AR may then be upgraded to the new version, and any new install will use v4.0.0 too. This is a major version release and is likely to cause issues with our existing code. Most notably, we were not in control of when to take this new version, so we had no time to test and fix accordingly.

Example 2:

We consume the CS operator via an OLM dependency. The default channel is "stable". We want to develop the next release of our operator against the next release of CS which is being provided under the "dev" channel. We cannot specify we want this in our dependency and will always pick up "stable".

We appreciate there are ways to work around these, such as changing CRD versions, but the main point stands: we need to be able to control exactly what we are installing to guarantee behaviours for our customers.

@cdjohnson
Author

@ecordell Where do you think the "weight" rules would be applied? In the package where the default channel is defined?

Regarding Sam's use cases:
Example 1 (semver):
Since Couch (in your example) uses semver channels, presumably you can define the dependency on the operator version, which would override the default channel and choose the most appropriate one:

dependencies:
  - type: olm.package
    name: couch
    version: ">=3.0.0 <4.0.0"
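A version-range dependency like the one above could be resolved by taking the highest version, in any channel, that falls inside the range. The Go sketch assumes hypothetical Couch channels v3.0.0 (3.0.0, 3.0.1) and v4.0.0 (4.0.0, now the default), and a range parser that accepts only the >=LOW <HIGH form. It illustrates the idea and is not how OLM implements ranges.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type pkgChannel struct {
	name     string
	versions []string
}

// couch is hypothetical: v4.0.0 keeps the same CRD version but has become the default channel.
var couch = []pkgChannel{
	{"v3.0.0", []string{"3.0.0", "3.0.1"}},
	{"v4.0.0", []string{"4.0.0"}},
}

// mustParse turns "3.0.1" into [3 0 1]; it panics on malformed hard-coded input.
func mustParse(s string) [3]int {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		panic("invalid version " + s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			panic("invalid version " + s)
		}
		v[i] = n
	}
	return v
}

// cmp returns -1, 0 or 1 when a is lower than, equal to, or higher than b.
func cmp(a, b [3]int) int {
	for i := range a {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	return 0
}

// inRange supports only the two-term form ">=LOW <HIGH" used in the dependency above.
func inRange(v [3]int, rng string) bool {
	terms := strings.Fields(rng)
	if len(terms) != 2 || !strings.HasPrefix(terms[0], ">=") ||
		!strings.HasPrefix(terms[1], "<") || strings.HasPrefix(terms[1], "<=") {
		panic("unsupported range " + rng)
	}
	return cmp(v, mustParse(terms[0][2:])) >= 0 && cmp(v, mustParse(terms[1][1:])) < 0
}

// highestInRange returns the highest version in any channel that lies inside rng,
// regardless of which channel is the package default.
func highestInRange(chs []pkgChannel, rng string) (channel, version string, ok bool) {
	var best [3]int
	for _, ch := range chs {
		for _, s := range ch.versions {
			v := mustParse(s)
			if inRange(v, rng) && (!ok || cmp(v, best) > 0) {
				best, channel, version, ok = v, ch.name, s, true
			}
		}
	}
	return channel, version, ok
}

func main() {
	ch, v, ok := highestInRange(couch, ">=3.0.0 <4.0.0")
	fmt.Println(ch, v, ok)
}
```

With >=3.0.0 <4.0.0 it returns 3.0.1 from v3.0.0, so the switch of the default channel to v4.0.0 does not pull in the major upgrade.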

Example 2:
Not sure how Evan's proposal solves this, since presumably the weight is defined globally or in the package definition of the operator. Since packages and bundles are supposed to be immutable, it almost seems like you'd need some way of overriding the channels or versions of leaf operators.

@ecordell
Member

ecordell commented Jun 9, 2020

Where do you think the "weight" rules would be applied? In the Package where the Default Channel defined?

Leaving that somewhat unspecified for now so that we can discuss different options.

The default channel is "stable". We want to develop the next release of our operator against the next release of CS which is being provided under the "dev" channel. We cannot specify we want this in our dependency and will always pick up "stable".

Is it reasonable to specify a little more during development? You can create the operator subscription and the CS Subscription at the same time, and specify the dev channel up front for CS.

@darrensu-ibm

Hi @ecordell, have the options for this been discussed? I'd like to know whether there is an outlook for it.

@cdjohnson
Author

@darrensu-ibm We had a discussion at the operator-sdk olm dev sig meeting last week. See minutes.

@njhale volunteered to collect all the internal Red Hat documentation on this topic and publish it in a google doc for us to consume, so we can create the right taxonomy here. Once we have that, we'll need to decide if using channel restrictions is appropriate or not when handling dependency resolution for first-time installations, or if we need a different approach.

@exdx added the kind/feature and lifecycle/active labels Jun 23, 2020
@stale

stale bot commented Aug 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 22, 2020
@openshift-ci-robot added the triage/unresolved label and removed the wontfix label Aug 23, 2020
@stale

stale bot commented Nov 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Mar 2, 2021

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contribution.
For more help on your issue, check out the olm-dev channel on the kubernetes slack [1] and the OLM Dev Working Group [2] [1] https://kubernetes.slack.com/archives/C0181L6JYQ2 [2] https://github.com/operator-framework/community#operator-lifecycle-manager-wg

@stale stale bot added the lifecycle/stale label Mar 2, 2021
@stale

stale bot commented Mar 9, 2021

This issue has been automatically closed because it has not had any recent activity. Thank you for your contribution.

@stale stale bot closed this as completed Mar 9, 2021

7 participants