Skip to content

feat(sub): Indicate resolution conflicts on Subscription statuses #2269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

anik120
Copy link
Contributor

@anik120 anik120 commented Jul 19, 2021

Description of the change:

Resolution considers everything in a given namespace as a unit, so conflicts aren't
generated on a per-Subscription basis. However, as the main entry point to resolution,
users first look at the Subscription to understand resolution failures. The set of conflicts
today is written to Events with type ResolutionFailed. This PR projects resolution errors onto
all subscriptions on the namespace for any resolution failure.

Motivation for the change:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot requested review from ankitathomas and benluddy July 19, 2021 21:25
@anik120 anik120 force-pushed the sub-resolution-error branch from 8ed8f6a to 66c9499 Compare July 19, 2021 21:26
@anik120
Copy link
Contributor Author

anik120 commented Jul 19, 2021

/hold

Made changes to the subscription api inside the vendor folder for reviews. PR for api change is here: operator-framework/api#138

will unhold once this and the api repo PR is lgtmed and changes from api is vendored in here.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 19, 2021
return err
} else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: This is under the assumption that if there is a terminal resolution error, user intervention can resolve the error and get the subscriptions in the objects back to a good state. Is that a correct assumption?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster admins can always nuke everything and recreate after the fact. there are also (rather gnarly) workarounds to pop an InstallPlan back into a non-terminal state for a retry (e.g. manually patch InstallPlan status).

@anik120 anik120 force-pushed the sub-resolution-error branch from 66c9499 to bc1d243 Compare July 21, 2021 15:02
@anik120
Copy link
Contributor Author

anik120 commented Jul 21, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2021
Copy link
Member

@njhale njhale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great so far.

@anik120 anik120 force-pushed the sub-resolution-error branch 2 times, most recently from 04ceee0 to ef12368 Compare July 21, 2021 15:52
Signed-off-by: Anik Bhattacharjee <[email protected]>
@anik120 anik120 force-pushed the sub-resolution-error branch from ef12368 to 7447829 Compare July 21, 2021 16:02
Copy link
Member

@njhale njhale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

My only real concern is in the lack of unit tests.

@anik120
Copy link
Contributor Author

anik120 commented Jul 21, 2021

@njhale there are tests here that tests that the method returns the correct value/error when resolver error occurs.
Example the NotSatisfiableError test checks that the sovler.NotSatisfiable error is returned when a constraint is not satisfyiable. And there's another unit test to test any resolver error that is not a constraint not satisfyiable error. So we have unit tests that covers all scenarios for resolver error during namespace sync, this PR just bubbles the errors up to the subscriptions, which needed the e2e test to test

@njhale
Copy link
Member

njhale commented Jul 21, 2021

@njhale there are tests here that tests that the method returns the correct value/error when resolver error occurs.
Example the NotSatisfiableError test checks that the sovler.NotSatisfiable error is returned when a constraint is not satisfyiable. And there's another unit test to test any resolver error that is not a constraint not satisfyiable error. So we have unit tests that covers all scenarios for resolver error during namespace sync, this PR just bubbles the errors up to the subscriptions, which needed the e2e test to test

hmm, I was thinking of the class of test that uses a mock client and tests a wider range of I/O (i.e. not just checking the error) -- but since the e2e test is so similar, I don't think it's worth holding so close to freeze.

/approve

@openshift-ci
Copy link

openshift-ci bot commented Jul 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2021
@@ -909,9 +909,22 @@ func (o *Operator) syncResolvingNamespace(obj interface{}) error {
// not-satisfiable error
if _, ok := err.(solver.NotSatisfiable); ok {
logger.WithError(err).Debug("resolution failed")
updateErr := o.setSubsCond(subs, v1alpha1.SubscriptionResolutionFailed, "Constraint not satisfyable", err.Error(), true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling error. this should say Constraint not satisfiable

@kevinrizza
Copy link
Member

minus that one nit, this looks good to me

@anik120 anik120 force-pushed the sub-resolution-error branch from 7447829 to 01054fd Compare July 22, 2021 13:07
@kevinrizza
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@@ -909,9 +909,22 @@ func (o *Operator) syncResolvingNamespace(obj interface{}) error {
// not-satisfiable error
if _, ok := err.(solver.NotSatisfiable); ok {
logger.WithError(err).Debug("resolution failed")
updateErr := o.setSubsCond(subs, v1alpha1.SubscriptionResolutionFailed, "Constraint not satisfiable", err.Error(), true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Reason usually intended to be machine-readable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch

@anik120
Copy link
Contributor Author

anik120 commented Jul 22, 2021

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2021
Resolution considers everything in a given namespace as a unit, so conflicts aren't
generated on a per-Subscription basis. However, as the main entry point to resolution,
users first look at the Subscription to understand resolution failures. The set of conflicts
today is written to Events with type ResolutionFailed. This PR projects resolution errors onto
all subscriptions on the namespace for any resolution failure.

Signed-off-by: Anik Bhattacharjee <[email protected]>
@anik120 anik120 force-pushed the sub-resolution-error branch from 01054fd to 4a2c5c0 Compare July 22, 2021 14:08
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@anik120
Copy link
Contributor Author

anik120 commented Jul 22, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2021
@benluddy
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@openshift-merge-robot openshift-merge-robot merged commit 3223fce into operator-framework:master Jul 22, 2021
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Jul 28, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Jul 28, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Jul 28, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.

Signed-off-by: Anik Bhattacharjee <[email protected]>
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Jul 28, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.

Signed-off-by: Anik Bhattacharjee <[email protected]>
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Jul 28, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.

Signed-off-by: Anik Bhattacharjee <[email protected]>
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Jul 28, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.

Signed-off-by: Anik Bhattacharjee <[email protected]>
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Aug 24, 2021
In operator-framework#2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.

Signed-off-by: Anik Bhattacharjee <[email protected]>
openshift-merge-robot pushed a commit that referenced this pull request Aug 24, 2021
* fix(sub): Reset ResolutionFailed cond when error is resolved

In #2269, a new condition was introduced for Subscription to indicate any
dependency resolution error in it's message. However, when the case of
the error was resolved, the condition status (true) and the message
was sticking around. This PR fixes the issue, and makes sure that the
condition status is set to false, and the message and reason of the
condition are cleared off.

Signed-off-by: Anik Bhattacharjee <[email protected]>

* refractor updateStatusConditions to split calls to api server into it's own task

Signed-off-by: Anik Bhattacharjee <[email protected]>

* Reduce number of times subscription statues are updated.

The statuses of the subscriptions are updated multiple number of times
while syncing the resolving namespace. This commit switches to preserving
the state of the subscription statues instead, and only updating the
statuses on cluster only when it's neccessary.

Signed-off-by: Anik Bhattacharjee <[email protected]>

* remove unnecessary continue from loop

Signed-off-by: Anik Bhattacharjee <[email protected]>

* pass reference of sub to goroutine

Signed-off-by: Anik Bhattacharjee <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants