Bug 1978662: Set a degraded message when persistent storage is not configured #1270
Conversation
@fpetkovski: This pull request references Bugzilla bug 1978662, which is invalid.
Force-pushed from 11a0934 to b7dd975
/bugzilla refresh
@fpetkovski: This pull request references Bugzilla bug 1978662, which is valid. The bug has been moved to the POST state and updated to refer to this pull request in the external bug tracker. 3 validations were run on this bug. Requesting review from the QA contact.
Force-pushed from b7dd975 to f79d2bb
Force-pushed from f79d2bb to 740e370
Added a nit and minor comment, rest lgtm :)
Two issues for this patch inline.

I'm wondering if we should maybe introduce a `func (r *StatusReporter) SetDegraded` function for this and set it explicitly where appropriate. We talked before about setting Degraded on other conditions and generally improving the status reporting. Not needed for this PR, but I think it also wouldn't hurt.
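Not something this PR has to add, but here is a minimal, self-contained sketch of what such a helper could look like. The `statusReporter` type below is a stripped-down stand-in for the operator's `StatusReporter`, and the method signature, reason string, and message wording are assumptions for illustration only:

```go
package main

import (
	"fmt"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// statusReporter is a stripped-down stand-in for CMO's StatusReporter; the
// real type also loads and updates the ClusterOperator object through the
// config client, which is omitted here.
type statusReporter struct {
	conditions []configv1.ClusterOperatorStatusCondition
}

// SetDegraded records the Degraded condition explicitly instead of folding it
// into SetDone. The signature is an assumption, not the operator's actual API.
func (r *statusReporter) SetDegraded(status configv1.ConditionStatus, reason, message string) {
	r.conditions = append(r.conditions, configv1.ClusterOperatorStatusCondition{
		Type:               configv1.OperatorDegraded,
		Status:             status,
		Reason:             reason,
		Message:            message,
		LastTransitionTime: metav1.NewTime(time.Now()),
	})
}

func main() {
	r := &statusReporter{}
	// Degraded stays False; the reason and message only surface the warning.
	r.SetDegraded(configv1.ConditionFalse, "PrometheusNoDataPersistence",
		"Prometheus is running without persistent storage; data may be lost during upgrades or disruptions.")
	fmt.Printf("%s=%s (%s)\n", r.conditions[0].Type, r.conditions[0].Status, r.conditions[0].Reason)
}
```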
pkg/client/status_reporter.go (outdated)

```diff
@@ -77,7 +77,7 @@ func (r *StatusReporter) SetDone() error {
 	conditions := newConditions(co.Status, r.version, time)
 	conditions.setCondition(v1.OperatorAvailable, v1.ConditionTrue, "Successfully rolled out the stack.", "RollOutDone", time)
 	conditions.setCondition(v1.OperatorProgressing, v1.ConditionFalse, "", "", time)
-	conditions.setCondition(v1.OperatorDegraded, v1.ConditionFalse, "", "", time)
+	conditions.setCondition(v1.OperatorDegraded, v1.ConditionFalse, degradedConditionMessage, "", time)
```
We probably should set the actual condition here depending on the message passed?
Also this should add a reason. Maybe `PrometheusNoDataPersistence` or something like it.
> We probably should set the actual condition here depending on the message passed?

Do you mean we should set the condition to `true` when a message other than `""` is passed?

I probably should have provided context in the PR description: the requirement is to keep the condition at `degraded=false` but set a message so that cluster admins are aware of the state. I've updated the PR description to contain that information.
> Also this should add a reason. Maybe `PrometheusNoDataPersistence` or something like it.

Good point, added the reason as well.
> Do you mean we should set the condition to `true` when a message other than `""` is passed?

Yes, that is the convention afaiu, at least in the other direction (don't set the condition without also setting a reason). It makes sense to me too, but I'll try to find some docs that confirm it.
@wking we could not find any documentation on this particular topic. Is it okay for an operator to set a message for the `Degraded` condition but keep the value at `false`? This is what we discussed for CMO when persistent storage is not enabled, to indicate to the admin that metrics might be lost.
Based on Trevor's reply in Slack, setting a message for the `Degraded` condition while keeping the value at `false` is a valid state for the operator status. A good practice in these situations is to add a link to the documentation on how to fix the underlying problem.
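For illustration only (not code from this PR), a condition built along those lines with the openshift/api types could look like the sketch below; the reason string, message wording, and documentation pointer are placeholders:

```go
package main

import (
	"fmt"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The condition value stays False; only the reason and message carry the
	// warning, including a pointer to the docs on configuring storage.
	degraded := configv1.ClusterOperatorStatusCondition{
		Type:   configv1.OperatorDegraded,
		Status: configv1.ConditionFalse,
		Reason: "PrometheusNoDataPersistence", // illustrative, per the review suggestion
		Message: "Prometheus is running without persistent storage and can lose data during " +
			"upgrades or cluster disruptions. See the cluster monitoring documentation for " +
			"how to configure persistent storage.", // wording and docs pointer are placeholders
		LastTransitionTime: metav1.NewTime(time.Now()),
	}
	fmt.Printf("%s=%s (%s): %s\n", degraded.Type, degraded.Status, degraded.Reason, degraded.Message)
}
```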
Force-pushed from 740e370 to 9bc2b96
@jan--f I agree that the workaround is not ideal, but I also think that reworking the status reporting is a bit outside the scope of what is requested from us in this BZ, so I don't think introducing a `SetDegraded` function here would fully solve the problem. What we might need is some sort of status builder, or some other mechanism to decouple the status reporter from the status itself. Right now both the status state and the status update are handled through the same abstraction.

Do you know if we already have a ticket for reworking the status reporting? I would be more than happy to pair on that one and see how we can improve the current situation.
I don't think we have an overarching ticket. But agreed, this is really not in scope for this ticket.
I couldn't find any ticket to rework the status reporter either, but I agree that it can be improved (I presume that the code originates from a time when the ClusterOperator semantics weren't well defined).
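Not in scope for this PR, but as a rough illustration of the decoupling idea discussed above: a small builder could accumulate condition state and hand the finished slice to whatever component performs the actual ClusterOperator update. All names in this sketch are hypothetical.

```go
package main

import (
	"fmt"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// conditionsBuilder is a hypothetical sketch of the "status builder" idea: it
// only accumulates condition state, while writing the ClusterOperator status
// would remain the job of a separate reporter/updater component.
type conditionsBuilder struct {
	now        metav1.Time
	conditions []configv1.ClusterOperatorStatusCondition
}

func newConditionsBuilder() *conditionsBuilder {
	return &conditionsBuilder{now: metav1.NewTime(time.Now())}
}

func (b *conditionsBuilder) with(t configv1.ClusterStatusConditionType, s configv1.ConditionStatus, reason, message string) *conditionsBuilder {
	b.conditions = append(b.conditions, configv1.ClusterOperatorStatusCondition{
		Type: t, Status: s, Reason: reason, Message: message, LastTransitionTime: b.now,
	})
	return b
}

// build returns the accumulated conditions; a reporter would copy them into
// ClusterOperator.Status.Conditions and send the update.
func (b *conditionsBuilder) build() []configv1.ClusterOperatorStatusCondition {
	return b.conditions
}

func main() {
	conds := newConditionsBuilder().
		with(configv1.OperatorAvailable, configv1.ConditionTrue, "RollOutDone", "Successfully rolled out the stack.").
		with(configv1.OperatorProgressing, configv1.ConditionFalse, "", "").
		with(configv1.OperatorDegraded, configv1.ConditionFalse, "PrometheusNoDataPersistence",
			"Prometheus is running without persistent storage.").
		build()
	fmt.Printf("built %d conditions\n", len(conds))
}
```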
lgtm
/hold until we get confirmation from @wking
Force-pushed from 9bc2b96 to f1ed7fb
/unhold
lgtm
/retest
/lgtm
Force-pushed from fed1807 to ca5f932
Co-authored-by: Simon Pasquier <[email protected]>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: arajkumar, fpetkovski, simonpasquier.
@fpetkovski: The following test failed.
/retest-required
Please review the full test history for this PR and help us cut down flakes.
@fpetkovski: All pull requests linked via external trackers have merged: Bugzilla bug 1978662 has been moved to the MODIFIED state.
When persistent storage is not configured for Prometheus, upgrades and cluster disruptions can lead to data loss.

To make cluster admins aware of this problem, this PR adds a message to the `Degraded` condition indicating the issue. The value of the condition is kept at `degraded=false` since there is no real degradation of cluster monitoring.
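As a sketch of how the message still reaches cluster admins and tooling even though the condition stays False, the hypothetical helper below scans a ClusterOperator's conditions for a non-empty Degraded message; the reason and message values are placeholders, not the operator's actual text:

```go
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

// degradedMessage is a hypothetical helper: it returns the message set on the
// Degraded condition, if any, regardless of whether the condition is True or
// False, so the storage warning surfaces without the operator reporting
// itself as degraded.
func degradedMessage(co *configv1.ClusterOperator) (string, bool) {
	for _, c := range co.Status.Conditions {
		if c.Type == configv1.OperatorDegraded && c.Message != "" {
			return c.Message, true
		}
	}
	return "", false
}

func main() {
	co := &configv1.ClusterOperator{
		Status: configv1.ClusterOperatorStatus{
			Conditions: []configv1.ClusterOperatorStatusCondition{{
				Type:    configv1.OperatorDegraded,
				Status:  configv1.ConditionFalse,
				Reason:  "PrometheusNoDataPersistence", // placeholder
				Message: "Prometheus is running without persistent storage; see the docs for configuring storage.",
			}},
		},
	}
	if msg, ok := degradedMessage(co); ok {
		fmt.Println("Degraded=False, but:", msg)
	}
}
```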