feat: alertmanager conditional log gathering #545

Merged

Conversation

Contributor

@rluders rluders commented Nov 11, 2021

This PR implements a new conditional gathering that collects Alertmanager logs when the AlertmanagerClusterFailedToSendAlerts or AlertmanagerFailedToSendAlerts alert is firing.

Note: This PR also includes a small adjustment to README.md (added line breaks to make it easier to read) and one small fix to tools/gen_cert_key.py.
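
The conditional gathering described above can be illustrated with a minimal sketch. The names `alertCondition`, `logRule`, and `shouldGather` are hypothetical and are not taken from this PR; the sketch only assumes that a rule lists the alerts that must be firing before its gathering step runs.

```go
package main

import "fmt"

// All names in this file are hypothetical and exist only for illustration.

// alertCondition names an alert that must be firing for a rule to apply.
type alertCondition struct {
	AlertName string
}

// logRule pairs a list of conditions with the parameters of a log-gathering step.
type logRule struct {
	Conditions []alertCondition
	TailLines  int64
}

// shouldGather reports whether every alert named in the rule's
// conditions is present in the set of currently firing alerts.
func shouldGather(rule logRule, firing map[string]bool) bool {
	for _, c := range rule.Conditions {
		if !firing[c.AlertName] {
			return false
		}
	}
	return true
}

func main() {
	rule := logRule{
		Conditions: []alertCondition{{AlertName: "AlertmanagerFailedToSendAlerts"}},
		TailLines:  50,
	}

	firing := map[string]bool{"AlertmanagerFailedToSendAlerts": true}
	fmt.Println(shouldGather(rule, firing))            // prints "true"
	fmt.Println(shouldGather(rule, map[string]bool{})) // prints "false"
}
```

In this sketch the logs would be collected only in the first case, where the alert named by the rule is in the firing set.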

How to test it

  1. Get a cluster
  2. Access the console
  3. Go to: Administration > Cluster Settings > Global Configuration > Alertmanager
  4. Configure the integration type for Critical and Default to use Webhook, and add any invalid address to the URL

It may take some time for the alert to fire; you can follow its status under Monitoring > Alert.

Categories

  • Bugfix
  • Enhancement
  • Backporting
  • Others (CI, Infrastructure, Documentation)

Sample Archive

  • docs/insights-archive-sample/conditional/namespaces/openshift-monitoring/pods/alertmanager-main-0/containers/logs/alertmanager-alertmanagerfailedtosendalerts.log

Documentation

  • docs/gathered-data.md

Unit Tests

  • pkg/gatherers/conditional/gather_alertmanager_logs_test.go

Privacy

Yes. There are no sensitive data in the newly collected information.

Changelog

Breaking Changes

No

References

https://issues.redhat.com/browse/CCXDEV-6036


openshift-ci bot commented Nov 11, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2021

openshift-ci bot commented Nov 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rluders

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2021
@rluders rluders force-pushed the ccxdev-6036-alertmanager-logs branch from 130db6b to 574c25c Compare November 16, 2021 09:19
@rluders rluders force-pushed the ccxdev-6036-alertmanager-logs branch from 6ef73be to b4f7903 Compare November 18, 2021 14:55
@rluders rluders marked this pull request as ready for review November 18, 2021 15:00
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2021
@@ -79,6 +81,7 @@ var defaultGatheringRules = []GatheringRule{
},
},
},
// GatherAPIRequestCounts
Contributor Author

BTW, I added these comments to help identify the conditions for each gatherer. The list was a little bit hard for me to read.

Contributor

Makes sense and I agree that these lists tend to be difficult to read, especially as they grow longer.

Contributor

@natiiix natiiix left a comment

Aside from a couple of small details, I don't see any particular issue with this PR. Please let me know if you want to make any of the suggested changes. Otherwise, I could probably approve it as is.

Comment on lines 136 to 151
{
    Conditions: []ConditionWithParams{
        {
            Type: AlertIsFiring,
            Alert: &AlertConditionParams{
                Name: "AlertmanagerClusterFailedToSendAlerts",
            },
        },
    },
    GatheringFunctions: GatheringFunctions{
        GatherAlertmanagerLogs: GatherAlertmanagerLogsParams{
            AlertName: "AlertmanagerClusterFailedToSendAlerts",
            TailLines: 50,
        },
    },
},
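
As a rough illustration of what a limit such as `TailLines: 50` means, the sketch below keeps only the last n lines of a log. The helper `tailLines` is hypothetical; the gatherer in this PR may well delegate the limit to the Kubernetes API rather than trimming text itself.

```go
package main

import (
	"fmt"
	"strings"
)

// tailLines returns at most the last n lines of log.
// Hypothetical helper: it only illustrates the meaning of a tail-lines
// limit and is not the implementation used by the gatherer.
func tailLines(log string, n int) string {
	if n <= 0 {
		return ""
	}
	// Drop trailing newlines so they do not count as an extra empty line.
	lines := strings.Split(strings.TrimRight(log, "\n"), "\n")
	if len(lines) > n {
		lines = lines[len(lines)-n:]
	}
	return strings.Join(lines, "\n")
}

func main() {
	log := "line 1\nline 2\nline 3\nline 4\n"
	// Prints "line 3" and "line 4" on separate lines.
	fmt.Println(tailLines(log, 2))
}
```

A log shorter than the limit is returned whole, so a limit of 50 only matters for containers that have written more than 50 lines.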
Contributor

It's unfortunate that the conditional gatherer doesn't pass the alert name to the gatherer function in some way, because it leads to this ugly code duplication that would be very easy to accidentally mess up (forgetting to set both alert name strings when copy-pasting). Not an issue with this PR, just a general note.

Contributor Author

Yeah, I was thinking exactly the same when I was implementing it. Maybe it would be a good idea to create a task to look for a better approach. What do you think?
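
One possible direction for the duplication discussed in this thread, sketched under assumptions: a small constructor that takes the alert name once and fills in both places. The helper `alertmanagerLogsRule` is hypothetical, the types are simplified stand-ins for the ones in the snippet (the real definitions may differ, for example in how gathering functions are stored), and the value of `AlertIsFiring` is a placeholder.

```go
package main

import "fmt"

// Simplified stand-ins for the types in the reviewed snippet.
// The real definitions in the repository may differ.
type AlertConditionParams struct {
	Name string
}

type ConditionWithParams struct {
	Type  string
	Alert *AlertConditionParams
}

type GatherAlertmanagerLogsParams struct {
	AlertName string
	TailLines int64
}

// GatheringRule is reduced to a single gathering function for brevity.
type GatheringRule struct {
	Conditions []ConditionWithParams
	LogsParams GatherAlertmanagerLogsParams
}

// AlertIsFiring uses a placeholder value; the real constant may differ.
const AlertIsFiring = "alert_is_firing"

// alertmanagerLogsRule (hypothetical helper) accepts the alert name once
// and uses it for both the condition and the gathering parameters, so
// the two strings cannot drift apart when a rule is copy-pasted.
func alertmanagerLogsRule(alertName string, tailLines int64) GatheringRule {
	return GatheringRule{
		Conditions: []ConditionWithParams{
			{
				Type:  AlertIsFiring,
				Alert: &AlertConditionParams{Name: alertName},
			},
		},
		LogsParams: GatherAlertmanagerLogsParams{
			AlertName: alertName,
			TailLines: tailLines,
		},
	}
}

func main() {
	names := []string{
		"AlertmanagerClusterFailedToSendAlerts",
		"AlertmanagerFailedToSendAlerts",
	}
	for _, name := range names {
		rule := alertmanagerLogsRule(name, 50)
		// Both names always match because they come from one argument.
		fmt.Println(rule.Conditions[0].Alert.Name == rule.LogsParams.AlertName)
	}
}
```

This keeps the rule list declarative while removing the second string literal; passing the alert name from the condition to the gathering function at run time, as suggested in the comment above, would be a larger change.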

@rluders rluders requested review from tremes and natiiix November 24, 2021 14:25
Contributor Author

rluders commented Nov 25, 2021

/retest

Contributor

tremes commented Nov 25, 2021

I experimented with this one and reviewed it a couple of times. Looks good! Thank you!
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2021
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@JoaoFula
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 29, 2021
@sferich888

/label px-approved

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label Nov 29, 2021
Contributor

xJustin commented Nov 30, 2021

/label docs-approved

@openshift-ci openshift-ci bot added the docs-approved Signifies that Docs has signed off on this PR label Nov 30, 2021
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

13 similar comments

Contributor Author

rluders commented Dec 1, 2021

/retest

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 0dcd7f1 into openshift:master Dec 1, 2021
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • docs-approved: Signifies that Docs has signed off on this PR
  • lgtm: Indicates that a PR is ready to be merged.
  • px-approved: Signifies that Product Support has signed off on this PR
  • qe-approved: Signifies that QE has signed off on this PR
8 participants