[usage] Alert on Usage and Invoice Reconciliations #12919

Merged
1 commit merged on Sep 15, 2022

Conversation

@easyCZ (Member) commented Sep 13, 2022

Description

Alert on the two core RPCs used in usage reconciliation.

Related Issue(s)

How to test

Release Notes

NONE

Documentation

Werft options:

  • /werft with-preview

@easyCZ requested a review from a team September 13, 2022 13:16
@github-actions bot added the "team: webapp" (Issue belongs to the WebApp team) label Sep 13, 2022
runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/GitpodUsageScheduledReconciliationFailures.md
summary: There are failed scheduled reconciliations in the usage component.
description: We have accumulated {{ printf "%.2f" $value }} failures. This affects how stale usage data is and/or updating invoices in Stripe.
runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/GitpodUsageReconcileUsageFailures.md
Contributor

Is this the new name for this runbook?

Member Author

Yeah, I'll need to update the runbooks accordingly. For now these will only be Slack warnings in webapp, so we have some time to polish the runbooks before the alert actually pages.

@easyCZ requested review from laushinka and a team September 14, 2022 12:16
Comment on lines +29 to +30
expr: sum(increase(grpc_server_handled_total{grpc_service="usage.v1.BillingService", grpc_method="ReconcileInvoices", grpc_code!="OK"})) > 1
for: 30m
Contributor

I'm relatively new to PromQL, but what does the for: 30m do here?

Member Author

It says that the expr above needs to "have values" (i.e. the condition must hold) continuously for 30m before the alert fires. Until then the alert is only pending.

Contributor

So does that mean we alert only when reconciliation keeps failing for 30 minutes continuously, i.e. a single failure won't alert? (I'm not sure how often we currently run reconciliation in prod.)
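
To make the `for: 30m` behaviour discussed in this thread concrete, here is a minimal Python sketch of the pending/firing logic. It is not Prometheus code: the names `ForClause` and `simulate` are hypothetical, and the 60-second evaluation interval is an assumption chosen for illustration. It only models the rule that the expression must return a value on every evaluation for the whole hold duration before the alert fires.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ForClause:
    """Simplified model of one alert guarded by a `for:` duration (hypothetical helper)."""

    hold_seconds: float
    active_since: Optional[float] = None  # time the condition first became true

    def evaluate(self, now: float, condition: bool) -> str:
        # A single evaluation where the condition is false resets the timer.
        if not condition:
            self.active_since = None
            return "inactive"
        # First evaluation where the condition holds: start the timer.
        if self.active_since is None:
            self.active_since = now
        # Fire only once the condition has held for the full duration.
        if now - self.active_since >= self.hold_seconds:
            return "firing"
        return "pending"


def simulate(
    conditions: Sequence[bool],
    interval: float = 60.0,   # assumed evaluation interval, in seconds
    hold: float = 30 * 60.0,  # corresponds to `for: 30m`
) -> List[str]:
    """Return the alert state after each evaluation of the rule expression."""
    clause = ForClause(hold_seconds=hold)
    return [clause.evaluate(i * interval, cond) for i, cond in enumerate(conditions)]


if __name__ == "__main__":
    # Condition true for one evaluation, then false again.
    print(simulate([True, False]))
    # Condition true for 31 consecutive one-minute evaluations.
    print(simulate([True] * 31)[-1])
```

Under this model, an expression that is true for a single evaluation only moves the alert to pending, and any evaluation where it is false resets the timer. Whether one failed reconciliation keeps the expression true for the full 30 minutes depends on the range window given to `increase`, which the snippet quoted above does not show.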

@roboquat merged commit 2290935 into main Sep 15, 2022
@roboquat deleted the mp/usage-alerts-reconcile branch September 15, 2022 07:09
@roboquat added the "deployed: webapp" (Meta team change is running in production) and "deployed" (Change is completely running in production) labels Sep 15, 2022
Labels
deployed: webapp (Meta team change is running in production), deployed (Change is completely running in production), release-note-none, size/S, team: webapp (Issue belongs to the WebApp team)
4 participants