[ws-manager-bridge] Use more reasonable duration buckets for workspace instance updates #12798

jankeromnes · 2022-09-09T07:43:44Z

Description

Currently, we are tracking the duration of the handleStatusUpdate span, and storing it in the following buckets:

gitpod/components/ws-manager-bridge/src/prometheus-metrics-exporter.ts

Line 80 in 43e526b

buckets: prom.exponentialBuckets(2, 2, 8),

< 2 seconds
< 4 seconds
< 8 seconds
< 16 seconds
< 32 seconds
< 64 seconds
< 128 seconds
< 256 seconds
< +Infinity

However, most of the time, this span actually takes around 50ms.

Thus, all our measurements land in the very first "< 2 seconds" bucket, which is not very useful.

Instead, I propose that we store durations in the following buckets:

prom.exponentialBuckets(0.050, 2, 8)

< 50ms
< 100ms
< 200ms
< 400ms
< 800ms
< 1600ms
< 3200ms
< 6400ms
< +Infinity
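
To make the difference between the two bucket layouts concrete, here is a minimal sketch that re-implements the bucket arithmetic locally instead of importing prom-client. The helper names `exponentialBuckets` and `bucketFor` and the sample durations are illustrative assumptions, not code from this PR. The local `exponentialBuckets` is intended to mirror what `prom.exponentialBuckets(start, factor, count)` produces, and bucket bounds are treated as inclusive upper limits, as in Prometheus histograms.

```typescript
// Local stand-in for prom.exponentialBuckets(start, factor, count):
// returns `count` upper bounds, each `factor` times the previous one.
function exponentialBuckets(start: number, factor: number, count: number): number[] {
    const bounds: number[] = [];
    let bound = start;
    for (let i = 0; i < count; i++) {
        bounds.push(bound);
        bound *= factor;
    }
    return bounds;
}

// Returns the smallest upper bound that contains `seconds`,
// or Infinity when the value exceeds every finite bound.
function bucketFor(bounds: number[], seconds: number): number {
    for (const bound of bounds) {
        if (seconds <= bound) {
            return bound;
        }
    }
    return Infinity;
}

const oldBuckets = exponentialBuckets(2, 2, 8);    // 2s up to 256s
const newBuckets = exponentialBuckets(0.05, 2, 8); // 50ms up to 6.4s

// Hypothetical durations in seconds, chosen only for illustration.
const samples = [0.03, 0.05, 0.12, 0.9];
for (const s of samples) {
    console.log(`${s}s -> old: <= ${bucketFor(oldBuckets, s)}s, new: <= ${bucketFor(newBuckets, s)}s`);
}
```

With the old layout, every one of these sample durations falls into the first (2 second) bucket, while the new layout spreads them across several buckets.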

Related Issue(s)

Fixes #

How to test

Release Notes

NONE

Documentation

Werft options:

/werft with-preview


geropl

LGTM

easyCZ · 2022-09-09T07:50:58Z

How much more useful is the extra precision below 2 seconds? I'm asking because if there's no direct value (aside from more accurate data), we could drop the precision and maybe have only 2 buckets in the [0, 2] second range.

For example, if our SLO was at 2 sec, there would actually be almost no value in making this change.

jankeromnes · 2022-09-09T07:55:08Z

Many thanks @geropl for the fast review! 🚀 (I hope this will make the deployment in 7min 😅)

@easyCZ Good point, maybe 8 buckets aren't actually needed here (I just fixed the existing bucket values, but didn't question the existing bucket count).

But I think we can do much better than a 2-second SLO for handling status updates. This is user-facing after all. Maybe something around 200ms or 400ms could be a better SLO for 99+% of updates (and still easily achievable).
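
As a rough illustration of why the bucket boundaries matter for an SLO like this, the sketch below computes the share of updates at or under a threshold from cumulative bucket counts. The `CumulativeBucket` type, the `fractionWithin` helper, and all counts are made up for illustration and are not taken from ws-manager-bridge. It assumes the threshold must coincide with a bucket's upper bound to be measured exactly.

```typescript
// One cumulative histogram bucket: `count` observations were <= `le` seconds.
interface CumulativeBucket {
    le: number;
    count: number;
}

// Returns the fraction of observations at or under `sloSeconds`,
// or undefined when no bucket boundary matches the threshold.
// Assumes the last bucket is the +Infinity bucket holding the total count.
function fractionWithin(buckets: CumulativeBucket[], sloSeconds: number): number | undefined {
    const last = buckets[buckets.length - 1];
    const total = last ? last.count : 0;
    const match = buckets.find((b) => b.le === sloSeconds);
    if (!match || total === 0) {
        return undefined;
    }
    return match.count / total;
}

// Hypothetical counts for the proposed layout (only some buckets shown).
const proposed: CumulativeBucket[] = [
    { le: 0.05, count: 600 },
    { le: 0.1, count: 900 },
    { le: 0.2, count: 980 },
    { le: 0.4, count: 995 },
    { le: Infinity, count: 1000 },
];

// Hypothetical counts for the original layout: everything is already <= 2s.
const original: CumulativeBucket[] = [
    { le: 2, count: 1000 },
    { le: 4, count: 1000 },
    { le: Infinity, count: 1000 },
];

console.log(fractionWithin(proposed, 0.4)); // share of updates within 400ms
console.log(fractionWithin(original, 0.4)); // undefined: no 400ms boundary exists
```

With the original layout, a 200ms or 400ms target cannot be evaluated at all, because the smallest boundary is 2 seconds.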

[ws-manager-bridge] Use more reasonable duration buckets for workspac…

2270f39

…e instance updates

jankeromnes requested a review from a team September 9, 2022 07:43

roboquat added release-note-none size/XS labels Sep 9, 2022

github-actions bot added the team: webapp Issue belongs to the WebApp team label Sep 9, 2022

geropl approved these changes Sep 9, 2022

roboquat merged commit dadc064 into main Sep 9, 2022

roboquat deleted the jx/fix-duration-buckets branch September 9, 2022 07:51

roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Sep 9, 2022
