[bridge] Marks stuck stopping and pending instances as stopped #13350

laushinka · 2022-09-27T08:05:32Z

Description

After validating with logging over the past two weeks (since 15.09.2022), this change marks instances stuck in the stopping and pending states as stopped once the timeout duration has elapsed, as well as instances in the running state that ws-manager does not know about.
Findings from the logging:

Most of the log entries were for the stopping state, and we could not clearly determine whether these were cases we want stopped. Instances can stay in the stopping state longer than we expected (the previous PR used 10 seconds after stoppingTime) because instances with a large backup can take longer.
Many pending instances were stuck in that state for days, and we do want them stopped.
We found a few instances in the creating state that ended up running; these we should not stop.

We are still unsure why these instances end up in these states. Therefore we:

Will focus only on stopping instances stuck in the pending and stopping states after the timeout duration, and instances in the running state. These we know for sure we want stopped if ws-manager does not know about them.
Will create a separate issue to investigate the cause.
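
A minimal sketch of the rule described above, under stated assumptions: the names `shouldMarkStopped`, `InstanceInfo` and `Timeouts`, the timeout fields, and the fallback to creationTime are all hypothetical and are not taken from bridge.ts. It only illustrates which phases are stopped, and when, for an instance that ws-manager does not know about.

```typescript
// Hypothetical sketch: names and timeout fields are assumptions, not the code in bridge.ts.
type Phase = "pending" | "creating" | "running" | "stopping" | "stopped";

interface InstanceInfo {
    id: string;
    phase: Phase;
    creationTime: Date;
    stoppingTime?: Date;
}

interface Timeouts {
    pendingPhaseSeconds: number;
    stoppingPhaseSeconds: number;
}

function shouldMarkStopped(
    instance: InstanceInfo,
    knownToManager: boolean,
    timeouts: Timeouts,
    now: Date = new Date(),
): boolean {
    // Instances that ws-manager still tracks are never touched.
    if (knownToManager) {
        return false;
    }
    const secondsSince = (since: Date) => (now.getTime() - since.getTime()) / 1000;
    switch (instance.phase) {
        case "running":
            // Running but unknown to ws-manager: stop unconditionally.
            return true;
        case "pending":
            return secondsSince(instance.creationTime) > timeouts.pendingPhaseSeconds;
        case "stopping":
            // Assumption: fall back to creationTime when stoppingTime is missing.
            return secondsSince(instance.stoppingTime ?? instance.creationTime) > timeouts.stoppingPhaseSeconds;
        default:
            // For example creating: such instances were seen to end up running, so leave them alone.
            return false;
    }
}
```

With a one-hour pending timeout, a pending instance created two hours ago would be marked as stopped, while a creating instance of any age would not.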

Related Issue(s)

Fixes #11397

How to test

kubectl cordon the preview node
Run a workspace. It should be stuck in pending.
Change the creationTime in d_b_workspace_instance to 1 hour earlier (or wait 1 hour). The instance should be marked as stopped with the message "Stopped by ws-manager-bridge. Previously in phase pending."
Once done, kubectl uncordon the node, otherwise builds will fail because pods can't be scheduled on the node.

Release Notes

NONE

Documentation

Werft options:

/werft with-local-preview
If enabled this will build install/preview
/werft with-preview
/werft with-integration-tests=all
Valid options are all, workspace, webapp, ide

werft-gitpod-dev-com · 2022-09-27T08:36:53Z

started the job as gitpod-build-lau-pending-stopping-11397.4 because the annotations in the pull request description changed
(with .werft/ from main)

geropl · 2022-09-29T07:18:52Z

components/ws-manager-bridge/src/bridge.ts

                    continue;
                }

-                log.info(


@laushinka Sorry for the long turnaround. I like the new layout of the loop! But we need it to include this case: If an instance in runningInstancesIdx is running, we still need to stop it unconditionally, as we do it at the moment.

Makes sense that we should handle the running state as we do it now. Thanks, will add it!
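
The loop shape discussed in this thread could look roughly like the sketch below. Only `runningInstancesIdx` is named in the review; `collectInstancesToStop`, `DbInstance`, `knownToManager` and the `isTimedOut` callback are hypothetical, and the message is modeled on the one quoted in the test steps. It is a sketch under those assumptions, not the merged implementation.

```typescript
// Hypothetical sketch of the loop shape; only runningInstancesIdx appears in the review.
type InstancePhase = "pending" | "creating" | "running" | "stopping";

interface DbInstance {
    id: string;
    phase: InstancePhase;
}

interface StopRequest {
    id: string;
    message: string;
}

function collectInstancesToStop(
    runningInstancesIdx: Map<string, DbInstance>,
    knownToManager: Set<string>,
    isTimedOut: (instance: DbInstance) => boolean,
): StopRequest[] {
    const toStop: StopRequest[] = [];
    runningInstancesIdx.forEach((instance, id) => {
        if (knownToManager.has(id)) {
            return; // ws-manager still tracks this instance, nothing to do
        }
        const stuck =
            (instance.phase === "pending" || instance.phase === "stopping") && isTimedOut(instance);
        // Running instances are stopped unconditionally; pending and stopping only after the timeout.
        if (instance.phase !== "running" && !stuck) {
            return;
        }
        toStop.push({
            id,
            message: `Stopped by ws-manager-bridge. Previously in phase ${instance.phase}.`,
        });
    });
    return toStop;
}
```

Passing the timeout check in as a callback keeps the control flow visible on its own: the running case is decided before any timeout is consulted.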

geropl · 2022-09-29T11:22:37Z

Code LGTM, thx @laushinka ! 🙏
Will test now...

laushinka · 2022-09-29T12:50:28Z

Code LGTM, thx @laushinka ! 🙏 Will test now...

@geropl Cool! Let me know if the steps I described were helpful, or if you test in a different way 🙏🏽

geropl

Code LGTM, tested and works as expected! 🥇

Let me know if the steps I described were helpful, or if you test in a different way

Did as you described, just had two meetings in between 😉

roboquat added do-not-merge/work-in-progress do-not-merge/release-note-label-needed size/M labels Sep 27, 2022

laushinka force-pushed the lau/pending-stopping-11397 branch from 6850a78 to f614e3d Compare September 27, 2022 08:30

laushinka force-pushed the lau/pending-stopping-11397 branch 2 times, most recently from 1ae156c to c54c7ca Compare September 27, 2022 10:04

roboquat added release-note-none and removed do-not-merge/release-note-label-needed labels Sep 27, 2022

laushinka marked this pull request as ready for review September 27, 2022 10:36

laushinka requested a review from a team September 27, 2022 10:36

roboquat removed the do-not-merge/work-in-progress label Sep 27, 2022

laushinka requested a review from geropl September 27, 2022 10:36

github-actions bot added the team: webapp Issue belongs to the WebApp team label Sep 27, 2022

geropl reviewed Sep 27, 2022

components/ws-manager-bridge/src/bridge.ts (outdated)

geropl reviewed Sep 27, 2022

components/ws-manager-bridge/src/bridge.ts

laushinka force-pushed the lau/pending-stopping-11397 branch from 8603bc6 to 9c6d7cc Compare September 27, 2022 14:53

laushinka requested a review from geropl September 28, 2022 09:45

geropl reviewed Sep 29, 2022

components/ws-manager-bridge/src/bridge.ts (outdated)

geropl reviewed Sep 29, 2022

components/ws-manager-bridge/src/bridge.ts (outdated)

geropl reviewed Sep 29, 2022

components/ws-manager-bridge/src/config.ts

[bridge] Mark as stopped pending and stopping

062a78c

laushinka force-pushed the lau/pending-stopping-11397 branch from 09aa893 to 062a78c Compare September 29, 2022 09:19

laushinka requested a review from geropl September 29, 2022 10:37

geropl approved these changes Sep 29, 2022

roboquat merged commit e00bffa into main Sep 29, 2022

roboquat deleted the lau/pending-stopping-11397 branch September 29, 2022 13:16

roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Sep 30, 2022
