-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[bridge] Marks stuck stopping and pending instances as stopped #13350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6850a78
to
f614e3d
Compare
started the job as gitpod-build-lau-pending-stopping-11397.4 because the annotations in the pull request description changed |
1ae156c
to
c54c7ca
Compare
8603bc6
to
9c6d7cc
Compare
continue; | ||
} | ||
|
||
log.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laushinka Sorry for the long turnaround. I like the new layout of the loop! But we need it to include this case: If an instance in runningInstancesIdx
is running
, we still need to stop it unconditionally, as we do it at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense that we should handle the running
state as we do it now. Thanks, will add it!
09aa893
to
062a78c
Compare
Code LGTM, thx @laushinka ! 🙏 |
@geropl Cool! Let me know if the steps I described were helpful, or if you test in a different way 🙏🏽 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, tested and works as expected! 🥇
Let me know if the steps I described were helpful, or if you test in a different way
Did as you described, just had two meetings in between 😉
Description
After validating through logging of the past two weeks (since 15.09.2022), this change marks
stopping
andpending
states after the timeout duration to be stopped, as well asrunning
states that thews-manager
does not know about.Findings from the logging:
stopping
state, and we could not find a clear explanation whether they were cases that we want to be stopped. Instances could be in astopping
state for longer than we expect (in the previous PR it was 10 seconds afterstoppingTime
) because instances with a big back-up size can take longer.pending
states were stuck in that state for days, and we do want them stopped.creating
states that ended up running, and these we should not stop.We are still unsure why these instances end up in these states. Therefore we:
pending
andstopping
states after the timeout duration, andrunning
states. These we know for sure we want stopped ifws-manager
does not know about them.Related Issue(s)
Fixes #11397
How to test
kubectl cordon
the preview noded_b_workspace_instance
to 1 hour before (or wait 1 hour). It should be marked as stopped with a message"Stopped by ws-manager-bridge. Previously in phase pending
.kubectl uncordon
the node, otherwise builds will fail because the node can't be scheduled.Release Notes
Documentation
Werft options:
If enabled this will build
install/preview
Valid options are
all
,workspace
,webapp
,ide