-
Notifications
You must be signed in to change notification settings - Fork 41
Add a configurable escalation behaviour #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Motivation The service group is often running multiple services and orchestrates shutdown for them. We have seen that sometimes some services never shutdown nor do they respond to task cancellation properly. This can become quite problematic when the whole application is waiting for a service to cancel and otherwise appears to be healthy but in reality can't serve any traffic. # Modification This PR adds a new configuration to escalate both graceful shutdown and cancellation. The escalation order is graceful shutdown -> task cancellation -> `fatalError`. The `fatalError` acts a last resort to make sure applications are never stuck.
@@ -96,6 +96,65 @@ public struct ServiceGroupConfiguration: Sendable { | |||
} | |||
} | |||
|
|||
/// The group's escalation configuration. | |||
public struct EscalationBehaviour: Sendable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the naming here that obvious, I think it needs some mention of shutdown/cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I removed the struct and just made them top-level properties
private var _maximumGracefulShutdownDuration: (secondsComponent: Int64, attosecondsComponent: Int64)? | ||
private var _maximumCancellationDuration: (secondsComponent: Int64, attosecondsComponent: Int64)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this whole dance just to avoid availability on the type? Can't we just add the availability to the property on the configuration which uses it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not allowed to add availability on a stored property. So we have to do this dance somewhere.
8772634
to
8fb5c32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good modulo nits
try await Task.sleep(for: Duration( | ||
secondsComponent: maximumCancellationDuration.secondsComponent, | ||
attosecondsComponent: maximumCancellationDuration.attosecondsComponent | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maximumCancellationDuration
is already a Duration
here, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's not. We are storing the raw components again because we cannot store a Duration
due to the availability
private var maximumGracefulShutdownDuration: (secondsComponent: Int64, attosecondsComponent: Int64)? | ||
/// The maximum amount of time that task cancellation is allowed to take. | ||
private var maximumCancellationDuration: (secondsComponent: Int64, attosecondsComponent: Int64)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these mutable? Also complete nit, but you'll save some repetition converting between the tuples and Duration
if you define your own _Duration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch with the var
. Was a bad copy and paste. I decided not to introduce a custom type for now. if we do more Duration
based APIs we should come back to that.
try await Task.sleep(for: Duration( | ||
secondsComponent: maximumGracefulShutdownDuration.secondsComponent, | ||
attosecondsComponent: maximumGracefulShutdownDuration.attosecondsComponent | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maximumGracefulShutdownDuration
is already a Duration
here, no need to convert again
8fb5c32
to
bc50f46
Compare
287f4d1
to
06f9c21
Compare
Motivation
The service group is often running multiple services and orchestrates shutdown for them. We have seen that sometimes some services never shutdown nor do they respond to task cancellation properly. This can become quite problematic when the whole application is waiting for a service to cancel and otherwise appears to be healthy but in reality can't serve any traffic.
Modification
This PR adds a new configuration to escalate both graceful shutdown and cancellation. The escalation order is graceful shutdown -> task cancellation ->
fatalError
. ThefatalError
acts a last resort to make sure applications are never stuck.