Skip to content

Add a configurable escalation behaviour #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 13, 2023
Merged

Conversation

FranzBusch
Copy link
Contributor

Motivation

The service group is often running multiple services and orchestrates shutdown for them. We have seen that sometimes some services never shutdown nor do they respond to task cancellation properly. This can become quite problematic when the whole application is waiting for a service to cancel and otherwise appears to be healthy but in reality can't serve any traffic.

Modification

This PR adds a new configuration to escalate both graceful shutdown and cancellation. The escalation order is graceful shutdown -> task cancellation -> fatalError. The fatalError acts a last resort to make sure applications are never stuck.

# Motivation
The service group is often running multiple services and orchestrates shutdown for them. We have seen that sometimes some services never shutdown nor do they respond to task cancellation properly. This can become quite problematic when the whole application is waiting for a service to cancel and otherwise appears to be healthy but in reality can't serve any traffic.

# Modification
This PR adds a new configuration to escalate both graceful shutdown and cancellation. The escalation order is graceful shutdown -> task cancellation -> `fatalError`. The `fatalError` acts a last resort to make sure applications are never stuck.
@FranzBusch FranzBusch requested a review from glbrntt October 13, 2023 08:52
@@ -96,6 +96,65 @@ public struct ServiceGroupConfiguration: Sendable {
}
}

/// The group's escalation configuration.
public struct EscalationBehaviour: Sendable {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the naming here that obvious, I think it needs some mention of shutdown/cancellation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I removed the struct and just made them top-level properties

Comment on lines 152 to 153
private var _maximumGracefulShutdownDuration: (secondsComponent: Int64, attosecondsComponent: Int64)?
private var _maximumCancellationDuration: (secondsComponent: Int64, attosecondsComponent: Int64)?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this whole dance just to avoid availability on the type? Can't we just add the availability to the property on the configuration which uses it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not allowed to add availability on a stored property. So we have to do this dance somewhere.

@FranzBusch FranzBusch force-pushed the fb-escalation-behaviour branch 2 times, most recently from 8772634 to 8fb5c32 Compare October 13, 2023 09:37
@FranzBusch FranzBusch requested a review from glbrntt October 13, 2023 09:37
Copy link
Collaborator

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good modulo nits

Comment on lines +674 to +678
try await Task.sleep(for: Duration(
secondsComponent: maximumCancellationDuration.secondsComponent,
attosecondsComponent: maximumCancellationDuration.attosecondsComponent
))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maximumCancellationDuration is already a Duration here, I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not. We are storing the raw components again because we cannot store a Duration due to the availability

Comment on lines 37 to 39
private var maximumGracefulShutdownDuration: (secondsComponent: Int64, attosecondsComponent: Int64)?
/// The maximum amount of time that task cancellation is allowed to take.
private var maximumCancellationDuration: (secondsComponent: Int64, attosecondsComponent: Int64)?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these mutable? Also complete nit, but you'll save some repetition converting between the tuples and Duration if you define your own _Duration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the var. Was a bad copy and paste. I decided not to introduce a custom type for now. if we do more Duration based APIs we should come back to that.

Comment on lines 500 to 503
try await Task.sleep(for: Duration(
secondsComponent: maximumGracefulShutdownDuration.secondsComponent,
attosecondsComponent: maximumGracefulShutdownDuration.attosecondsComponent
))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maximumGracefulShutdownDuration is already a Duration here, no need to convert again

@FranzBusch FranzBusch force-pushed the fb-escalation-behaviour branch from 8fb5c32 to bc50f46 Compare October 13, 2023 12:06
@FranzBusch FranzBusch force-pushed the fb-escalation-behaviour branch from 287f4d1 to 06f9c21 Compare October 13, 2023 12:53
@FranzBusch FranzBusch enabled auto-merge (squash) October 13, 2023 12:58
@FranzBusch FranzBusch merged commit d673fdc into main Oct 13, 2023
@FranzBusch FranzBusch deleted the fb-escalation-behaviour branch October 13, 2023 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants