Prevent crashes on expected checkpoint cancellations #1324
In some cases, an expected checkpoint cancellation could crash the run, for example when a dependency finishes before the parent checkpoint has been created. This is fixed by adding custom errors for all readiness interruptions. We now distinguish between them correctly and only crash on timeouts. If a run doesn't become ready to checkpoint within 20s, it is most likely stuck for good and would not be able to resume in any scenario.
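The distinction described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the two error class names come from the change itself, while their constructor arguments, the `waitForReadiness` and `tryCheckpoint` helpers, and the `CheckpointOutcome` type are hypothetical names introduced here. The idea is that a cancellation is treated as an expected outcome, while a readiness timeout propagates and crashes the run.

```typescript
// Only the two class names come from the PR; fields and messages are assumptions.
class CheckpointReadinessTimeoutError extends Error {
  constructor(public readonly timeoutMs: number) {
    super(`Run was not ready to checkpoint within ${timeoutMs}ms`);
    this.name = "CheckpointReadinessTimeoutError";
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class CheckpointCancelError extends Error {
  constructor(reason: string) {
    super(`Checkpoint cancelled: ${reason}`);
    this.name = "CheckpointCancelError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical result type for this sketch.
type CheckpointOutcome = "checkpointed" | "cancelled";

// Hypothetical helper: settles with `ready`, or rejects with a timeout
// error if `ready` has not settled within `timeoutMs`.
function waitForReadiness(ready: Promise<void>, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new CheckpointReadinessTimeoutError(timeoutMs)),
      timeoutMs
    );
    ready.then(
      () => { clearTimeout(timer); resolve(); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Hypothetical caller: cancellations are expected and swallowed,
// anything else (including timeouts) is rethrown.
async function tryCheckpoint(
  ready: Promise<void>,
  timeoutMs = 20_000 // the 20s limit mentioned in the description
): Promise<CheckpointOutcome> {
  try {
    await waitForReadiness(ready, timeoutMs);
    return "checkpointed";
  } catch (err) {
    if (err instanceof CheckpointCancelError) {
      return "cancelled";
    }
    throw err;
  }
}
```

With this shape, a caller only has to handle the rethrown timeout as fatal. Any other readiness interruption would need its own error class and its own branch in the `catch`.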
Summary by CodeRabbit
New Features
New error classes: CheckpointReadinessTimeoutError and CheckpointCancelError.
Bug Fixes