feat(backup): Add export checkpointer #80711
Conversation
This feature mirrors what we do for importing, where we periodically "save our work", so that if we experience an ephemeral failure (timeout, pod restart, OOM, etc), we can "pick up where we left off". For imports, we do this by saving `ImportChunk`s to the database every time we import a few models, which allows us to check what we've already imported and avoid redoing work when retrying.

We use a similar strategy here for exporting. For every model kind, we save a copy of the JSON of all instances of that model that we exported to some durable media in specially-named "checkpoint" files. If there is a failure midway through the export process, when we try again, we can scan for these files and quickly re-use them, rather than repeating very expensive and resource-intensive database queries. While this does assume that the model state has stayed relatively consistent between runs, this is already an assumption we make for exporting in general (we can't export a "single snapshot in time" of the database at once anyway).

A follow-up PR will implement a subclass of `ExportCheckpointer` for GCP, which is what we will use to checkpoint large SaaS->SaaS relocations.
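For illustration only, here is a minimal sketch of what such a checkpointer could look like. The class names, method signatures, and checkpoint-file layout below are assumptions made for this example, not necessarily what the PR implements:

```python
# Hypothetical sketch of the export-checkpointer idea; the real class and
# method names in the PR may differ.
from abc import ABC, abstractmethod
from pathlib import Path
import json


class ExportCheckpointer(ABC):
    """Durably caches the exported JSON for each model kind."""

    @abstractmethod
    def get(self, model_name: str) -> list[dict] | None:
        """Return the cached export for `model_name`, or None if absent."""

    @abstractmethod
    def add(self, model_name: str, json_data: list[dict]) -> None:
        """Durably save the exported JSON for `model_name`."""


class FilesystemExportCheckpointer(ExportCheckpointer):
    """Example backend that writes one checkpoint file per model kind."""

    def __init__(self, checkpoint_dir: Path) -> None:
        self._dir = checkpoint_dir
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, model_name: str) -> Path:
        # Specially-named checkpoint file, one per model kind.
        return self._dir / f"_export-checkpoint-{model_name}.json"

    def get(self, model_name: str) -> list[dict] | None:
        path = self._path(model_name)
        if not path.exists():
            return None
        return json.loads(path.read_text())

    def add(self, model_name: str, json_data: list[dict]) -> None:
        self._path(model_name).write_text(json.dumps(json_data))
```

The export loop would then call `get()` for each model kind before querying the database and `add()` after serializing it, so a retry only re-exports the model kinds that have no checkpoint yet.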
def export_to_tmp_file_and_clear_database(self, tmp_dir, reset_pks) -> Path:
    tmp_path = Path(tmp_dir).joinpath(f"{self._testMethodName}.expect.json")
    export_to_file(tmp_path, ExportScope.Global)
    clear_database(reset_pks=reset_pks)
    return tmp_path
Removed unused test helper method.
Codecov Report

✅ All tests successful. No failed tests found.
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #80711      +/-   ##
==========================================
- Coverage   78.38%   78.38%   -0.01%
==========================================
  Files        7201     7201
  Lines      319135   319172      +37
  Branches    43957    43961       +4
==========================================
+ Hits       250158   250183      +25
- Misses      62605    62614       +9
- Partials     6372     6375       +3
Curious, how do we plan on checking the functionality of this CheckPointer in the future to simulate what is going on in prod?
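(Purely illustrative: one way a test could exercise this is with a fake in-memory checkpointer, assuming the interface sketched above; this is not necessarily how the PR's tests do it.)

```python
# Illustrative only: a fake checkpointer a test could pass to the export
# code, then assert that checkpoints are written on the first run and
# consulted on the retry.
class FakeExportCheckpointer(ExportCheckpointer):  # base class from the sketch above
    def __init__(self) -> None:
        self.cache: dict[str, list[dict]] = {}
        self.reads = 0
        self.writes = 0

    def get(self, model_name: str) -> list[dict] | None:
        self.reads += 1
        return self.cache.get(model_name)

    def add(self, model_name: str, json_data: list[dict]) -> None:
        self.writes += 1
        self.cache[model_name] = json_data
```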
@@ -68,6 +136,7 @@ def _export(
        printer.echo(errText, err=True)
        raise RuntimeError(errText)

    cache = checkpointer if checkpointer is not None else NoopExportCheckpointer(None, printer)
So essentially, NoopExportCheckpointer is being used in the short term and this PR is establishing the framework for using ExportCheckpointer in the future?
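For context, a no-op checkpointer along these lines would simply never cache anything, so exporting behaves exactly as it did before when no real checkpointer is supplied. A minimal sketch, assuming the interface from the description above (the real class's constructor takes different arguments, as the diff shows):

```python
# Sketch of a no-op checkpointer: it caches nothing, so every model kind is
# exported fresh and nothing is persisted between attempts.
class NoopExportCheckpointer(ExportCheckpointer):  # base class from the sketch above
    def get(self, model_name: str) -> list[dict] | None:
        return None  # Never a cache hit.

    def add(self, model_name: str, json_data: list[dict]) -> None:
        pass  # Intentionally discard the data.
```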
This builds on the work of #80711 to add a GCP-backed `ExportCheckpointer` implementation. Now, when exporting, we save (always encrypted!) copies of the progress on each model kind seen so far. If the export fails halfway through, we can use these checkpoints to recover much more quickly than if we had to redo all of that work, ensuring a higher chance of success on the retry.
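A rough sketch of what a GCS-backed checkpointer could look like, assuming the interface above; the bucket name, object path layout, and encryption helpers here are placeholders rather than the follow-up PR's actual implementation:

```python
# Rough sketch of a GCS-backed checkpointer; bucket name, path layout, and
# the encrypt/decrypt helpers are placeholders, not the real implementation.
import json

from google.cloud import storage


def encrypt(data: bytes) -> bytes:
    raise NotImplementedError  # placeholder for the real encryption step


def decrypt(data: bytes) -> bytes:
    raise NotImplementedError  # placeholder for the real decryption step


class GCPExportCheckpointer(ExportCheckpointer):  # base class from the sketch above
    def __init__(self, bucket_name: str, prefix: str) -> None:
        self._bucket = storage.Client().bucket(bucket_name)
        self._prefix = prefix

    def _blob(self, model_name: str) -> storage.Blob:
        # One encrypted checkpoint object per model kind.
        return self._bucket.blob(f"{self._prefix}/{model_name}.json.enc")

    def get(self, model_name: str) -> list[dict] | None:
        blob = self._blob(model_name)
        if not blob.exists():
            return None
        return json.loads(decrypt(blob.download_as_bytes()))

    def add(self, model_name: str, json_data: list[dict]) -> None:
        payload = encrypt(json.dumps(json_data).encode())
        self._blob(model_name).upload_from_string(payload)
```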