feat(backup): Add export checkpointer #80711
Conversation
This feature mirrors what we do for importing, where we periodically "save our work", so that if we experience an ephemeral failure (timeout, pod restart, OOM, etc), we can "pick up where we left off". For imports, we do this by saving `ImportChunk`s to the database every time we import a few models, which allows us to check what we've already imported and avoid redoing work when retrying.

We use a similar strategy here for exporting. For every model kind, we save a copy of the JSON of all instances of that model that we exported to some durable media in specially-named "checkpoint" files. If there is a failure midway through the export process, when we try again, we can scan for these files and quickly re-use them, rather than repeating very expensive and resource-intensive database queries. While this does assume that the model state has stayed relatively consistent between runs, this is already an assumption we make for exporting in general (we can't export a "single snapshot in time" of the database at once anyway).

A follow-up PR will implement a subclass of `ExportCheckpointer` for GCP, which is what we will use to checkpoint large SaaS->SaaS relocations.
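For illustration only, here is a minimal sketch of what such a checkpointer could look like. The class names, method signatures, and checkpoint-file layout below are assumptions made for this example, not necessarily what the PR implements:

```python
# Hypothetical sketch of the export-checkpointer idea; the real class and
# method names in the PR may differ.
from abc import ABC, abstractmethod
from pathlib import Path
import json


class ExportCheckpointer(ABC):
    """Durably caches the exported JSON for each model kind."""

    @abstractmethod
    def get(self, model_name: str) -> list[dict] | None:
        """Return the cached export for `model_name`, or None if absent."""

    @abstractmethod
    def add(self, model_name: str, json_data: list[dict]) -> None:
        """Durably save the exported JSON for `model_name`."""


class FilesystemExportCheckpointer(ExportCheckpointer):
    """Example backend that writes one checkpoint file per model kind."""

    def __init__(self, checkpoint_dir: Path) -> None:
        self._dir = checkpoint_dir
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, model_name: str) -> Path:
        # Specially-named checkpoint file, one per model kind.
        return self._dir / f"_export-checkpoint-{model_name}.json"

    def get(self, model_name: str) -> list[dict] | None:
        path = self._path(model_name)
        if not path.exists():
            return None
        return json.loads(path.read_text())

    def add(self, model_name: str, json_data: list[dict]) -> None:
        self._path(model_name).write_text(json.dumps(json_data))
```

The export loop would then call `get()` for each model kind before querying the database and `add()` after serializing it, so a retry only re-exports the model kinds that have no checkpoint yet.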
def export_to_tmp_file_and_clear_database(self, tmp_dir, reset_pks) -> Path:
    tmp_path = Path(tmp_dir).joinpath(f"{self._testMethodName}.expect.json")
    export_to_file(tmp_path, ExportScope.Global)
    clear_database(reset_pks=reset_pks)
    return tmp_path
Removed unused test helper method.
Codecov Report

✅ All tests successful. No failed tests found.
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #80711      +/-   ##
==========================================
- Coverage   78.38%   78.38%   -0.01%
==========================================
  Files        7201     7201
  Lines      319135   319172      +37
  Branches    43957    43961       +4
==========================================
+ Hits       250158   250183      +25
- Misses      62605    62614       +9
- Partials     6372     6375       +3
Curious, how do we plan on checking the functionality of this CheckPointer in the future to simulate what is going on in prod?
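(Purely illustrative: one way a test could exercise this is with a fake in-memory checkpointer, assuming the interface sketched above; this is not necessarily how the PR's tests do it.)

```python
# Illustrative only: a fake checkpointer a test could pass to the export
# code, then assert that checkpoints are written on the first run and
# consulted on the retry.
class FakeExportCheckpointer(ExportCheckpointer):  # base class from the sketch above
    def __init__(self) -> None:
        self.cache: dict[str, list[dict]] = {}
        self.reads = 0
        self.writes = 0

    def get(self, model_name: str) -> list[dict] | None:
        self.reads += 1
        return self.cache.get(model_name)

    def add(self, model_name: str, json_data: list[dict]) -> None:
        self.writes += 1
        self.cache[model_name] = json_data
```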
@@ -68,6 +136,7 @@ def _export(
        printer.echo(errText, err=True)
        raise RuntimeError(errText)

    cache = checkpointer if checkpointer is not None else NoopExportCheckpointer(None, printer)
So essentially, NoopExportCheckpointer is being used in the short term and this PR is establishing the framework for using ExportCheckpointer in the future?
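For context, a no-op checkpointer along these lines would simply never cache anything, so exporting behaves exactly as it did before when no real checkpointer is supplied. A minimal sketch, assuming the interface from the description above (the real class's constructor takes different arguments, as the diff shows):

```python
# Sketch of a no-op checkpointer: it caches nothing, so every model kind is
# exported fresh and nothing is persisted between attempts.
class NoopExportCheckpointer(ExportCheckpointer):  # base class from the sketch above
    def get(self, model_name: str) -> list[dict] | None:
        return None  # Never a cache hit.

    def add(self, model_name: str, json_data: list[dict]) -> None:
        pass  # Intentionally discard the data.
```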
This builds on the work of #80711 to add a GCP-backed `ExportCheckpointer` implementation. Now, when exporting, we save (always encrypted!) copies of the progress on each model kind seen so far. If the export fails halfway through, we can use these checkpoints to recover much more quickly than if we had to redo all of that work, ensuring a higher chance of success on the retry.
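A rough sketch of what a GCS-backed checkpointer could look like, assuming the interface above; the bucket name, object path layout, and encryption helpers here are placeholders rather than the follow-up PR's actual implementation:

```python
# Rough sketch of a GCS-backed checkpointer; bucket name, path layout, and
# the encrypt/decrypt helpers are placeholders, not the real implementation.
import json

from google.cloud import storage


def encrypt(data: bytes) -> bytes:
    raise NotImplementedError  # placeholder for the real encryption step


def decrypt(data: bytes) -> bytes:
    raise NotImplementedError  # placeholder for the real decryption step


class GCPExportCheckpointer(ExportCheckpointer):  # base class from the sketch above
    def __init__(self, bucket_name: str, prefix: str) -> None:
        self._bucket = storage.Client().bucket(bucket_name)
        self._prefix = prefix

    def _blob(self, model_name: str) -> storage.Blob:
        # One encrypted checkpoint object per model kind.
        return self._bucket.blob(f"{self._prefix}/{model_name}.json.enc")

    def get(self, model_name: str) -> list[dict] | None:
        blob = self._blob(model_name)
        if not blob.exists():
            return None
        return json.loads(decrypt(blob.download_as_bytes()))

    def add(self, model_name: str, json_data: list[dict]) -> None:
        payload = encrypt(json.dumps(json_data).encode())
        self._blob(model_name).upload_from_string(payload)
```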