feat(backup): Add export checkpointer #80711

Merged
azaslavsky merged 1 commit into master from azaslavsky/export-checkpointer on Nov 15, 2024


azaslavsky
Contributor

This feature mirrors what we do for importing, where we periodically "save our work" so that if we experience an ephemeral failure (timeout, pod restart, OOM, etc.), we can "pick up where we left off". For imports, we do this by saving `ImportChunk`s to the database every time we import a few models, which lets us check what we've already imported and avoid redoing work when retrying.

We use a similar strategy here for exporting. For every model kind, we save a copy of the JSON of all exported instances of that model to some durable medium, in specially named "checkpoint" files. If the export process fails midway, the retry can scan for these files and quickly reuse them, rather than repeating very expensive and resource-intensive database queries. This does assume that the model state has stayed relatively consistent between runs, but that assumption is already built into exporting in general (we can't export a single point-in-time snapshot of the whole database at once anyway).
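
A minimal sketch of the checkpoint-and-resume idea described above, assuming a local directory plays the role of the "durable media". The class `FileExportCheckpointer`, its `get`/`add` methods, the `_<model>.checkpoint.json` naming scheme, and the `export_all` driver are all hypothetical illustrations, not the actual `ExportCheckpointer` API added by this PR.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]


class FileExportCheckpointer:
    """Hypothetical sketch (not Sentry's API): one JSON checkpoint file per model kind."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, model_name: str) -> Path:
        # Assumed "specially named" checkpoint file for each model kind.
        return self.directory / f"_{model_name}.checkpoint.json"

    def get(self, model_name: str) -> Optional[Rows]:
        # Return previously exported rows, or None if this model was never checkpointed.
        path = self._path_for(model_name)
        if not path.exists():
            return None
        return json.loads(path.read_text())

    def add(self, model_name: str, rows: Rows) -> None:
        # Write to a temporary file, then rename over the final name, so a crash
        # mid-write never leaves a truncated file that looks like a valid checkpoint.
        path = self._path_for(model_name)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(rows))
        tmp.replace(path)


def export_all(
    model_names: list[str],
    query: Callable[[str], Rows],
    checkpointer: FileExportCheckpointer,
) -> Rows:
    # Hypothetical export driver: reuse a checkpoint when one exists,
    # otherwise run the expensive query and checkpoint its result.
    out: Rows = []
    for name in model_names:
        rows = checkpointer.get(name)
        if rows is None:
            rows = query(name)
            checkpointer.add(name, rows)
        out.extend(rows)
    return out
```

If a first run dies while querying the second model, a retry loads the first model from its checkpoint file and only queries the models that were never saved. The write-then-rename step relies on rename within one directory being atomic on POSIX filesystems.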

A follow-up PR will implement a subclass of `ExportCheckpointer` for GCP, which is what we will use to checkpoint large SaaS->SaaS relocations.

github-actions bot added the Scope: Backend label (Automatically applied to PRs that change backend components) on Nov 13, 2024
Comment on lines -32 to -37
def export_to_tmp_file_and_clear_database(self, tmp_dir, reset_pks) -> Path:
    tmp_path = Path(tmp_dir).joinpath(f"{self._testMethodName}.expect.json")
    export_to_file(tmp_path, ExportScope.Global)
    clear_database(reset_pks=reset_pks)
    return tmp_path

Contributor Author


Removed unused test helper method.


codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 87.93103% with 7 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sentry/backup/exports.py | 86.84% | 3 Missing and 2 partials ⚠️ |
| src/sentry/backup/crypto.py | 85.71% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #80711      +/-   ##
==========================================
- Coverage   78.38%   78.38%   -0.01%     
==========================================
  Files        7201     7201              
  Lines      319135   319172      +37     
  Branches    43957    43961       +4     
==========================================
+ Hits       250158   250183      +25     
- Misses      62605    62614       +9     
- Partials     6372     6375       +3     

azaslavsky marked this pull request as ready for review November 13, 2024 23:58
azaslavsky requested a review from a team November 13, 2024 23:58
azaslavsky enabled auto-merge (squash) November 13, 2024 23:58
Member

hubertdeng123 left a comment


Curious, how do we plan on checking the functionality of this checkpointer in the future to simulate what is going on in prod?

@@ -68,6 +136,7 @@ def _export(
printer.echo(errText, err=True)
raise RuntimeError(errText)

cache = checkpointer if checkpointer is not None else NoopExportCheckpointer(None, printer)
Member


So essentially, NoopExportCheckpointer is being used in the short term and this PR is establishing the framework for using ExportCheckpointer in the future?
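
The `cache = checkpointer if checkpointer is not None else NoopExportCheckpointer(None, printer)` line in the diff reads as a null-object fallback: when no checkpointer is supplied, a do-nothing one is substituted so the export code can call checkpoint methods unconditionally, and callers that pass nothing presumably keep the previous uncached behavior. A minimal sketch under that reading; `ExportCheckpointerLike`, `NoopCheckpointer`, and `resolve_checkpointer` are hypothetical names, and the real constructor arguments `(None, printer)` are omitted because their meaning is not shown here.

```python
from typing import Any, Optional, Protocol

Rows = list[dict[str, Any]]


class ExportCheckpointerLike(Protocol):
    # Hypothetical minimal interface a checkpointer might expose.
    def get(self, model_name: str) -> Optional[Rows]: ...

    def add(self, model_name: str, rows: Rows) -> None: ...


class NoopCheckpointer:
    """Hypothetical null object: never has a cached result, discards every write."""

    def get(self, model_name: str) -> Optional[Rows]:
        return None

    def add(self, model_name: str, rows: Rows) -> None:
        pass


def resolve_checkpointer(checkpointer: Optional[ExportCheckpointerLike]) -> ExportCheckpointerLike:
    # Same shape as the fallback line in the diff: use the caller's
    # checkpointer if given, otherwise substitute the no-op one.
    return checkpointer if checkpointer is not None else NoopCheckpointer()
```

The benefit of this design is that the export loop needs no `if checkpointer is not None` branches; every `get` on the no-op object misses, so every model is queried from the database as before.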

azaslavsky merged commit cebe401 into master Nov 15, 2024
50 of 51 checks passed
azaslavsky deleted the azaslavsky/export-checkpointer branch November 15, 2024 00:36
azaslavsky added a commit that referenced this pull request Nov 15, 2024
This builds on the work of #80711 to add a GCP-backed
`ExportCheckpointer` implementation. Now, when exporting, we save
(always encrypted!) copies of the progress on each model kind seen so
far. If the export fails halfway through, we can use these checkpoints
to recover much more quickly than if we had to redo all of that work,
ensuring a higher chance of success on the retry.
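
The follow-up commit describes storing always-encrypted checkpoints in GCP. A storage-agnostic sketch of that shape, assuming any `MutableMapping[str, bytes]` can stand in for a storage bucket and that encryption is supplied as a pair of `encrypt`/`decrypt` callables. `EncryptedBlobCheckpointer` and its key layout are hypothetical; no GCP client API is used or implied, and this is not the implementation from that commit.

```python
import json
from collections.abc import Callable, MutableMapping
from typing import Any, Optional

Rows = list[dict[str, Any]]


class EncryptedBlobCheckpointer:
    """Hypothetical sketch: checkpoints kept in a key -> bytes store
    (standing in for a bucket) and encrypted before they are written."""

    def __init__(
        self,
        store: MutableMapping[str, bytes],
        prefix: str,
        encrypt: Callable[[bytes], bytes],
        decrypt: Callable[[bytes], bytes],
    ) -> None:
        self.store = store
        self.prefix = prefix
        self.encrypt = encrypt
        self.decrypt = decrypt

    def _key(self, model_name: str) -> str:
        # Assumed object naming: one blob per model kind under a per-export prefix.
        return f"{self.prefix}/_{model_name}.checkpoint.enc"

    def get(self, model_name: str) -> Optional[Rows]:
        blob = self.store.get(self._key(model_name))
        if blob is None:
            return None
        # Decrypt first, then parse; json.loads accepts UTF-8 bytes.
        return json.loads(self.decrypt(blob))

    def add(self, model_name: str, rows: Rows) -> None:
        # Only ciphertext ever reaches the store.
        plaintext = json.dumps(rows).encode("utf-8")
        self.store[self._key(model_name)] = self.encrypt(plaintext)
```

Keeping the cipher and the store as injected dependencies means the same checkpoint logic could sit in front of a local dict in tests or a real bucket client in production; any real deployment would need an actual cipher, not a toy transform.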
azaslavsky added a commit that referenced this pull request Nov 18, 2024
azaslavsky added a commit that referenced this pull request Nov 18, 2024
github-actions bot locked and limited conversation to collaborators Nov 30, 2024
Labels
Scope: Backend Automatically applied to PRs that change backend components
2 participants