[RFC] Deprecate should_rank_save_checkpoint #9074
Labels
checkpointing (Related to checkpointing)
feature (Is an improvement or enhancement)
help wanted (Open to be worked on)
refactor (Proposed refactoring or deprecation)
Comments
outlined some more with @ananthsub, draft proposal:
Thanks @ananthsub, I like this clean-up a lot! Regarding deprecation of the property, I'm unsure. Personally I wouldn't mind a breaking change here, since we do label the plugin/accelerator API as experimental IIRC, and this variable is primarily used internally in Lightning. But cc @tchaton for any additional thoughts :)
Proposed refactoring or deprecation
Now that checkpoint saving is consolidated in the training type plugin, we no longer need this property; it becomes an internal implementation detail of the training type. Users (via the checkpoint callback) only need to call trainer.save_checkpoint and can assume the plugin handles the rank checks for them. A rough sketch of that flow is below.
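To make the proposal concrete, here is a minimal sketch of the intended flow. The class bodies are hypothetical and heavily simplified, not the actual Lightning source; only the shape of the delegation matters:

```python
from typing import Any, Dict

import torch


class TrainingTypePlugin:
    @property
    def is_global_zero(self) -> bool:
        # Concrete plugins (DDP, TPU, DeepSpeed, ...) override this.
        return True

    def save_checkpoint(self, checkpoint: Dict[str, Any], filepath: str) -> None:
        # Which rank(s) write the file is now an internal detail of the
        # plugin; callers never consult should_rank_save_checkpoint.
        if self.is_global_zero:
            torch.save(checkpoint, filepath)


class Trainer:
    def __init__(self) -> None:
        self.training_type_plugin = TrainingTypePlugin()

    def save_checkpoint(self, filepath: str) -> None:
        checkpoint = {"state_dict": {}}  # the checkpoint connector assembles this
        # Safe to call unconditionally from every rank; the plugin filters.
        self.training_type_plugin.save_checkpoint(checkpoint, filepath)
```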
Pitch
Deprecate this property: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/trainer/properties.py#L117-L120
Deprecate this property: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/plugins/training_type/training_type_plugin.py#L313-L316
Move the directory creation from here: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/callbacks/model_checkpoint.py#L511-L514
Into here: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/plugins/io/torch_plugin.py#L30-L41
@SeanNaren - this is where we should also have a rm_checkpoint on the Checkpoint IO plugin such that we can deprecate this from the model checkpoint: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/callbacks/model_checkpoint.py#L502-L505
A rough sketch of the resulting Checkpoint IO API is below.
One thing I'm not sure about: because the should_rank_save_checkpoint property is exposed from the accelerator to the trainer as part of the public trainer API, does this need to go through a full deprecation cycle? Or is a breaking change as part of the plugins API permissible?
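For illustration, a rough sketch of what the Checkpoint IO plugin could look like after these moves. The method bodies are simplified stand-ins (plain os calls rather than the fsspec-based file handling Lightning uses), and rm_checkpoint is a strawman signature, not a finalized API:

```python
import os
from typing import Any, Dict

import torch


class TorchCheckpointIO:
    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        # Directory creation moves here from ModelCheckpoint, so every caller
        # gets it regardless of which callback or plugin triggered the save.
        dirpath = os.path.dirname(path)
        if dirpath:
            os.makedirs(dirpath, exist_ok=True)
        torch.save(checkpoint, path)

    def rm_checkpoint(self, path: str) -> None:
        # New hook so ModelCheckpoint no longer deletes stale checkpoint
        # files itself (the deletion logic linked above).
        if os.path.isfile(path):
            os.remove(path)
```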
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning
Bolts: Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch
Lightning Transformers: Flexible interface for high performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.