Provide a Default Parameter for Fit's Checkpoint Restore Path #10573
Comments
Hi @blisc, I originally proposed the deprecation in #9405. These are my opinions:

That means either: …

Could you describe what this does and why it blocks calling …?
Consider an average lightning training script:

```python
model = MyLightningModule(**model_cfg)
trainer = Trainer(**trainer_cfg)
# Additional logic here to add logging and checkpointing that is common to all scripts
myfunction(trainer)  # I want some ability to set a default checkpoint for fit
trainer.fit(model)  # Note: we define our dataloaders in the model itself
```

Since the training script is so similar, we just copy it for every model. Requiring a change in …
@blisc - it seems like the main issue is then around … #9741 is the issue we filed to allow users to register implementations, which the lightning Trainer can then look up and add automatically for callbacks/loggers/profilers/plugins. Would that address your concern? However, in your example I don't see why you couldn't look up the checkpoint path for fit from the `trainer_cfg` and pass it to …
I'm not asking for a custom plugin. I'm asking for lightning to continue supporting what it has in the past.

I'm trying to avoid changing anything in … Your main goal was to unify the lightning experience, and you have already. All I am asking for is a parameter on the lightning Trainer that can be used as the default when the relevant arguments are not passed to fit, validate, test, or predict. You already have it for validate, test, and predict. Can you just add fit as well? Then everything will be unified!
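The precedence being requested here can be sketched in a few lines; this is a hypothetical illustration, and the attribute name `fit_ckpt_path` is an assumption rather than real Lightning API:

```python
# Sketch of the requested default-resolution logic. `fit_ckpt_path` is a
# hypothetical attribute name; real Lightning does not expose this.
class Trainer:
    def __init__(self, fit_ckpt_path=None):
        # Trainer-level default, set once per script
        self.fit_ckpt_path = fit_ckpt_path

    def fit(self, model, ckpt_path=None):
        # An explicit argument to fit() wins; otherwise fall back to the
        # trainer-level default.
        resolved = ckpt_path if ckpt_path is not None else self.fit_ckpt_path
        return resolved  # a real Trainer would restore weights from this path
```

With this, scripts could keep calling `trainer.fit(model)` unchanged while still controlling the restore path at construction time.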
Could you share a code pointer for what you're looking for, just so we're clear on the request?
Discussed with @blisc - the request is to have a … Is this supported behavior?

```python
trainer.validated_ckpt_path = "path"
trainer.validate(model, dataloaders)
```

vs

```python
trainer.validate(model, dataloaders, ckpt_path="path")
```

This seems more like a quirk of attributes on the Trainer that were inadvertently made public instead of private. @awaelchli @carmocca other checkpointing API experts, do you know? It's certainly not documented anywhere, and offering only the second option feels more natural to support.
Yikes! IMO that's a strange compromise. If we're now OK advertising two different ways to set the variable for our users, I'd rather we just introduce …
These attributes have basically been there since the beginning. Their original purpose was to serve as read-only access to the path that was actually loaded, i.e., when setting "best" as the checkpoint path. They also served as a way to test the Trainer (no longer done this way).

This is not supposed to work; it is unintended behavior and one should not rely on it, not today.
Ok, it seems like you are leaning towards removing those attributes. Then can I suggest the following compromise:

There will only be two ways to set …

This does not block any further refactoring/unification that @ananthsub still has left to do.
Dear @blisc,

Would you mind sharing how much of an effort that would be? Additionally, here is a proposed vision for the Lightning Trainer API to become more modular, and here is Ananth's response about it: #10444 (comment).
This issue has been automatically marked as stale because it hasn't had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
Happy new year, everyone! Can we add …?
Happy new year to you too! The best way forward at the moment is to extend the removal of …, similarly to the decision in #10410 (comment).
That would be great |
Proposed refactor

Do 1 of 2 options:

1. Remove `resume_from_checkpoint` from the Trainer's initialization, as recommended in "[checkpoint] Resolve 2 different checkpoint loading paths across `fit` vs `validate`/`test`/`predict`" #9405
2. Provide a Trainer property that serves as the default when `ckpt_path` is passed as None to `trainer.fit`. This property should be accessible after trainer initialization.

Motivation
Our current design pattern is to initialize the Trainer object, then inject our own logic into the Trainer class, and finally call trainer.fit(model) without adding any additional arguments to fit(). Moving this argument from the Trainer's init to fit() will require us to rewrite all of our training scripts.
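The injection pattern described in the motivation can be emulated today by wrapping `fit` from the outside; a minimal sketch, where `StubTrainer` stands in for the real Trainer and `myfunction` is the placeholder name used earlier in the thread (both hypothetical, not Lightning API):

```python
import functools

class StubTrainer:
    """Minimal stand-in for a Lightning Trainer, for illustration only."""
    def fit(self, model, ckpt_path=None):
        return ckpt_path  # a real Trainer would restore from this path

def myfunction(trainer, default_ckpt_path):
    # Wrap fit() so that a script-level default applies whenever no
    # explicit ckpt_path is given. All names here are hypothetical.
    original_fit = trainer.fit

    @functools.wraps(original_fit)
    def fit_with_default(model, *args, ckpt_path=None, **kwargs):
        if ckpt_path is None:
            ckpt_path = default_ckpt_path
        return original_fit(model, *args, ckpt_path=ckpt_path, **kwargs)

    trainer.fit = fit_with_default
```

This keeps every training script's `trainer.fit(model)` call unchanged, which is exactly the property the motivation asks the Trainer to support natively.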
Pitch
See Proposed refactor