Remove our AdamW implementation #36177
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Ready for core maintainer review @ArthurZucker @Cyrilvallez
src/transformers/optimization.py (Outdated)
@@ -604,119 +604,14 @@ def scheduler_hook(param):
     )

-class AdamW(Optimizer):
+class AdamW(TorchAdamW):
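For context, here is a minimal sketch of the thin-wrapper approach this diff shows, assuming the class simply subclasses torch.optim.AdamW and forwards its arguments; this is not the exact code from the PR:

```python
# Minimal sketch (not the exact PR code) of the thin-wrapper approach in the
# diff above: subclass torch.optim.AdamW so `transformers.AdamW` keeps working
# while pointing users at the torch implementation.
import warnings

from torch.optim import AdamW as TorchAdamW


class AdamW(TorchAdamW):
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "This class is an alias of torch.optim.AdamW; please use torch.optim.AdamW directly.",
            FutureWarning,
        )
        super().__init__(*args, **kwargs)
```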
As long as you are sure all params are the same for init, let's go! 🤗
we can also deprecate this one?
Good point, actually - the old one was already deprecated! Maybe we should just remove it entirely, since we've been showing a deprecation warning for a long time now?
cc @ArthurZucker I cut it out entirely (it was raising a deprecation warning every time it was used anyway). I refactored the references to it to use torch.optim.AdamW.
cc @ArthurZucker @Cyrilvallez I think this should be ready to go, but I'd like a core maintainer approval first! The plan is to just totally remove our AdamW class and redirect all the legacy references to it to use torch.optim.AdamW.
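As an illustration of the kind of call-site change that redirect implies (a hedged sketch: the actual files touched in this PR are not shown here, and the model and hyperparameters below are stand-ins):

```python
import torch
from torch.optim import AdamW  # previously: from transformers import AdamW

model = torch.nn.Linear(4, 2)  # stand-in model for illustration

# The constructor shape is the same, so legacy call sites only need the new import.
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```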
Hey @Rocketknight1! Thanks for cleaning up! 🙏 LGTM, except that we should not keep both adamw_hf and adamw_torch as if they were 2 separate optimizers!
src/transformers/trainer.py (Outdated)
-elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED]:
+elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED, OptimizerNames.ADAMW_HF]:
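Roughly, this branch means several optimizer-name aliases now resolve to the same torch class. A simplified sketch, with assumed helper and enum contents rather than the actual Trainer code:

```python
from enum import Enum

import torch


class OptimizerNames(str, Enum):
    # Assumed subset of the real enum, for illustration only.
    ADAMW_HF = "adamw_hf"
    ADAMW_TORCH = "adamw_torch"
    ADAMW_TORCH_FUSED = "adamw_torch_fused"


def build_adamw(name: OptimizerNames, params, lr: float = 5e-5):
    # All three names map to torch.optim.AdamW; the fused variant only flips a
    # constructor flag (fused=True requires floating-point CUDA parameters).
    if name in (OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED, OptimizerNames.ADAMW_HF):
        return torch.optim.AdamW(params, lr=lr, fused=(name == OptimizerNames.ADAMW_TORCH_FUSED))
    raise ValueError(f"Unsupported optimizer name: {name}")
```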
IMO it is very confusing to keep adamw_hf and adamw_torch if they are now the same. Since our version of AdamW has been raising warnings for a long time, I think it should be safe to remove adamw_hf entirely (let's remove it from the docstrings as well to avoid any confusion -- it is present in 2 docstrings).

Of course, the best solution would be to remove the torch part of all optimizer names in OptimizerNames, as they are now all torch-based, but that would be breaking. Maybe something to do in a separate PR: change the names and do a whole deprecation cycle for them.
Done! There should be no more references to adamw_hf in the code.
All right, LGTM! Thanks a lot!
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
Transformers added an AdamW implementation before Torch supported it. However, Torch supports it now, so there's not really much point in maintaining our own version! This PR deletes our AdamW class, but imports torch.optim.AdamW in the same file, to ensure that imports that depended on it still work.

Fixes #35504
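A minimal sketch of the backwards-compatibility re-export the description refers to (an assumed form; the actual contents of src/transformers/optimization.py may differ):

```python
# Re-export torch's AdamW under the old module path so legacy imports such as
# `from transformers.optimization import AdamW` keep working after the
# in-house class is deleted.
from torch.optim import AdamW  # noqa: F401
```

With a re-export like this in place, code that imported AdamW from the old location would transparently receive torch.optim.AdamW.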