
Remove our AdamW implementation #36177

Merged: 14 commits merged into main on Mar 19, 2025

Conversation

@Rocketknight1 (Member) commented on Feb 13, 2025

Transformers added an AdamW implementation before Torch supported it. However, Torch supports it now so there's not really much point in maintaining our own version!

This PR deletes our AdamW class, but imports torch.optim.AdamW in the same file, to ensure that imports that depended on it still work.

Fixes #35504
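For illustration, the re-export pattern described above could look roughly like the sketch below. This is not the exact contents of the PR; the `transformers.optimization` module path is assumed from where the old class lived.

```python
# optimization.py (sketch): keep the name `AdamW` importable, but have it
# resolve to the PyTorch implementation instead of a local class.
from torch.optim import AdamW  # noqa: F401  # re-exported for backwards compatibility

# Downstream imports such as `from transformers.optimization import AdamW`
# continue to work and now return torch.optim.AdamW.
```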

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member Author)

Ready for core maintainer review @ArthurZucker @Cyrilvallez

@@ -604,119 +604,14 @@ def scheduler_hook(param):
)


-class AdamW(Optimizer):
+class AdamW(TorchAdamW):
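For context, a thin wrapper of the kind shown in this intermediate diff might look roughly like the sketch below (illustrative only, not the actual code from the PR; later in the thread the wrapper was dropped and the class removed entirely):

```python
import warnings

from torch.optim import AdamW as TorchAdamW


class AdamW(TorchAdamW):
    """Deprecated drop-in alias for torch.optim.AdamW, kept only for backwards compatibility."""

    def __init__(self, *args, **kwargs):
        # Warn on construction, then defer everything to the PyTorch implementation.
        warnings.warn(
            "transformers' AdamW is deprecated; use torch.optim.AdamW directly.",
            FutureWarning,
        )
        super().__init__(*args, **kwargs)
```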
Collaborator


as long as you are sure all params are the same for init, let's go! 🤗

Collaborator


we can also deprecate this one?

Member Author


Good point, actually - the old one was already deprecated! Maybe we should just remove it entirely, since we've been showing a deprecation warning for a long time now?

@Rocketknight1 force-pushed the no-more-adamw branch 2 times, most recently from 90dabf5 to d99ad13 on February 20, 2025 at 19:36
@Rocketknight1 (Member Author)

cc @ArthurZucker I cut it out entirely (it was raising a deprecation warning every time it was being used anyway). I refactored the references to it to use torch.optim.AdamW instead.

@Rocketknight1 (Member Author)

cc @ArthurZucker @Cyrilvallez I think this should be ready to go, but I'd like a core maintainer approval first!

The plan is to remove our AdamW class entirely and point all the legacy references at torch.optim.AdamW instead. The class has already been throwing a deprecation warning for some time, so I think people have had enough notice by now.
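For downstream training scripts, the migration amounts to swapping an import; a minimal sketch (the model and hyperparameters below are placeholders, not values from this PR):

```python
import torch
from torch.optim import AdamW  # previously: from transformers import AdamW

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```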

@Cyrilvallez (Member) left a comment

Hey @Rocketknight1! Thanks for cleaning up! 🙏 LGTM, except that we should not keep both adamw_hf and adamw_torch as if they were 2 separate optimizers!

Comment on lines 1426 to 1421
-elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED]:
+elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED, OptimizerNames.ADAMW_HF]:
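In effect, this intermediate version of the diff routed the legacy adamw_hf name to the same PyTorch class as adamw_torch. A simplified, hypothetical sketch of that kind of mapping is shown below; it is not the actual Trainer dispatch code, and adamw_hf was later removed entirely:

```python
from torch.optim import AdamW

# Hypothetical, simplified mapping: all AdamW-style optimizer names resolve to
# torch.optim.AdamW now that the in-repo implementation is gone.
OPTIMIZER_CLASSES = {
    "adamw_torch": AdamW,
    "adamw_torch_fused": AdamW,  # in practice selected by passing fused=True
    "adamw_hf": AdamW,           # legacy name, mapped to the torch class here
}
```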
Member


IMO it is very confusing to keep adamw_hf and adamw_torch if they are now the same. Since our version of adamw was raising warnings for a long time, I think it should be safe to remove adamw_hf entirely (let's remove it from the docstrings as well to avoid any confusion -- it is present in 2 docstrings).

Of course, the best solution would be to drop the torch part from all optimizer names in OptimizerNames, since they are all torch now, but that would be breaking. Maybe that's something for a separate PR: change the names and run a full deprecation cycle for them.

Member Author


Done! There should be no more references to adamw_hf in the code.

@Rocketknight1 force-pushed the no-more-adamw branch 4 times, most recently from 2d582e7 to 8caee1d on March 14, 2025 at 15:31
@Cyrilvallez (Member) left a comment

All right, LGTM! Thanks a lot!

@Rocketknight1 merged commit 9be4728 into main on Mar 19, 2025 (24 checks passed).
@Rocketknight1 deleted the no-more-adamw branch on March 19, 2025 at 18:29.
@Rocketknight1 changed the title from "Just import torch AdamW instead" to "Remove our AdamW implementation" on Mar 19, 2025.
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* Just import torch AdamW instead

* Update docs too

* Make AdamW undocumented

* make fixup

* Add a basic wrapper class

* Add it back to the docs

* Just remove AdamW entirely

* Remove some AdamW references

* Drop AdamW from the public init

* make fix-copies

* Cleanup some references

* make fixup

* Delete lots of transformers.AdamW references

* Remove extra references to adamw_hf
Successfully merging this pull request may close these issues.

Subtle difference with Pytorch AdamW?