[models] allow specifying block numbers in enable_gradient_checkpointing() to speed up training #10124

Closed
sayakpaul opened this issue Dec 5, 2024 · 3 comments · Fixed by #10611
Labels: performance, stale, training

Comments

@sayakpaul
Member

As brought up in #9982 (comment) by @bghira, I think we could support specifying block numbers directly in the enable_gradient_checkpointing() method we expose for the models.

Users could specify the block interval at which gradient checkpointing should be applied, and we take care of the rest. The code for this is simple and doesn't require any hacks.
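To make the idea concrete, here is a minimal sketch of interval-based checkpointing over a list of blocks. It is not the Diffusers implementation: the names `should_checkpoint`, `run_blocks`, `interval`, and `checkpoint_fn` are hypothetical, and the convention that `interval=1` checkpoints every block is an assumption made for illustration.

```python
from typing import Any, Callable, Sequence


def should_checkpoint(block_index: int, interval: int) -> bool:
    """Return True when the block at `block_index` should be checkpointed.

    Assumed convention (hypothetical): interval=1 checkpoints every block,
    interval=2 checkpoints blocks 0, 2, 4, ..., and so on.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return block_index % interval == 0


def run_blocks(
    blocks: Sequence[Callable[[Any], Any]],
    hidden_states: Any,
    interval: int,
    checkpoint_fn: Callable[[Callable[[Any], Any], Any], Any],
) -> Any:
    """Run `blocks` in order, routing every `interval`-th block through
    `checkpoint_fn` and calling the remaining blocks directly."""
    for index, block in enumerate(blocks):
        if should_checkpoint(index, interval):
            # Checkpointed block: activations are recomputed in the backward
            # pass instead of being stored, trading compute for memory.
            hidden_states = checkpoint_fn(block, hidden_states)
        else:
            # Regular block: activations are kept, so no recomputation cost.
            hidden_states = block(hidden_states)
    return hidden_states
```

With PyTorch, `checkpoint_fn` could be `functools.partial(torch.utils.checkpoint.checkpoint, use_reentrant=False)` and `blocks` the model's transformer blocks. Under this convention, a larger `interval` checkpoints fewer blocks, which should reduce recomputation at the cost of higher activation memory.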

Gradient checkpointing is a crucial component for training/fine-tuning larger models, and this technique allows for a nice speed/memory trade-off.

Cc: @a-r-r-o-w @hlky

@sayakpaul added the training and performance labels on Dec 5, 2024
@a-r-r-o-w self-assigned this on Jan 2, 2025
@a-r-r-o-w
Member

Taking this up soon. Will prioritize it for the next release.

@bghira
Contributor

bghira commented Jan 2, 2025

I've implemented this in SimpleTuner's modeling code for SDXL, Flux, SD3, and Sana, and even the smaller models benefit (maybe more so) from having this option. For example, Sana goes from 4 seconds per step to 1.25 steps per second on a 24G card (int8) when training. I haven't actually checked the speedup for Flux, but I assume on an 80G card it'll be quite substantial.

Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label on Jan 26, 2025