[models] allow specifying block numbers in `enable_gradient_checkpointing()` to speed up training #10124

sayakpaul · 2024-12-05T03:09:00Z

As @bghira brought up in a comment on #9982, I think we could support this directly in the `enable_gradient_checkpointing()` method we expose for the models.

Users could specify the block interval at which they want gradient checkpointing applied, and we take care of the rest. The code for this is simple and doesn't require any hacks.
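Here is a minimal sketch of the idea, assuming a PyTorch model that runs its blocks sequentially from an `nn.ModuleList`. `ToyBlock`, `ToyModel`, the `interval` argument, and `checkpointed_block_indices()` are hypothetical names for illustration, not the actual diffusers API. The only existing library call used is `torch.utils.checkpoint.checkpoint`.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block (residual MLP)."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)


class ToyModel(nn.Module):
    """Hypothetical model showing interval-based gradient checkpointing."""

    def __init__(self, dim: int = 16, num_blocks: int = 8) -> None:
        super().__init__()
        self.blocks = nn.ModuleList([ToyBlock(dim) for _ in range(num_blocks)])
        self.gradient_checkpointing = False
        self.checkpoint_interval = 1

    def enable_gradient_checkpointing(self, interval: int = 1) -> None:
        # Hypothetical signature: `interval` is the proposed knob, not an existing argument.
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.gradient_checkpointing = True
        self.checkpoint_interval = interval

    def checkpointed_block_indices(self) -> list:
        # Indices of the blocks whose activations are recomputed in backward.
        if not self.gradient_checkpointing:
            return []
        return [i for i in range(len(self.blocks)) if i % self.checkpoint_interval == 0]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        use_ckpt = self.gradient_checkpointing and self.training and torch.is_grad_enabled()
        for i, block in enumerate(self.blocks):
            if use_ckpt and i % self.checkpoint_interval == 0:
                # Checkpointed: intermediate activations are dropped and recomputed in backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                # Not checkpointed: activations are kept, so no recompute cost.
                x = block(x)
        return x


model = ToyModel(dim=16, num_blocks=8)
model.enable_gradient_checkpointing(interval=2)  # checkpoint blocks 0, 2, 4, 6
model.train()
loss = model(torch.randn(4, 16)).sum()
loss.backward()
```

With `interval=1` every block is checkpointed (lowest memory, most recomputation); larger intervals keep more activations in memory but skip recomputing the other blocks in the backward pass, which is where the speedup comes from.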

Gradient checkpointing is a crucial component for training/fine-tuning larger models, and this technique allows for a nice speed/memory trade-off.

Cc: @a-r-r-o-w @hlky


a-r-r-o-w · 2025-01-02T13:36:27Z

Taking this up soon. Will prioritize it for the next release.

bghira · 2025-01-02T13:44:39Z

I've implemented this in SimpleTuner's modeling code for SDXL, Flux, SD3, and Sana, and even the smaller models benefit (maybe more so) from having this option. Basically, this means Sana goes from 4 seconds per step to 1.25 steps per second when training on a 24G card (int8). I actually haven't checked the speedup for Flux, but I assume it'll be quite substantial on an 80G card.

github-actions · 2025-01-26T15:03:20Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul added the training and performance labels Dec 5, 2024

a-r-r-o-w self-assigned this Jan 2, 2025

a-r-r-o-w mentioned this issue Jan 20, 2025 in Refactor gradient checkpointing #10611 (merged)

github-actions bot added the stale label Jan 26, 2025

a-r-r-o-w closed this as completed in #10611 Jan 28, 2025
