Training memory optimizations not working on AMD hardware #684
I have the same error on Google Colab. It seems like checkpointing was only recently implemented in the codebase, and I assume it hasn't been released yet.
The method was added in commit e7120ba (10 days ago), and the latest release was 23 days ago. Installing from main should solve the issue.
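For anyone who wants to do that, installing from main is the standard pip-from-git pattern (the repository URL below assumes the main Hugging Face diffusers repo):

```shell
# Install diffusers from the main branch so the recently added
# enable_gradient_checkpointing method is available before the next release.
pip install --upgrade git+https://github.com/huggingface/diffusers.git
```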
The same issue and solution were already explained in #656; I'll leave my explanation here just in case.
@hopibel thanks for the clarification, I've updated my original comment.
@Gonzih thanks! That does indeed solve the issue with gradient checkpointing and actually gets really close to being able to train on AMD GPUs. The Dreambooth example actually starts this time, though it still runs out of memory during the first iteration at 512x512; with this optimization, the current maximum without running out of memory with the example command is 384x384. That said, there still seems to be some potential to optimize things further on AMD hardware. Is the idea behind
Any other alternatives that could be tested for lowering memory usage on Radeon cards?
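Since the missing-method error only occurs on older releases, a training script can guard the call rather than crash. This is a minimal sketch, assuming the only difference between versions is the presence of the `enable_gradient_checkpointing` method; the two stand-in classes are hypothetical, not real diffusers models:

```python
# Hedged sketch: only enable gradient checkpointing when the installed
# diffusers version actually exposes the method (it was added to main
# shortly before this issue was filed).
def enable_checkpointing_if_available(model):
    if hasattr(model, "enable_gradient_checkpointing"):
        model.enable_gradient_checkpointing()
        return True
    return False

class OldUNet:
    """Stand-in for a pre-fix UNet2DConditionModel (no such method)."""
    pass

class NewUNet:
    """Stand-in for a post-fix model that supports checkpointing."""
    def __init__(self):
        self.checkpointing = False
    def enable_gradient_checkpointing(self):
        self.checkpointing = True

print(enable_checkpointing_if_available(OldUNet()))  # False on old releases
unet = NewUNet()
print(enable_checkpointing_if_available(unet))       # True when installed from main
```

With this guard, the script degrades to higher memory usage on old releases instead of raising `AttributeError`.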
408x408 with https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth using
Probably the latter, assuming bitsandbytes isn't doing anything weird. In theory one might be able to just compile a ROCm version with hipcc, but the Python code has some hardcoded checks for CUDA libraries, so it probably isn't that straightforward.
There's a quite recent issue on
We will release a new
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
This still doesn't seem to have been resolved. |
@patil-suraj could you take a look here? |
I'm not really familiar with AMD GPUs, maybe @NouamaneTazi has some ideas here :) |
I'm throwing a comment in to try to keep the issue active; I'll eventually be able to try testing it again in case it did get fixed.
I think we currently don't really have access to AMD hardware for testing, but we'd indeed be quite interested in getting AMD to work. Any pointers on how best to proceed here? Maybe also cc @anton-l @mfuntowicz
Last night, I came across this fork of bitsandbytes called bitsandbytes-rocm, and I can confirm that it does in fact work on AMD hardware and allows me to at least use Dreambooth; I have not tested any of the other projects in this repo. With an RX 6900 XT, I successfully ran Dreambooth at 13GB VRAM utilization without xformers.
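For reference, building such a fork typically follows the upstream bitsandbytes source build. The repository URL and the `hip` make target below are assumptions about the fork's layout, so check its README before running:

```shell
# Hypothetical build steps for a ROCm fork of bitsandbytes; the exact
# repo URL and make target may differ -- consult the fork's README.
git clone https://github.com/broncotc/bitsandbytes-rocm.git
cd bitsandbytes-rocm
make hip                 # assumption: ROCm/HIP build target
python setup.py install  # install into the active environment
```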
Describe the bug

The Dreambooth training example has a section about training on a 16GB GPU. Since Radeon Navi 21 series cards all have 16GB available, this would in theory increase the amount of hardware that can train models by a really large margin.

The problem is that, at least out of the box, neither of the optimizations `--gradient_checkpointing` nor `--use_8bit_adam` seems to support AMD cards.

Reproduction

Using the example command with PyTorch ROCm 5.1.1 (`pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1`):

- `--gradient_checkpointing`: returns the error `'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing'`
- `--use_8bit_adam`: throws a handful of CUDA errors, see the Logs section below for the main part (is `bitsandbytes` Nvidia-specific, and if it is, is there an AMD implementation available?)

Logs

Using `--gradient_checkpointing`:

Using `--use_8bit_adam`: