Getting CUDA out of memory error even with Colab A100 high RAM #11014
Comments
I think the model is just too big for fine-tuning, and the parameter tweaks needed to make it fit on the A100 GPU are too tight. It would be better to just use a smaller model.
@Aravind-11 It's possible to fully fine-tune Flux under 24 GB, and to do LoRA fine-tuning in under 8 GB or less with clever offloading and other techniques. @gurselnaziroglu Could you try with
Thanks for the suggestion. I tried it, but it didn't work; I'm still getting the same error at the same point. I would appreciate any further suggestions.
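To make the "clever offloading and other techniques" mentioned two comments up concrete: a minimal sketch, assuming diffusers >= 0.31 with bitsandbytes and peft installed (this is not what train_dreambooth_lora_flux.py does out of the box), of loading the Flux transformer 4-bit-quantized and attaching a small LoRA to it:

# Rough sketch of the quantization + LoRA route; not the training script itself.
# Assumes diffusers >= 0.31 with bitsandbytes and peft installed.
import torch
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
from peft import LoraConfig

# Load the transformer (the memory hog) in 4-bit NF4, so the frozen base
# weights take roughly a quarter of their bf16 footprint.
nf4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4,
    torch_dtype=torch.bfloat16,
)

# Attach trainable LoRA adapters on the attention projections only,
# mirroring the --lora_layers choice in the report below.
transformer.add_adapter(
    LoraConfig(r=1, lora_alpha=1,
               target_modules=["to_k", "to_q", "to_v", "to_out.0"])
)

With the base weights frozen and quantized, only the tiny LoRA parameters carry optimizer state, which is what makes the low-VRAM numbers above plausible.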
Got the same issue ("torch.OutOfMemoryError: CUDA out of memory") when loading the flux_transformer onto a 32 GB GPU.
Hi, I tried your command, but with the dog dataset, which is only a few images; with the settings you're using, it requires this:
So first, there is no way to load it like this on a 32 GB or 40 GB VRAM GPU; you will always get OOM. Validation uses even more memory: I tested with an L40S, which has 45 GB, and it barely fits without validation. Also, this is not a bug; the README clearly states that you will need more than 40 GB of VRAM. So if you want to train Flux like this, you have these options:
Probably your best solution here is to use a library made specifically for training, which will likely have more options for lowering VRAM usage. If you try to do it here, you will need to do some work (coding) to get it to run on anything that is not an A100, H100, or better GPU. For the script-level flags that usually help, see the sketch below.
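For what it's worth, the usual script-level knobs in the diffusers DreamBooth scripts are --gradient_checkpointing (recompute activations instead of storing them), --cache_latents (pre-encode the training images once so the VAE can leave the GPU), and --use_8bit_adam (8-bit optimizer states). Here is a hedged sketch of the reproduction command below with those added; the first two flags exist in recent versions of train_dreambooth_lora_flux.py, but verify them against your copy's --help, and even then fitting FLUX.1-dev in 40 GB is not guaranteed:

!accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --instance_data_dir="train_photos" \
  --output_dir="trained-flux-lora" \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --cache_latents \
  --use_8bit_adam \
  --rank=1 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=500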
Describe the bug
I am trying to fine-tune FLUX.1-dev with LoRA on the Google Colab A100 runtime, which has 80 GB of system RAM and 40 GB of VRAM. I followed the recommended steps from this link, but I am still getting a "CUDA out of memory" error. I saw that related older issues were closed, but the bug still seems to be present.
Reproduction
Here is the last version that I tried; it uses 8-bit AdamW. I also tried the Prodigy optimizer and got the same error.
!accelerate launch -q train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --instance_data_dir="train_photos" \
  --output_dir="trained-flux-lora" \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of X" \
  --resolution=512 \
  --rank=1 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="AdamW" \
  --learning_rate=1. \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of X biking" \
  --validation_epochs=25 \
  --seed="0" \
  --lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0" \
  --use_8bit_adam
Logs
System Info
Google Colab A100 runtime environment
Who can help?
No response