Commit 68441bf

williambermandg845 authored and committed
Add IF dreambooth docs (huggingface#3470)
1 parent 368f9ad commit 68441bf

File tree

1 file changed: +64 -0 lines changed
examples/dreambooth/README.md

@@ -531,3 +531,67 @@ More info: https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_
### Experimental results
You can refer to [this blog post](https://huggingface.co/blog/dreambooth) that discusses some of the DreamBooth experiments in detail. Specifically, it recommends a set of DreamBooth-specific tips and tricks that we have found to work well for a variety of subjects.

## IF

You can also use the LoRA and full DreamBooth scripts to train the text-to-image [IF model](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0). A few alternative CLI flags are needed due to the model size, the expected input resolution, and the text encoder conventions.

### LoRA Dreambooth

This training configuration requires ~28 GB of VRAM.

```sh
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_lora"

# --resolution=64: the input resolution of the IF UNet is 64x64.
# --pre_compute_text_embeddings: pre-compute the text embeddings so that the T5 text encoder doesn't have to be kept in memory.
# --tokenizer_max_length=77: IF expects an override of the max token length.
# --text_encoder_use_attention_mask: IF expects an attention mask for the text embeddings.
accelerate launch train_dreambooth_lora.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--scale_lr \
--max_train_steps=1200 \
--validation_prompt="a sks dog" \
--validation_epochs=25 \
--checkpointing_steps=100 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask
```
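The `--tokenizer_max_length=77` and `--text_encoder_use_attention_mask` flags exist because IF's T5 text encoder expects fixed-length token sequences together with an explicit attention mask marking which positions are real tokens. A minimal sketch of the padding/truncation this implies (the token ids and pad id below are illustrative stand-ins, not the actual T5 tokenizer output):

```python
def pad_to_max_length(token_ids, max_length=77, pad_token_id=0):
    """Truncate or pad a list of token ids to a fixed length and
    build the matching attention mask (1 = real token, 0 = padding)."""
    ids = list(token_ids[:max_length])
    attention_mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_token_id] * (max_length - len(ids))
    return ids, attention_mask

# Illustrative 4-token prompt padded to length 8:
ids, mask = pad_to_max_length([101, 2023, 3899, 102], max_length=8)
print(ids)   # [101, 2023, 3899, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Without the mask, the text encoder would attend to the pad positions as if they were real tokens, which is why IF requires both flags together.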
### Full Dreambooth

Due to the size of the optimizer states, we recommend training the full XL IF model with 8-bit Adam. Using 8-bit Adam and the rest of the following config, the model can be trained in ~48 GB of VRAM.

For full DreamBooth, IF requires very low learning rates; with higher learning rates, model quality will degrade.
```sh
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"

export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_if"

# --resolution=64: the input resolution of the IF UNet is 64x64.
# --text_encoder_use_attention_mask: IF expects an attention mask for the text embeddings.
# --tokenizer_max_length=77: IF expects an override of the max token length.
# --pre_compute_text_embeddings: pre-compute the text embeddings so that the T5 text encoder doesn't have to be kept in memory.
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-7 \
--max_train_steps=150 \
--validation_prompt "a photo of sks dog" \
--validation_steps 25 \
--text_encoder_use_attention_mask \
--tokenizer_max_length 77 \
--pre_compute_text_embeddings \
--use_8bit_adam \
--set_grads_to_none \
--skip_save_text_encoder # do not save the full T5 text encoder with the model
```
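`--pre_compute_text_embeddings` works because DreamBooth trains on a small, fixed set of prompts, so the text embeddings can be computed once up front and the T5 encoder freed before training starts. A schematic sketch of the idea (the `encode` callable here is a hypothetical stand-in for the real text encoder, not the diffusers API):

```python
def precompute_embeddings(prompts, encode):
    """Run the text encoder once per unique prompt and cache the results,
    so the encoder itself can be dropped from memory during training."""
    return {prompt: encode(prompt) for prompt in set(prompts)}

# Toy stand-in encoder; a real setup would call the T5 text encoder here.
toy_encode = lambda prompt: [float(len(word)) for word in prompt.split()]

prompts = ["a sks dog", "a sks dog", "a photo of sks dog"]
cache = precompute_embeddings(prompts, toy_encode)
print(len(cache))  # 2 unique prompts -> 2 cached embeddings
```

During training the loop then looks embeddings up in the cache instead of running the encoder, which is where the memory savings over keeping T5 resident come from.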
