train_text_to_image_lora_sdxl.py does not seem to work using defaults #7310

Closed
jloveric opened this issue Mar 13, 2024 · 2 comments
Labels: bug (Something isn't working)

jloveric commented Mar 13, 2024

Describe the bug

I'm trying to run the train_text_to_image_lora_sdxl.py script, and the output in TensorBoard is black images. I assumed I should be able to resolve this by reducing the learning rate, etc. (reducing it to 1e-6 has not resolved the issue), but I would expect the defaults to generally produce a result. Using the same dataset with train_text_to_image_lora.py works fine. The command I'm running is just

accelerate launch train_text_to_image_lora_sdxl.py \
  --mixed_precision=fp16 \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=instance-imgs \
  --output_dir=outputxl \
  --report_to=tensorboard \
  --checkpointing_steps=500 \
  --validation_prompt="a prompt" \
  --seed=42 \
  --train_batch_size=1

Reproduction

export MODEL_NAME=stabilityai/stable-diffusion-xl-base-1.0

Then run the accelerate launch command above.

Logs

03/13/2024 14:00:49 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'clip_sample_range', 'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'dropout', 'reverse_transformer_layers_per_block', 'attention_type'} was not found in config. Values will be initialized to default values.
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 107/107 [00:00<00:00, 296035.97it/s]
03/13/2024 14:01:01 - INFO - __main__ - ***** Running training *****
03/13/2024 14:01:01 - INFO - __main__ -   Num examples = 106
03/13/2024 14:01:01 - INFO - __main__ -   Num Epochs = 100
03/13/2024 14:01:01 - INFO - __main__ -   Instantaneous batch size per device = 1
03/13/2024 14:01:01 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
03/13/2024 14:01:01 - INFO - __main__ -   Gradient Accumulation steps = 1
03/13/2024 14:01:01 - INFO - __main__ -   Total optimization steps = 10600
Steps:   1%|| 106/10600 [01:46<2:50:32,  1.03it/s, lr=0.0001, step_loss=0.13]03/13/2024 14:02:47 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: a prompt.
{'feature_extractor', 'image_encoder'} was not found in config. Values will be initialized to default values.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
{'sigma_min', 'timestep_type', 'rescale_betas_zero_snr', 'sigma_max'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 50.97it/s]
/home/john/.cache/pypoetry/virtualenvs/diffusers-MD3YhPSL-py3.11/lib/python3.11/site-packages/diffusers/image_processor.py:92: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")

System Info

Ubuntu, 4090

Who can help?

No response

jloveric added the bug (Something isn't working) label on Mar 13, 2024
jloveric (Author) commented

Problem resolved by a comment in #6815. Basically, don't use the default VAE; instead, use one with the fp16 fix, as described in the documentation: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/README_sdxl.md
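
For reference, a sketch of the adjusted launch command. The madebyollin/sdxl-vae-fp16-fix checkpoint and the --pretrained_vae_model_name_or_path flag are taken from the linked README_sdxl.md; verify both against your diffusers version before relying on them.

export MODEL_NAME=stabilityai/stable-diffusion-xl-base-1.0
# fp16-safe VAE; the stock SDXL VAE produces NaNs when decoding in half precision
export VAE_NAME=madebyollin/sdxl-vae-fp16-fix

accelerate launch train_text_to_image_lora_sdxl.py \
  --mixed_precision=fp16 \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --train_data_dir=instance-imgs \
  --output_dir=outputxl \
  --report_to=tensorboard \
  --checkpointing_steps=500 \
  --validation_prompt="a prompt" \
  --seed=42 \
  --train_batch_size=1

The NaNs from the default VAE are what trigger the "invalid value encountered in cast" warning in the logs above and turn the validation images black.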

thenx commented Oct 6, 2024

In case someone ends up here because this issue is at the top of Google's search results...

I had a similar issue with some models (Flux/CogVideo) after switching to a 4090:

RuntimeWarning: invalid value encountered in cast
    images = (images * 255).round().astype("uint8")

It turned out that I had the nvidia-driver-545 metapackage active. Switching to nvidia-driver-550 (the latest stable at the time of writing) fixed the issue.
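
For anyone hitting the driver variant of this on Ubuntu, a rough sketch of checking and switching the driver (package names are assumptions; check what ubuntu-drivers recommends on your system):

# show the currently loaded driver and CUDA version
nvidia-smi

# list the driver packages Ubuntu recommends for the installed GPU
ubuntu-drivers devices

# switch from the 545 metapackage to 550, then reboot
sudo apt install nvidia-driver-550
sudo reboot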
