train_text_to_image_lora_sdxl.py does not seem to work using defaults #7310

Closed
jloveric opened this issue Mar 13, 2024 · 2 comments
Labels: bug (Something isn't working)

jloveric commented Mar 13, 2024

Describe the bug

I'm trying to run the train_text_to_image_lora_sdxl.py script, and the output in TensorBoard is black images. I assumed I should be able to resolve this by reducing the learning rate, etc. (reducing it to 1e-6 has not resolved the issue), but I would expect the defaults to generally produce a result. Using the same dataset with train_text_to_image_lora.py works fine. The command I'm running is just

accelerate launch train_text_to_image_lora_sdxl.py \
  --mixed_precision=fp16 \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=instance-imgs \
  --output_dir=outputxl \
  --report_to=tensorboard \
  --checkpointing_steps=500 \
  --validation_prompt="a prompt" \
  --seed=42 \
  --train_batch_size=1

Reproduction

export MODEL_NAME=stabilityai/stable-diffusion-xl-base-1.0

Then run the accelerate launch command above.

Logs

03/13/2024 14:00:49 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'clip_sample_range', 'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'dropout', 'reverse_transformer_layers_per_block', 'attention_type'} was not found in config. Values will be initialized to default values.
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 107/107 [00:00<00:00, 296035.97it/s]
03/13/2024 14:01:01 - INFO - __main__ - ***** Running training *****
03/13/2024 14:01:01 - INFO - __main__ -   Num examples = 106
03/13/2024 14:01:01 - INFO - __main__ -   Num Epochs = 100
03/13/2024 14:01:01 - INFO - __main__ -   Instantaneous batch size per device = 1
03/13/2024 14:01:01 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
03/13/2024 14:01:01 - INFO - __main__ -   Gradient Accumulation steps = 1
03/13/2024 14:01:01 - INFO - __main__ -   Total optimization steps = 10600
Steps:   1%|| 106/10600 [01:46<2:50:32,  1.03it/s, lr=0.0001, step_loss=0.13]03/13/2024 14:02:47 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: a prompt.
{'feature_extractor', 'image_encoder'} was not found in config. Values will be initialized to default values.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
{'sigma_min', 'timestep_type', 'rescale_betas_zero_snr', 'sigma_max'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 50.97it/s]
/home/john/.cache/pypoetry/virtualenvs/diffusers-MD3YhPSL-py3.11/lib/python3.11/site-packages/diffusers/image_processor.py:92: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")

System Info

Ubuntu, 4090

Who can help?

No response

jloveric added the bug (Something isn't working) label on Mar 13, 2024
jloveric (Author) commented

Problem resolved by a comment in #6815. Basically, don't use the default VAE; instead, use one with the fp16 fix, as described in the documentation: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/README_sdxl.md
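
For reference, a sketch of the adjusted launch command. The madebyollin/sdxl-vae-fp16-fix checkpoint and the --pretrained_vae_model_name_or_path flag are taken from the linked README_sdxl.md; verify both against your diffusers version before relying on them.

export MODEL_NAME=stabilityai/stable-diffusion-xl-base-1.0
# fp16-safe VAE; the stock SDXL VAE produces NaNs when decoding in half precision
export VAE_NAME=madebyollin/sdxl-vae-fp16-fix

accelerate launch train_text_to_image_lora_sdxl.py \
  --mixed_precision=fp16 \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --train_data_dir=instance-imgs \
  --output_dir=outputxl \
  --report_to=tensorboard \
  --checkpointing_steps=500 \
  --validation_prompt="a prompt" \
  --seed=42 \
  --train_batch_size=1

The NaNs from the default VAE are what trigger the "invalid value encountered in cast" warning in the logs above and turn the validation images black.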

thenx commented Oct 6, 2024

In case someone ends up here because this issue is at the top of Google's search results...

I had a similar issue with some models (Flux/CogVideo) after switching to a 4090:

RuntimeWarning: invalid value encountered in cast
    images = (images * 255).round().astype("uint8")

It turned out that I had the nvidia-driver-545 metapackage active. Switching to nvidia-driver-550 (the latest stable at the time of writing) fixed the issue.
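
For anyone hitting the driver variant of this on Ubuntu, a rough sketch of checking and switching the driver (package names are assumptions; check what ubuntu-drivers recommends on your system):

# show the currently loaded driver and CUDA version
nvidia-smi

# list the driver packages Ubuntu recommends for the installed GPU
ubuntu-drivers devices

# switch from the 545 metapackage to 550, then reboot
sudo apt install nvidia-driver-550
sudo reboot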
