Custom Diffusion: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != float #6879
Comments
It could be a dataloading problem, because I don't see this error when using the example from the README. I would suggest debugging the dataloader.
Hi again, I ran the code with the 'cat' example from the README and it works fine. I also adjusted it to 'person' and it works. The problem comes when you activate the Can you try to reproduce it on your side? I am almost certain the bug is related to '--freeze_model crossattn'. Currently, I run the code below with no problem...

When training with the above, I encountered issue #4704, where I added when running this
The model finishes training and breaks only when loading the pipeline components.
The stack trace isn't informative enough. There is nothing there suggesting it's coming from the
@rezkanas @sayakpaul Hi, did you resolve this error? I have the same issue.
I resolved it as per #6879 (comment)
Will try to reproduce.
I would still appreciate it if I could train the model with the attribute
Hi. I just tried with the commands provided in the README of the example and I didn't run into any problems whatsoever. I am on PyTorch 2.2,
I also have PyTorch 2.2 and installed
Hi, I encountered a similar problem. I tried
Reproduction:
Logs:
System Info:
What I tried:
Cc: @nupurkmr9
@rezkanas @shinnosukeono, sorry for the delayed response. I believe the error might be because of float16 training. Without it I am able to train with
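To illustrate the kind of mismatch the error message in the title describes, here is a minimal sketch in plain PyTorch. It is not taken from train_custom_diffusion.py: `proj`, `hidden`, `mismatch_raises`, and `forward_in_weight_dtype` are hypothetical names, and the linear layer is only a stand-in, assuming a module whose weights stay in float32 while it receives float16 activations. Casting the input to the weight dtype is one possible workaround, not necessarily the fix the maintainers would choose.

```python
import torch

# Hypothetical stand-in for a layer whose weights stay in float32.
proj = torch.nn.Linear(8, 8)           # parameters default to torch.float32
hidden = torch.randn(2, 8).half()      # activations arriving in float16


def mismatch_raises(layer, x):
    """Return True if layer(x) fails with a RuntimeError (dtype mismatch)."""
    try:
        layer(x)
    except RuntimeError:
        return True
    return False


def forward_in_weight_dtype(layer, x):
    """Cast the input to the layer's weight dtype before the matmul."""
    return layer(x.to(layer.weight.dtype))


raised = mismatch_raises(proj, hidden)        # float16 input, float32 weights
out = forward_in_weight_dtype(proj, hidden)   # both operands are float32
print(raised, out.dtype)
```

When hunting for the offending module in a real model, printing `p.dtype` for each entry of `model.named_parameters()` is a quick way to see which weights differ from the activation dtype.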
@nupurkmr9 Thank you so much for your reply! I changed
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Gentle ping @nupurkmr9
Describe the bug
When running Custom Diffusion on my repository of 20 photos, I run into this error, which is related to a data type mismatch.
Reproduction
!accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=$class_data_dir \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --class_prompt="person" \
  --num_class_images=200 \
  --instance_prompt="photo of a person" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=5e-6 \
  --lr_warmup_steps=0 \
  --max_train_steps=1200 \
  --freeze_model=crossattn \
  --scale_lr \
  --hflip \
  --use_8bit_adam \
  --gradient_checkpointing \
  --enable_xformers_memory_efficient_attention \
  --modifier_token "" \
  --validation_prompt=" person sitting in a bucket"
Logs
System Info
diffusers version: 0.26.1
Who can help?
@sayakpaul @patrickvonplaten