Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2 #6922
base: main
Conversation
Hi @patil-suraj @sayakpaul, I was wondering if this is something interesting for you to look into? Feedback is appreciated.
Cool!
I've experimented with finetuning proper inpainting models before. I strongly urge you to read the LAMA paper (https://arxiv.org/pdf/2109.07161.pdf) and implement their masking strategy (which is what is used by the stable-diffusion-inpainting checkpoint). I used a very simple masking strategy like what you had for a long time and never got satisfactory results with my model until switching to the LAMA masking strategy. Training on simple white square masks will severely degrade the performance of the pretrained SD inpainting model.
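For context, the LAMA-style strategy draws random thick polyline "brush strokes" instead of a single square. A minimal numpy sketch of the stroke part; all parameter values here are illustrative, not the ones from the paper:

```python
import numpy as np

def random_irregular_mask(height, width, max_strokes=5, max_vertices=8,
                          max_thickness=30, rng=None):
    """Draw a few random thick polyline strokes, roughly in the spirit of
    the LAMA masking strategy (parameters are illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((height, width), dtype=np.float32)
    for _ in range(rng.integers(1, max_strokes + 1)):
        x, y = int(rng.integers(0, width)), int(rng.integers(0, height))
        thickness = int(rng.integers(5, max_thickness))
        for _ in range(rng.integers(1, max_vertices + 1)):
            # random step to the next vertex of the polyline
            nx = int(np.clip(x + rng.integers(-60, 61), 0, width - 1))
            ny = int(np.clip(y + rng.integers(-60, 61), 0, height - 1))
            # rasterize a thick segment by stamping squares along it
            steps = max(abs(nx - x), abs(ny - y), 1)
            for t in np.linspace(0.0, 1.0, steps):
                cx, cy = int(x + t * (nx - x)), int(y + t * (ny - y))
                y0, y1 = max(cy - thickness, 0), min(cy + thickness, height)
                x0, x1 = max(cx - thickness, 0), min(cx + thickness, width)
                mask[y0:y1, x0:x1] = 1.0
            x, y = nx, ny
    return mask
```

The returned binary mask (1 = inpaint region) can then be composed with occasional box or full-image masks, as the paper describes.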
```python
if args.push_to_hub:
    repo_id = create_repo(
        repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token
```
Let's make sure to follow: `if args.report_to == "wandb" and args.hub_token is not None:`. Otherwise, `hub_token` will be compromised on the wandb run page.
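The requested guard looks roughly like this (a sketch modeled on other diffusers example scripts; the exact error message is paraphrased):

```python
from argparse import Namespace

def check_args(args):
    # wandb logs the full set of CLI arguments on the publicly visible run
    # page, so passing a raw token alongside wandb reporting would leak it.
    if args.report_to == "wandb" and args.hub_token is not None:
        raise ValueError(
            "You cannot use both --report_to=wandb and --hub_token: the token would "
            "be exposed on the wandb run page. Use `huggingface-cli login` instead."
        )

check_args(Namespace(report_to="tensorboard", hub_token=None))  # passes silently
```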
done
Seems like this comment wasn't addressed?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Left some initial comments. Looking quite nice.
I do think having an option to enable LAMA-like masking might be a very good reference point, as our training scripts are quite widely referenced.
And I apologize for the delay.
I thought having the simplest implementation would do, and then the user could decide which masking strategy to use. Sure, I will add that if it's a deal breaker.
@sayakpaul I have adapted the masking strategy from the LAMA paper on my local branch. I have a question: is it according to the guidelines to have config-file properties for the masking separately, like here? I feel it is a bit extensive and confusing to make all of those property values part of the CLI arguments; it might clutter and confuse which arguments are model-specific and which ones are data-specific.
You are absolutely correct. What we can do is include a note about the masking strategy in the README and link to your implementation. Does that sound good?
Looking really nice now. I will let @patil-suraj review this too.
```python
prompt = batch["prompts"][0]

with torch.autocast("cuda"):
    #### UPDATE PIPELINE HERE
```
Does this comment need to be removed?
Which one?
"#### UPDATE PIPELINE HERE"
```python
if args.push_to_hub:
    repo_id = create_repo(
        repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token
```
Seems like this comment wasn't addressed?
```python
init_image = image_transform(batch["pixel_values"][0])
prompt = batch["prompts"][0]

with torch.autocast("cuda"):
```
Let's make use of the `log_validation()` function here and log the results to wandb as well. You can refer to https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py for implementing this. But let me know if you need some more clarifications.
done
I think we also need to add a test case here.
https://github.com/cryptexis/diffusers/blob/sd_15_inpainting/examples/inpainting/train_inpainting.py#L771: in my repo I do not have anything similar to it under those lines. The piece of code you're referring to is here.
I see a lot of |
Examples

Training with Random Masking

Inference with Square Mask (as before)

Prompt: a drawing of a green pokemon with red eyes
- pre-trained stable-diffusion-inpainting
- fine-tuned stable-diffusion-inpainting
- pre-trained stable-diffusion-v1-5
- fine-tuned stable-diffusion-v1-5 (no inpainting)
- fine-tuned stable-diffusion-v1-5 (inpainting)

Inference with Random Mask
- pre-trained stable-diffusion-inpainting
- fine-tuned stable-diffusion-inpainting
- pre-trained stable-diffusion-v1-5
- fine-tuned stable-diffusion-v1-5 (no inpainting)
- fine-tuned stable-diffusion-v1-5 (inpainting)
Looking good. I think the only thing that is pending now is the testing suite.
@sayakpaul I worked yesterday on the tests and hit a wall. Then I tried to run the tests for text_to_image and hit the same wall. Was wondering if it is a systematic issue across all tests...
Had it been the case, it would have been caught in the CI. The CI doesn't indicate so. Feel free to push the tests and then we can work towards fixing them. WDYT? BTW, for fixing the code quality issues, we need to run
Done @sayakpaul, I think everything is addressed and the tests are pushed. Thanks a lot for the patience, support and all the help!
How to prepare the dataset?
@cryptexis let's fix the example tests that are failing now.
Can anyone share a script for SDXL inpainting fine-tuning?
Thanks a lot for working on this, the script looks great! Just left some nits.
For the runwayml inpainting model, during training they mask the whole image 25% of the time. Have you experimented with that?
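That whole-image masking can be layered on top of any irregular mask generator; a sketch (the 25% figure comes from the comment above, and the generator argument is a placeholder):

```python
import numpy as np

def sample_mask(height, width, irregular_mask_fn, full_mask_prob=0.25, rng=None):
    """Return a full-image mask with probability `full_mask_prob`,
    otherwise defer to the irregular mask generator."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < full_mask_prob:
        return np.ones((height, width), dtype=np.float32)  # mask everything
    return irregular_mask_fn(height, width)
```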
```
ftfy
tensorboard
Jinja2
peft==0.7.0
```
Do we need peft for this example?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
When is this getting merged?
Co-authored-by: Suraj Patil <[email protected]>
@cryptexis will merge once the tests pass!
@Sanster Thanks for your plan; I also want to finetune a stable diffusion inpainting model for object removal. Have you tried this? How is the performance?
Hi @patil-suraj, thanks for the convenient script! Is there any code example and dataset example to run the script https://github.com/huggingface/diffusers/blob/inpainting-script/examples/inpainting/train_inpainting_sdxl.py?
What does this PR do?
This functionality allows training/fine-tuning of the 9 channel inpainting models provided by
This is due to noticing that many inpainting models provided to the community, e.g. on https://civitai.com/, have UNets with 4 input channels. 4-channel models may lack capacity, and eventually quality, in inpainting tasks. To support the community in developing fully fledged inpainting models, I have modified the text_to_image training pipeline to do inpainting.

Additions:
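To make the "9 channel" layout concrete: the inpainting UNet takes the 4-channel noisy latents, the mask downsampled to latent resolution (1 channel), and the VAE latents of the masked image (4 channels), concatenated channel-wise. A shape-only sketch in numpy (the training script uses torch tensors, but the concatenation is the same):

```python
import numpy as np

batch, latent_h, latent_w = 2, 64, 64
noisy_latents = np.zeros((batch, 4, latent_h, latent_w))         # VAE latents + noise
mask = np.zeros((batch, 1, latent_h, latent_w))                  # mask at latent resolution
masked_image_latents = np.zeros((batch, 4, latent_h, latent_w))  # VAE latents of masked image

# channel-wise concat: this 9-channel tensor is what the inpainting UNet consumes
unet_input = np.concatenate([noisy_latents, mask, masked_image_latents], axis=1)
assert unet_input.shape == (batch, 9, latent_h, latent_w)
```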
Who can review?
@sayakpaul and @patrickvonplaten
Examples: Out-of-Training-Distribution Scenery
Prompt: a drawing of a green pokemon with red eyes
Pre-trained
Fine-tuned
Prompt: a green and yellow toy with a red nose
Pre-trained
Fine-tuned
Prompt: a red and white ball with an angry look on its face
Pre-trained
Fine-tuned