
Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2 #6922


Open
wants to merge 20 commits into base: main

Conversation

cryptexis

@cryptexis cryptexis commented Feb 9, 2024

What does this PR do?

This functionality allows training/fine-tuning of the 9-channel inpainting models provided by

This was motivated by noticing that many inpainting models provided to the community, e.g. on https://civitai.com/, have UNets with only 4 input channels. 4-channel models may lack capacity, and ultimately quality, on inpainting tasks. To help the community develop fully fledged inpainting models, I have modified the text_to_image training pipeline to do inpainting.
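For context, the 9 input channels come from concatenating the noisy latents (4), the downsampled binary mask (1), and the VAE-encoded masked image (4), which is how the stable-diffusion-inpainting checkpoint is fed. A minimal sketch of that concatenation (shapes and names here are illustrative, not the script's actual code):

```python
import torch

def prepare_inpainting_unet_input(noisy_latents, mask, masked_image_latents):
    # 4 (noisy latents) + 1 (mask) + 4 (masked-image latents) = 9 channels
    return torch.cat([noisy_latents, mask, masked_image_latents], dim=1)

# illustrative shapes for a 512x512 image in SD's 64x64 latent space
noisy = torch.randn(2, 4, 64, 64)
mask = torch.ones(2, 1, 64, 64)
masked = torch.randn(2, 4, 64, 64)
unet_input = prepare_inpainting_unet_input(noisy, mask, masked)
print(unet_input.shape)  # torch.Size([2, 9, 64, 64])
```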

Additions:

  • Added random masking strategy (squares) during the training, center crop during validation
  • Take first 3 images of the pokemon dataset as validation set
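A minimal sketch of the masking scheme described above, assuming a random square during training and a fixed center square during validation (function names and size ranges are illustrative, not the script's actual ones):

```python
import torch

def random_square_mask(height, width, generator=None):
    """Training mask: a single white square (1 = region to inpaint) at a random position."""
    mask = torch.zeros(1, 1, height, width)
    size = int(torch.randint(height // 4, height // 2, (1,), generator=generator))
    top = int(torch.randint(0, height - size + 1, (1,), generator=generator))
    left = int(torch.randint(0, width - size + 1, (1,), generator=generator))
    mask[:, :, top:top + size, left:left + size] = 1.0
    return mask

def center_square_mask(height, width):
    """Validation mask: a deterministic square over the central quarter of the image."""
    mask = torch.zeros(1, 1, height, width)
    mask[:, :, height // 4:3 * height // 4, width // 4:3 * width // 4] = 1.0
    return mask

train_mask = random_square_mask(64, 64, generator=torch.Generator().manual_seed(0))
val_mask = center_square_mask(64, 64)
```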

Before submitting

Who can review?

@sayakpaul and @patrickvonplaten

Examples: Out-of-Training-Distribution Scenery

Prompt: a drawing of a green pokemon with red eyes

Pre-trained

pretrained_0

Fine-tuned

finetuned_0

Prompt: a green and yellow toy with a red nose

Pre-trained

pretrained_1

Fine-tuned

finetuned_1

Prompt: a red and white ball with an angry look on its face

Pre-trained

pretrained_2

Fine-tuned

finetuned_2

@sayakpaul sayakpaul requested a review from patil-suraj February 9, 2024 13:30
@cryptexis
Author

Hi @patil-suraj @sayakpaul, I was wondering whether this is something you would be interested in looking into? Feedback is appreciated.

@yiyixuxu
Collaborator

Cool!
Gentle ping @patil-suraj

@drhead
Contributor

drhead commented Feb 19, 2024

I've experimented with finetuning proper inpainting models before. I strongly urge you to read the LaMa paper (https://arxiv.org/pdf/2109.07161.pdf) and implement their masking strategy (which is what the stable-diffusion-inpainting checkpoint uses). I used a very simple masking strategy like yours for a long time and never got satisfactory results with my model until switching to the LaMa masking strategy. Training on simple white square masks will severely degrade the performance of the pretrained SD inpainting model.
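For reference, LaMa's strategy draws irregular "brush stroke" masks (random thick polylines of varying angle, length, and width) rather than axis-aligned squares. A simplified NumPy sketch of the idea; all parameters here are illustrative, and the paper's supplement describes the actual generator:

```python
import math
import random

import numpy as np

def lama_style_irregular_mask(h, w, max_strokes=5, max_vertices=8, max_width=40):
    """Draw random thick polylines ("brush strokes"), a simplified version of
    LaMa's irregular mask generator. 1.0 marks pixels to inpaint."""
    mask = np.zeros((h, w), dtype=np.float32)
    for _ in range(random.randint(1, max_strokes)):
        x, y = random.randint(0, w - 1), random.randint(0, h - 1)
        for _ in range(random.randint(1, max_vertices)):
            angle = random.uniform(0, 2 * math.pi)
            length = random.randint(10, max(11, min(h, w) // 4))
            brush = random.randint(5, max_width)
            nx = int(np.clip(x + length * math.cos(angle), 0, w - 1))
            ny = int(np.clip(y + length * math.sin(angle), 0, h - 1))
            # stamp squares along the segment to approximate a thick line
            steps = max(abs(nx - x), abs(ny - y), 1)
            for t in range(steps + 1):
                cx = x + (nx - x) * t // steps
                cy = y + (ny - y) * t // steps
                mask[max(0, cy - brush // 2):cy + brush // 2,
                     max(0, cx - brush // 2):cx + brush // 2] = 1.0
            x, y = nx, ny
    return mask

random.seed(0)
demo_mask = lama_style_irregular_mask(128, 128)
```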


```python
if args.push_to_hub:
    repo_id = create_repo(
        repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token
    )
```
Member

Let's make sure to follow:

```python
if args.report_to == "wandb" and args.hub_token is not None:
```

Otherwise, hub_token will be compromised on the wandb run page.
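For concreteness, the guard used in other diffusers training scripts looks roughly like the sketch below (wrapped in a function here so it can be exercised standalone; the exact message wording is illustrative):

```python
from types import SimpleNamespace

def check_hub_token_safety(args):
    # Refuse this combination: wandb records the full command line, so passing
    # --hub_token while reporting to wandb would publish the token.
    if args.report_to == "wandb" and args.hub_token is not None:
        raise ValueError(
            "You cannot use both --report_to=wandb and --hub_token: the token would be "
            "exposed on the wandb run page. Log in with `huggingface-cli login` instead."
        )

check_hub_token_safety(SimpleNamespace(report_to="tensorboard", hub_token="hf_xxx"))  # fine
```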

Author

done

Member

Seems like this comment wasn't addressed?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul sayakpaul left a comment

Left some initial comments. Looking quite nice.

I do think having an option to enable LaMa-like masking might be a very good reference point, as our training scripts are quite widely referenced.

And I apologize for the delay.

@cryptexis
Author

> I've experimented with finetuning proper inpainting models before. I strongly urge you to read the LAMA paper (https://arxiv.org/pdf/2109.07161.pdf) and implement their masking strategy (which is what is used by the stable-diffusion-inpainting checkpoint). I used a very simple masking strategy like what you had for a long time and never got satisfactory results with my model until switching to the LAMA masking strategy. Training on simple white square masks will severely degrade the performance of the pretrained SD inpainting model.

@sayakpaul

I thought having the simplest implementation would do, and then the user could decide which masking strategy to use. Sure, I will add that if it's a deal breaker.

@cryptexis
Author

@sayakpaul I have adapted the masking strategy from the LaMa paper on my local branch. I have a question: is it in line with the guidelines to keep the masking properties in a separate config file, like here:
https://github.com/advimman/lama/blob/main/configs/training/data/abl-04-256-mh-dist-celeba.yaml#L10 ?

I feel it is a bit extensive and confusing to expose all of those property values as CLI arguments; it might clutter things and blur which arguments are model-specific and which are data-specific.
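For illustration only, a separate masking config in the spirit of the linked LaMa file could look like the fragment below; the keys are made up for this sketch and do not reproduce LaMa's exact schema:

```yaml
mask_generator:
  kind: mixed            # irregular brush strokes + boxes
  irregular_proba: 0.5
  irregular_kwargs:
    max_strokes: 5
    max_width: 40
  box_proba: 0.5
  box_kwargs:
    min_size: 32
    max_size: 128
```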

@sayakpaul
Member

> I feel it is a bit extensive and confusing to make all of those property values as part of CLI arguments, might clutter and confuse - which arguments are model specific and which ones are data specific.

You are absolutely correct. What we can do is include a note about the masking strategy in the README and link to your implementation. Does that sound good?

Member

@sayakpaul sayakpaul left a comment

Looking really nice now. I will let @patil-suraj review this too.

```python
prompt = batch["prompts"][0]

with torch.autocast("cuda"):
    #### UPDATE PIPELINE HERE
```
Member

Does this comment need to be removed?

Author

Which one?

Member

"#### UPDATE PIPELINE HERE"


```python
if args.push_to_hub:
    repo_id = create_repo(
        repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token
    )
```
Member

Seems like this comment wasn't addressed?

```python
init_image = image_transform(batch["pixel_values"][0])
prompt = batch["prompts"][0]

with torch.autocast("cuda"):
```
Member

Let's make use of the log_validation() function here and log the results to wandb as well. You can refer to https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py for implementing this. But let me know if you need some more clarifications.

Author

done

@sayakpaul
Member

I think we also need to add a test case here.

@cryptexis
Author

Screenshot 2024-03-02 at 13 20 53

@sayakpaul I think it's a GitHub glitch :) to the extent that I cannot reply to you there.

https://github.com/cryptexis/diffusers/blob/sd_15_inpainting/examples/inpainting/train_inpainting.py#L771 - in my repo I do not have anything similar to it under those lines. And the piece of code you're referring to is here.

@cryptexis
Author

cryptexis commented Mar 2, 2024

> I think we also need to add a test case here.

I see that https://huggingface.co/hf-internal-testing is used a lot in the testing. Are mere mortals able to add unit tests?

@cryptexis
Author

Examples Training with Random Masking

Inference with Square Mask (as before)

Prompt: a drawing of a green pokemon with red eyes

pre-trained stable-diffusion-inpainting

pretrained_inpainting_0

fine-tuned stable-diffusion-inpainting

finetuned_inpainting_0

pre-trained stable-diffusion-v1-5

pretrained_text2img_0

fine-tuned stable-diffusion-v1-5 (no inpainting)

finetuned_text2img_0

fine-tuned stable-diffusion-v1-5 (inpainting)

finetuned_text2img_to_inpainting_0

Inference with Random Mask

pre-trained stable-diffusion-inpainting

pretrained_inpainting_2

fine-tuned stable-diffusion-inpainting

finetuned_inpainting_2

pre-trained stable-diffusion-v1-5

pretrained_text2img_2

fine-tuned stable-diffusion-v1-5 (no inpainting)

finetuned_text2img_2

fine-tuned stable-diffusion-v1-5 (inpainting)

finetuned_text2img_to_inpainting_2

Member

@sayakpaul sayakpaul left a comment

Looking good. I think the only thing that is pending now is the testing suite.

@cryptexis
Author

> Looking good. I think the only thing that is pending now is the testing suite.

@sayakpaul I worked yesterday on the tests and hit a wall. Then I tried to run the tests for text_to_image and hit the same wall:

attaching the screenshot:
Screenshot 2024-03-03 at 06 56 07

I was wondering if it is a systematic issue across all tests...

@sayakpaul
Member

> @sayakpaul worked yesterday on the tests. Hit a wall. Then tried to run tests for the text_to_image and hit the same wall:

Had it been the case, it would have been caught in the CI. The CI doesn't indicate so. Feel free to push the tests and then we can work towards fixing them. WDYT?

BTW, for fixing the code quality issues, we need to run make style && make quality from the root of diffusers.

@cryptexis
Author

> > @sayakpaul worked yesterday on the tests. Hit a wall. Then tried to run tests for the text_to_image and hit the same wall:
>
> Had it been the case, it would have been caught in the CI. The CI doesn't indicate so. Feel free to push the tests and then we can work towards fixing them. WDYT?
>
> BTW, for fixing the code quality issues, we need to run make style && make quality from the root of diffusers.

Done @sayakpaul, I think everything is addressed and the tests are pushed. Thanks a lot for the patience, support, and all the help!

@crapthings

How do we prepare the dataset?

  • image
  • mask
  • prompt
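One possible layout, sketched here purely as an assumption (the column names and metadata format are illustrative, not the script's actual contract): pair each image with a mask file (white = region to inpaint) and a caption via a JSONL metadata file.

```python
import json

# one record per training example: image path, mask path, caption
records = [
    {"image": "images/0001.png", "mask": "masks/0001.png",
     "text": "a drawing of a green pokemon with red eyes"},
    {"image": "images/0002.png", "mask": "masks/0002.png",
     "text": "a green and yellow toy with a red nose"},
]
metadata_jsonl = "\n".join(json.dumps(r) for r in records)
```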

@sayakpaul
Member

@cryptexis let's fix the example tests that are failing now.

@Srinivasa-N707

Can anyone share a script for SDXL inpainting fine-tuning?

Contributor

@patil-suraj patil-suraj left a comment

Thanks a lot for working on this, the script looks great! Just left some nits.

For the runwayml inpainting model, during training they mask the whole image 25% of the time. Have you experimented with that?
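That trick is easy to fold into the mask sampler; a hedged sketch in plain Python (names and size ranges are illustrative):

```python
import random

def sample_mask(h, w, all_mask_prob=0.25, rng=None):
    """With probability all_mask_prob return a full mask (entire image masked),
    otherwise a random square mask. 1.0 marks pixels to inpaint."""
    rng = rng or random.Random()
    if rng.random() < all_mask_prob:
        return [[1.0] * w for _ in range(h)]
    size = rng.randint(h // 4, h // 2)
    top, left = rng.randint(0, h - size), rng.randint(0, w - size)
    mask = [[0.0] * w for _ in range(h)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            mask[r][c] = 1.0
    return mask
```

Because the partial masks never exceed half the image side, a fully-white mask can only come from the 25% branch, which makes the behavior easy to verify empirically.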

```text
ftfy
tensorboard
Jinja2
peft==0.7.0
```
Contributor

Do we need peft for this example?

Contributor

github-actions bot commented Apr 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Apr 4, 2024
@cs-mshah

When is this getting merged?

@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Apr 11, 2024
@yiyixuxu
Collaborator

yiyixuxu commented Apr 11, 2024

@cryptexis
can you

  1. address the final comment here: #6922 (comment). If peft is not used we can remove it; otherwise we are all good
  2. make sure the tests pass

will merge once the tests pass!

@zijinY

zijinY commented May 2, 2024

@Sanster Thanks for your plan. I also want to fine-tune a stable diffusion inpainting model for object removal. Have you tried this? How is the performance?


@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
@fire2323

Hi @patil-suraj, thanks for the convenient script! Is there any code example and dataset example for running the script https://github.com/huggingface/diffusers/blob/inpainting-script/examples/inpainting/train_inpainting_sdxl.py ?

@github-actions github-actions bot removed the stale Issues that haven't received updates label Feb 11, 2025

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 8, 2025