Add support for Multi-ControlNet to StableDiffusionControlNetPipeline #2627

Merged: 25 commits, Mar 13, 2023

Contributor

@takuma104 takuma104 commented Mar 9, 2023

Discussed in #2556. This PR makes StableDiffusionControlNetPipeline compatible with multiple ControlNets (Multi-ControlNet). Currently, a single ControlNet conditions the UNet; this PR extends that so multiple ControlNets can condition one UNet, enabling more advanced control over image generation. For example, as shown in the usage example, you can use canny and openpose simultaneously.

Modification points:

  • Made it possible to pass a list of ControlNetModel instances to the controlnet argument of __init__().
  • Created a new ControlNetCondition class, with one instance specified per ControlNet. Conditioning-image preprocessing was also moved into this class.
  • Moved down/mid output scaling into ControlNetModel.
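
To make "multiple ControlNets conditioning one UNet" concrete, here is a minimal, framework-free sketch. It assumes each ControlNet returns a list of down-block residuals plus one mid-block residual, and that the per-ControlNet residuals are combined by element-wise addition before being handed to the UNet. The name `combine_residuals` and the use of plain Python lists in place of tensors are illustrative assumptions, not code from this PR.

```python
from typing import List, Tuple

Residual = List[float]  # stand-in for a tensor (assumption for illustration)


def combine_residuals(
    outputs: List[Tuple[List[Residual], Residual]],
) -> Tuple[List[Residual], Residual]:
    """Element-wise sum of (down_residuals, mid_residual) pairs, one pair per ControlNet."""
    if not outputs:
        raise ValueError("at least one ControlNet output is required")
    down_total = [list(r) for r in outputs[0][0]]
    mid_total = list(outputs[0][1])
    for down, mid in outputs[1:]:
        if len(down) != len(down_total):
            raise ValueError("all ControlNets must return the same number of down residuals")
        down_total = [[a + b for a, b in zip(t, d)] for t, d in zip(down_total, down)]
        mid_total = [a + b for a, b in zip(mid_total, mid)]
    return down_total, mid_total


# Toy outputs for a pose ControlNet and a canny ControlNet.
pose_out = ([[1.0, 2.0], [3.0]], [0.5])
canny_out = ([[0.5, 0.5], [1.0]], [0.25])
down, mid = combine_residuals([pose_out, canny_out])
```

With real tensors the same idea applies per residual; only the addition operator changes.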

Usage Example:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Load one ControlNet per conditioning type.
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")

# Pass the ControlNets as a list; the order must match the order of `image` below.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model",
    torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny],
).to("cuda")

# pose_image and canny_image are conditioning images prepared beforehand.
image = pipe(
    prompt="...",
    image=[pose_image, canny_image],
).images[0]
image.save("output.png")

Generated Example:

[Image table: Control Image 1 | Control Image 2 | Generated (images not preserved)]

TODO:

  • Support passing multiple ControlNet condition images directly as an array to the image argument
  • Fix docstrings
  • Fix fast tests

@patrickvonplaten @williamberman

@takuma104
Contributor Author

@patrickvonplaten @williamberman
I have several questions and concerns at this point. Any advice would be appreciated.

  • For code simplification, I made it so that MultiControlNet is created even if only one ControlNet is specified. Will this cause any backward compatibility issues?
  • The image argument in __call__() already supports PIL.Image or Tensor arrays to accommodate multiple prompts or batches. Combining this with multiple ControlNets is causing me some confusion in terms of mapping. I don't have an immediate solution for this. I will continue to think about it.
  • The following fast tests are failing in my environment due to the onnxruntime-training module not being found during save_pretrained(). This issue only occurred after merging the MultiControlNet module. I haven't been able to install onnxruntime-training in my environment easily, so I have postponed this for now. I will check the results of the CI test later.
StableDiffusionControlNetPipelineFastTests::test_save_load_float16
StableDiffusionControlNetPipelineFastTests::test_save_load_local
StableDiffusionControlNetPipelineFastTests::test_save_load_optional_components
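
One way to sidestep the backward-compatibility question in the first bullet is to wrap only when a list is passed, so that a single ControlNet keeps its original class. The sketch below illustrates that idea with hypothetical stub classes (`ControlNetStub`, `MultiControlNetStub`) and a hypothetical helper `normalize_controlnet`; none of these are the real diffusers classes.

```python
from typing import List, Sequence


class ControlNetStub:
    """Hypothetical stand-in for a single ControlNet model."""

    def __init__(self, name: str) -> None:
        self.name = name


class MultiControlNetStub:
    """Hypothetical wrapper that holds several ControlNets."""

    def __init__(self, nets: Sequence[ControlNetStub]) -> None:
        self.nets: List[ControlNetStub] = list(nets)


def normalize_controlnet(controlnet):
    """Wrap lists/tuples in the multi wrapper; return a single model unchanged."""
    if isinstance(controlnet, (list, tuple)):
        return MultiControlNetStub(controlnet)
    return controlnet


single = ControlNetStub("canny")
multi = normalize_controlnet([ControlNetStub("pose"), single])
```

Because the single-model path returns the object untouched, whatever was saved or loaded before for one ControlNet is unaffected by the wrapper.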

@takuma104 takuma104 changed the title Add support Multi-Controlnet to StableDiffusionControlNetPipeline Add support Multi-ControlNet to StableDiffusionControlNetPipeline Mar 9, 2023
@takuma104 takuma104 changed the title Add support Multi-ControlNet to StableDiffusionControlNetPipeline Add support for Multi-ControlNet to StableDiffusionControlNetPipeline Mar 9, 2023
@HimariO
Contributor

HimariO commented Mar 9, 2023

@takuma104 Here is the implementation of the MultiAdapter, which is essentially the same as MultiControlNet by my understanding. I think my implementation can avoid the save_pretrained() issue you mentioned, and it also allows the user to load the entire pipeline that contains MultiAdapter/MultiControlNet with from_pretrained()

@takuma104
Contributor Author

@HimariO Great! Thanks for letting me know! Indeed, if I modify the MultiAdapter for ControlNet, it could be used as MultiControlNet. Is this code part of PR #2555? Once it gets merged, we can use the Sideload mechanism as well. I'll wait and see how the review process for #2555 goes. By the way, congratulations on getting out of Draft!

return_dict=False,
)

down_block_res_samples = [
Contributor

@patrickvonplaten patrickvonplaten Mar 10, 2023

I like the clean-up here, but I think since we have to pay attention to this: https://github.com/huggingface/diffusers/pull/2627/files#r1132300008 we should maybe put all this directly in the ControlNetModel forward call? Also see: https://github.com/huggingface/diffusers/pull/2627/files#r1132302783

Contributor Author

That's a nice idea! I'll write it in that direction.

Contributor Author

Fixed in d1acef4

)

# scaling
down_samples = [sample * cond.scale for sample in down_samples]
Contributor

This could be removed if we put the scale directly in the original ConditionUnet

Contributor Author

Fixed in d1acef4

@@ -470,67 +659,6 @@ def check_inputs(
f" {negative_prompt_embeds.shape}."
)

image_is_pil = isinstance(image, PIL.Image.Image)
Contributor

Refactor looks good to me

@@ -143,6 +145,220 @@ def test_inference_batch_single_identical(self):
self._test_inference_batch_single_identical(expected_max_diff=2e-3)


class StableDiffusionMultiControlNetPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
Contributor

nice tests!

# override PipelineTesterMixin
@unittest.skip(reason="Not implemented")
def test_save_load_optional_components(self):
    pass
Contributor

Could we maybe add one slow test that verifies that using two ControlNets on a real example (SDv1-5) works well? I think the test can look similar to this one, just with two ControlNets:

assert np.abs(expected_image - image).max() < 5e-3
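
The assertion quoted above compares the generated image with a stored reference by the largest per-element absolute difference. A minimal pure-Python sketch of that check follows; the helper names `max_abs_diff` and `images_match` are hypothetical, and flat lists of floats stand in for image arrays.

```python
from typing import Sequence


def max_abs_diff(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Largest absolute element-wise difference between two equally sized sequences."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        raise ValueError("cannot compare empty sequences")
    return max(abs(e - a) for e, a in zip(expected, actual))


def images_match(expected: Sequence[float], actual: Sequence[float], tol: float = 5e-3) -> bool:
    """True when every element differs by less than `tol` (same threshold as the quoted assert)."""
    return max_abs_diff(expected, actual) < tol


close_enough = images_match([0.1, 0.2], [0.101, 0.2])
too_far = images_match([0.1, 0.2], [0.2, 0.2])
```

Because the check takes a maximum rather than a mean, a single outlier pixel is enough to fail the test, which is one reason such thresholds can be sensitive to hardware and library versions.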

Contributor Author

@takuma104 takuma104 Mar 12, 2023

I added it in this commit. I saved the results with np.save and temporarily placed them in my Huggingface-hub. I hope you can copy the .npy file to the appropriate location.

By the way, it seems that all the other slow tests are failing in my environment (Ubuntu, RTX3090). This is the case even on the latest version of the main branch, so this modification is unlikely to be related. The short test summary follows:

================================================= short test summary info =================================================
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_canny - AssertionError: assert 0.01935801 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_depth - AssertionError: assert 0.42120677 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_hed - AssertionError: assert 0.054451108 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_mlsd - AssertionError: assert 0.03511852 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_normal - AssertionError: assert 0.011472225 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_openpose - AssertionError: assert 0.024894118 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_scribble - AssertionError: assert 0.5503108 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_seg - AssertionError: assert 0.15960121 < 0.005
============================ 8 failed, 26 passed, 5 skipped, 13 warnings in 239.19s (0:03:59) =============================

@@ -594,6 +700,7 @@ def __call__(
callback_steps: int = 1,
cross_attention_kwargs: Optional[Dict[str, Any]] = None,
controlnet_conditioning_scale: float = 1.0,
controlnet_conditions: Optional[List[ControlNetCondition]] = None,
Contributor

Suggested change
controlnet_conditions: Optional[List[ControlNetCondition]] = None,

Would it be ok for you to just use the controlnet_conditions for internal usage of the pipeline?

I'm a bit worried about exposing a very new design module such as ControlNetCondition in the main __init__.py, and I also think the user shouldn't have to learn a new concept to use multi-ControlNet; just passing multiple images and controlnet_conditioning_scale should be good enough :-)
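
If the public API is limited to images plus `controlnet_conditioning_scale`, the pipeline needs some internal step that turns a scalar scale into one scale per ControlNet. The following is a sketch of one possible normalization, assuming a scalar is broadcast to every ControlNet and a list must match the number of ControlNets; `expand_conditioning_scale` is a hypothetical helper name, not a diffusers function.

```python
from typing import List, Sequence, Union


def expand_conditioning_scale(
    scale: Union[float, Sequence[float]], num_controlnets: int
) -> List[float]:
    """Return one conditioning scale per ControlNet."""
    if isinstance(scale, (int, float)):
        # A single number applies to every ControlNet.
        return [float(scale)] * num_controlnets
    scales = [float(s) for s in scale]
    if len(scales) != num_controlnets:
        raise ValueError(
            f"got {len(scales)} scales for {num_controlnets} ControlNets"
        )
    return scales


broadcast = expand_conditioning_scale(1.0, 2)
per_net = expand_conditioning_scale([1, 1.2], 2)
```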

Contributor Author

Ok. The benefits of ControlNetCondition may become apparent as more parameter extensions are added, but for now I'll keep it for internal use only and not expose it.

Contributor Author

Fixed in d1acef4

@@ -671,13 +778,20 @@ def __call__(
list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work"
(nsfw) content, according to the `safety_checker`.
"""

# TODO: add conversion image array to ControlNetConditions
if controlnet_conditions is None:
Contributor

Suggested change
if controlnet_conditions is None:

Let's maybe always convert the image to ControlNet conditionings internally for checking, but not expose this to the user.

Contributor Author

Fixed in d1acef4

Contributor

@patrickvonplaten patrickvonplaten left a comment

Hey @takuma104,

This PR already looks to be in a great state!

A few things I think we need to handle:

1.) For single controlnet use cases, I don't think we can change the class that is saved to MultiControlNet - see comment here: . This would disable a use case that is sometimes needed IMO
2.) I'd maybe suggest not introducing a new concept (ControlNetCondition) to the user and instead just using it for internal handling. E.g. I'd only allow the following use case:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model",
    torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny],
).to("cuda")

image = pipe(
    prompt="...",
    image=[pose_image, canny_image],
    controlnet_conditioning_scale=[1, 1.2],
).images

I think this is a bit easier to understand for the user and also keeps our public API a bit leaner.

3.) I think we can move the conditioning-scale logic directly into the forward pass of ControlNetModel, as this would simplify some code and also makes sense IMO.

Would this be ok for you? Does that make sense?
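
As a rough illustration of point 3, the sketch below shows a toy forward function that applies the conditioning scale to its own outputs, so callers no longer have to scale the residuals themselves. The function `toy_controlnet_forward`, its parameter names, and the list-of-floats residuals are assumptions made for illustration; the real ControlNetModel forward computes the residuals from its inputs.

```python
from typing import List, Tuple


def toy_controlnet_forward(
    down_residuals: List[List[float]],
    mid_residual: List[float],
    conditioning_scale: float = 1.0,
) -> Tuple[List[List[float]], List[float]]:
    """Return the residuals multiplied by `conditioning_scale`.

    In a real model the residuals would be computed here; they are passed in
    directly to keep the sketch self-contained.
    """
    scaled_down = [[v * conditioning_scale for v in r] for r in down_residuals]
    scaled_mid = [v * conditioning_scale for v in mid_residual]
    return scaled_down, scaled_mid


scaled_down, scaled_mid = toy_controlnet_forward([[1.0, 2.0]], [4.0], conditioning_scale=0.5)
```

Keeping the scaling inside the forward call means the default of 1.0 leaves existing single-ControlNet callers unchanged.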

takuma104 and others added 3 commits March 11, 2023 00:10
@takuma104
Contributor Author

After making various corrections, ControlNetCondition became unnecessary, so I deleted it altogether. I think the code diff has been considerably reduced. All that's left is to fix the docstrings and tests, I guess.

@takuma104 takuma104 marked this pull request as ready for review March 12, 2023 11:35
@@ -492,6 +493,10 @@ def forward(

mid_block_res_sample = self.controlnet_mid_block(sample)

# 6. scaling
Contributor

great!

return controlnet_conditioning_scale

# override DiffusionPipeline
def save_pretrained(
Contributor

That's very clean - great!

Contributor

@patrickvonplaten patrickvonplaten left a comment

Made sure that all the slow tests work - just updated the precision a bit :-)

Amazing work here @takuma104! From my side this is all good to merge. @williamberman could you maybe take a quick look as well ?

@takuma104, also if you'd like you could maybe create a quick space under your namespace: https://huggingface.co/takuma104 that shows how multi-controlnet works in action. It should be really simple to set up the space:

@takuma104
Contributor Author

@patrickvonplaten
Thanks! I created a space easily with copy and paste (it's easier than I thought).
https://huggingface.co/spaces/takuma104/multi-controlnet

I think it's a bit difficult to automatically generate the combination of pose and canny, so we will ask the user to prepare them. If users can refer to examples, it should be sufficient as a demo.

The prompt is a little vague with only "best quality, extremely detailed," so I am providing some guidance. I'm still not entirely satisfied with the output of the demo, so I'll try creating some control images that will likely yield better results with SD1.5 tomorrow.

MultiControlNet -> MultiControlNetModel - Matches existing naming a bit
closer

MultiControlNetModel inherit from model utils class - Don't have to
re-write fp16 test

Skip tests that save multi controlnet pipeline - Clearer than changing
test body

Don't auto-batch the number of input images to the number of controlnets.
We generally like to require the user to pass the expected number of
inputs. This simplifies the processing code a bit more

Use existing image pre-processing code a bit more. We can rely on the
existing image pre-processing code and keep the inference loop a bit
simpler.
@williamberman
Contributor

Looks great! pushed a few nits mainly around re-using the existing pre-processing code in the inference loop a bit more a257ed5
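
The "don't auto-batch" note in the commit description means the caller must supply exactly one conditioning input per ControlNet. A minimal sketch of such a check is shown here; `check_image_count` is a hypothetical helper, and the actual validation in the pipeline may differ in details and error messages.

```python
from typing import Sequence


def check_image_count(images: Sequence[object], num_controlnets: int) -> None:
    """Raise unless exactly one conditioning input is given per ControlNet."""
    if not isinstance(images, (list, tuple)):
        raise TypeError("with multiple ControlNets, `image` must be a list or tuple")
    if len(images) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} conditioning inputs, got {len(images)}"
        )


# Two ControlNets, two conditioning inputs: passes silently.
check_image_count(["pose.png", "canny.png"], num_controlnets=2)
```

Failing loudly on a mismatch avoids silently reusing one image for several ControlNets, which would be hard to notice from the generated output alone.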

@patrickvonplaten
Contributor

Great job @takuma104 !

@patrickvonplaten patrickvonplaten merged commit d9b8adc into huggingface:main Mar 13, 2023
@takuma104
Contributor Author

@patrickvonplaten Thank you for merging! As for my space for a demo, I finally think it has reached a certain level of quality, so I would like to release it. I would appreciate it if you could help me with promotion, etc. Since there have been many cases where the generated images were distorted with the vanilla SD model, I have replaced it with an anime model called dreamlike-art/dreamlike-anime-1.0.

@patrickvonplaten
Contributor

Sounds good, I shared it on our discord :-) I think reddit is worth trying as well! https://www.reddit.com/r/StableDiffusion/

- def prepare_image(self, image, width, height, batch_size, num_images_per_prompt, device, dtype):
+ def prepare_image(
+     self, image, width, height, batch_size, num_images_per_prompt, device, dtype, do_classifier_free_guidance
+ ):
Contributor

do_classifier_free_guidance should have a default value to not break existing code that depends on StableDiffusionControlNetPipeline (like StableDiffusionControlNetInpaintPipeline)
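
The compatibility concern can be shown with a toy function: giving the new parameter a default lets call sites written against the old signature keep working. In this sketch the default of `False`, the simplified parameter list, and the batch-doubling behavior under classifier-free guidance are assumptions for illustration, not the pipeline's actual code.

```python
from typing import List


def prepare_image(
    image: str,
    batch_size: int,
    do_classifier_free_guidance: bool = False,  # default keeps old callers working
) -> List[str]:
    """Toy version: repeat the image per batch item, doubling the batch when guidance is on."""
    batch = [image] * batch_size
    if do_classifier_free_guidance:
        # Assumed behavior: one copy for the unconditional pass, one for the conditional pass.
        batch = batch * 2
    return batch


old_style = prepare_image("cond.png", 2)  # caller unaware of the new argument
new_style = prepare_image("cond.png", 2, do_classifier_free_guidance=True)
```

Without the default, the `old_style` call would raise a `TypeError` for the missing argument, which is the breakage described for subclasses and dependent pipelines.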

Contributor Author

Indeed! Since this PR is already closed, could you please open a new PR for it?

Contributor

sure

w4ffl35 pushed a commit to w4ffl35/diffusers that referenced this pull request Apr 14, 2023
…huggingface#2627)

* support for List[ControlNetModel] on init()

* Add to support for multiple ControlNetCondition

* rename conditioning_scale to scale

* scaling bugfix

* Manually merge `MultiControlNet` huggingface#2621

Co-authored-by: Patrick von Platen

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: Patrick von Platen

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: Patrick von Platen

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: Patrick von Platen

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: Patrick von Platen

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: Patrick von Platen

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: Patrick von Platen

* cleanups
- don't expose ControlNetCondition
- move scaling to ControlNetModel

* make style error correct

* remove ControlNetCondition to reduce code diff

* refactoring image/cond_scale

* add explain for `images`

* Add docstrings

* all fast-test passed

* Add a slow test

* nit

* Apply suggestions from code review

* small precision fix

* nits

MultiControlNet -> MultiControlNetModel - Matches existing naming a bit
closer

MultiControlNetModel inherit from model utils class - Don't have to
re-write fp16 test

Skip tests that save multi controlnet pipeline - Clearer than changing
test body

Don't auto-batch the number of input images to the number of controlnets.
We generally like to require the user to pass the expected number of
inputs. This simplifies the processing code a bit more

Use existing image pre-processing code a bit more. We can rely on the
existing image pre-processing code and keep the inference loop a bit
simpler.

---------

Co-authored-by: Patrick von Platen
Co-authored-by: William Berman
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
…huggingface#2627)

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…huggingface#2627)
