Add support for Multi-ControlNet to StableDiffusionControlNetPipeline #2627

takuma104 · 2023-03-09T16:29:18Z

Discussed in #2556. This PR makes StableDiffusionControlNetPipeline compatible with multiple ControlNets (Multi-ControlNet). Currently, a single ControlNet conditions a single UNet; this PR extends that so several ControlNets can condition one UNet, enabling more advanced control over image generation. For example, as shown in the usage example, you can use canny and openpose simultaneously.

Modification points:

Made it possible to pass a list of ControlNetModel instances to the controlnet argument of __init__().
~~Created a new ControlNetCondition class and made it so that one is specified for each ControlNet processing. Conditional image preprocessing was also moved to this class.~~
Moved down/mid output scaling to ControlNetModel
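
As a rough illustration of what conditioning one UNet on several ControlNets can mean: each ControlNet yields a list of down-block residuals plus one mid-block residual, and these can be added together position by position before they reach the UNet. The sketch below uses plain floats in place of tensors; `combine_residuals` and the tuple layout are hypothetical names chosen for illustration, not code taken from this PR.

```python
from typing import List, Tuple

# Each ControlNet output is modelled as (down_block_residuals, mid_block_residual).
# Plain floats stand in for tensors; this layout is an assumption for illustration.
ControlNetOutput = Tuple[List[float], float]


def combine_residuals(outputs: List[ControlNetOutput]) -> ControlNetOutput:
    """Sum the residuals of several ControlNets position by position."""
    if not outputs:
        raise ValueError("at least one ControlNet output is required")
    down_sum = list(outputs[0][0])
    mid_sum = outputs[0][1]
    for down_res, mid_res in outputs[1:]:
        if len(down_res) != len(down_sum):
            raise ValueError("all ControlNets must return the same number of down residuals")
        down_sum = [a + b for a, b in zip(down_sum, down_res)]
        mid_sum += mid_res
    return down_sum, mid_sum


pose_out = ([1.0, 2.0, 3.0], 0.5)
canny_out = ([0.5, 0.5, 0.5], 1.0)
combined_down, combined_mid = combine_residuals([pose_out, canny_out])
print(combined_down, combined_mid)
```

With real tensors the same element-wise addition would apply, provided every ControlNet returns residuals of matching shapes.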

Usage Example:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", 
                                                   torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", 
                                                   torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
	"example/a-sd15-variant-model", torch_dtype=torch.float16,
	controlnet=[
		controlnet_pose, 
		controlnet_canny
	],
).to("cuda")

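# pose_image and canny_image are conditioning images (e.g. PIL images) prepared beforehand,
# listed in the same order as the ControlNets passed to `controlnet`.
image = pipe(prompt='...',
             image=[pose_image, canny_image],
        ).images[0]
image.save("output.png")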

Generate Example:

Control Image1, Control Image2, Generated (images not preserved)

TODO:

Support passing multiple ControlNet condition images directly as an array to the image argument
Fix Docstring
Fix Fast-test

@patrickvonplaten @williamberman

Co-authored-by: Patrick von Platen <[email protected]>

takuma104 · 2023-03-09T16:32:37Z

@patrickvonplaten @williamberman
I have several questions and concerns at this point. Any advice would be appreciated.

For code simplification, I made it so that MultiControlNet is created even if only one ControlNet is specified. Will this cause any backward compatibility issues?
The image argument in __call__() already supports PIL.Image or Tensor arrays to accommodate multiple prompts or batches. Combining this with multiple ControlNets is causing me some confusion in terms of mapping. I don't have an immediate solution for this. I will continue to think about it.
The following fast tests are failing in my environment due to the onnxruntime-training module not being found during save_pretrained(). This issue only occurred after merging the MultiControlNet module. I haven't been able to install onnxruntime-training in my environment easily, so I have postponed this for now. I will check the results of the CI test later.

StableDiffusionControlNetPipelineFastTests::test_save_load_float16
StableDiffusionControlNetPipelineFastTests::test_save_load_local
StableDiffusionControlNetPipelineFastTests::test_save_load_optional_components
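
On the first question (backward compatibility), one possible approach is to wrap the ControlNets in a multi-model container only when a list or tuple is passed, so that a single ControlNet keeps its original class when saved. The sketch below is an assumption for illustration, not necessarily what this PR does; `ControlNetStub`, `MultiControlNetStub`, and `resolve_controlnet` are hypothetical stand-ins.

```python
class ControlNetStub:
    """Hypothetical stand-in for a single ControlNet model."""

    def __init__(self, name: str) -> None:
        self.name = name


class MultiControlNetStub:
    """Hypothetical stand-in for a container holding several ControlNets."""

    def __init__(self, controlnets) -> None:
        self.nets = list(controlnets)


def resolve_controlnet(controlnet):
    """Wrap lists and tuples; leave a single ControlNet untouched."""
    if isinstance(controlnet, (list, tuple)):
        return MultiControlNetStub(controlnet)
    return controlnet


single = resolve_controlnet(ControlNetStub("canny"))
multi = resolve_controlnet([ControlNetStub("openpose"), ControlNetStub("canny")])
print(type(single).__name__, type(multi).__name__, len(multi.nets))
```

Keeping the single case unwrapped means existing saved pipelines would still load with the same component class.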

HuggingFaceDocBuilderDev · 2023-03-09T16:33:48Z

The documentation is not available anymore as the PR was closed or merged.

HimariO · 2023-03-09T16:53:17Z

@takuma104 Here is the implementation of the MultiAdapter, which is essentially the same as MultiControlNet by my understanding. I think my implementation can avoid the save_pretrained() issue you mentioned, and it also allows the user to load the entire pipeline that contains MultiAdapter/MultiControlNet with from_pretrained()

takuma104 · 2023-03-09T17:32:51Z

@HimariO Great! Thanks for letting me know! Indeed, if I modify the MultiAdapter for ControlNet, it could be used as MultiControlNet. Is this code part of PR #2555? Once it gets merged, we can use the Sideload mechanism as well. I'll wait and see how the review process for #2555 goes. By the way, congratulations on getting out of Draft!

patrickvonplaten · 2023-03-10T12:21:41Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

                    return_dict=False,
                )

-                down_block_res_samples = [


I like the clean-up here, but I think since we have to pay attention to this: https://github.com/huggingface/diffusers/pull/2627/files#r1132300008 we should maybe put all this directly in the ControlNetModel forward call? Also see: https://github.com/huggingface/diffusers/pull/2627/files#r1132302783

That's a nice idea! I'll write it in that direction.

Fixed in d1acef4

patrickvonplaten · 2023-03-10T12:26:31Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

+            )
+
+            # scaling
+            down_samples = [sample * cond.scale for sample in down_samples]


This could be removed if we put the scale directly in the original ConditionUnet

Fixed in d1acef4

patrickvonplaten · 2023-03-10T12:28:01Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

@@ -470,67 +659,6 @@ def check_inputs(
                    f" {negative_prompt_embeds.shape}."
                )

-        image_is_pil = isinstance(image, PIL.Image.Image)


Refactor looks good to me

patrickvonplaten · 2023-03-10T12:37:26Z

tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py

@@ -143,6 +145,220 @@ def test_inference_batch_single_identical(self):
        self._test_inference_batch_single_identical(expected_max_diff=2e-3)


+class StableDiffusionMultiControlNetPipelineFastTests(PipelineTesterMixin, unittest.TestCase):


nice tests!

patrickvonplaten · 2023-03-10T12:38:39Z

tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py

+    # override PipelineTesterMixin
+    @unittest.skip(reason="Not implemented")
+    def test_save_load_optional_components(self):
+        pass


Could we maybe add one slow test that verifies that using two controlnets on a real example (SDv1-5) works well? I think the test can look similar to this one, just with two controlnets:

diffusers/tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py

Line 368 in d761b58

assert np.abs(expected_image - image).max() < 5e-3

I added it in this commit. I saved the results with np.save and temporarily placed them in my Huggingface-hub. I hope you can copy the .npy file to the appropriate location.

By the way, it seems that all the other slow tests are failing in my environment (Ubuntu, RTX 3090). They also fail on the latest version of the main branch, so the failures are unlikely to be related to this modification. The short test summary follows:

================================================= short test summary info =================================================
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_canny - AssertionError: assert 0.01935801 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_depth - AssertionError: assert 0.42120677 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_hed - AssertionError: assert 0.054451108 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_mlsd - AssertionError: assert 0.03511852 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_normal - AssertionError: assert 0.011472225 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_openpose - AssertionError: assert 0.024894118 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_scribble - AssertionError: assert 0.5503108 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_seg - AssertionError: assert 0.15960121 < 0.005
============================ 8 failed, 26 passed, 5 skipped, 13 warnings in 239.19s (0:03:59) =============================
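
For readers unfamiliar with the assertion behind these failures: the slow tests compare the generated image with a stored reference using the largest absolute per-value difference and a 5e-3 tolerance. A minimal sketch of that check, using plain Python lists instead of NumPy arrays (the pixel values below are made up for illustration, and `max_abs_diff` is a hypothetical helper):

```python
from typing import Sequence


def max_abs_diff(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Return the largest absolute element-wise difference between two sequences."""
    if len(expected) != len(actual):
        raise ValueError("sequences must have the same length")
    return max(abs(e - a) for e, a in zip(expected, actual))


# Made-up pixel values; one value deviates by roughly 0.02.
expected_pixels = [0.10, 0.50, 0.90]
actual_pixels = [0.10, 0.52, 0.90]

diff = max_abs_diff(expected_pixels, actual_pixels)
passes = diff < 5e-3  # same tolerance as the assertion quoted in the review
print(diff, passes)
```

A single outlier value is enough to fail such a test, which is why hardware or library differences can trip it even when images look identical.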

patrickvonplaten · 2023-03-10T12:42:56Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

@@ -594,6 +700,7 @@ def __call__(
        callback_steps: int = 1,
        cross_attention_kwargs: Optional[Dict[str, Any]] = None,
        controlnet_conditioning_scale: float = 1.0,
+        controlnet_conditions: Optional[List[ControlNetCondition]] = None,


Suggested change

controlnet_conditions: Optional[List[ControlNetCondition]] = None,

Would it be ok for you to just use the controlnet_conditions for internal usage of the pipeline?

I'm a bit worried about exposing a very new design module such as ControlNetCondition in the main __init__.py. I also think the user shouldn't have to learn a new concept when using multi controlnet; just passing multiple images and controlnet_conditioning_scale should be good enough :-)

Ok. The benefits of ControlNetCondition may become apparent as more parameter extensions are added, but for now I'll keep it for internal use only and not expose it.

Fixed in d1acef4
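
If users only pass images and controlnet_conditioning_scale, the pipeline has to turn that scale into one value per ControlNet internally. A minimal sketch of such a normalization, assuming (this is not confirmed by the thread) that both a single number and a list are accepted; `normalize_conditioning_scale` is a hypothetical helper name:

```python
from typing import List, Sequence, Union


def normalize_conditioning_scale(
    scale: Union[float, Sequence[float]], num_controlnets: int
) -> List[float]:
    """Return one conditioning scale per ControlNet."""
    if isinstance(scale, (int, float)):
        # A single number is applied to every ControlNet (assumed behaviour).
        return [float(scale)] * num_controlnets
    scales = [float(s) for s in scale]
    if len(scales) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} conditioning scales, got {len(scales)}"
        )
    return scales


print(normalize_conditioning_scale(1.0, 2))
print(normalize_conditioning_scale([1, 1.2], 2))
```

Raising on a length mismatch keeps the mapping between scales and ControlNets explicit instead of guessing.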

patrickvonplaten · 2023-03-10T12:43:27Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

@@ -671,13 +778,20 @@ def __call__(
            list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work"
            (nsfw) content, according to the `safety_checker`.
        """
+
+        # TODO: add conversion image array to ControlNetConditions
+        if controlnet_conditions is None:


Suggested change

if controlnet_conditions is None:

Let's maybe always convert the image to controlnet conditionings for internal checks, but not expose this to the user.

Fixed in d1acef4

patrickvonplaten

Hey @takuma104,

This PR already looks to be in a great state!

Two things I think we need to handle:

1.) For single controlnet use cases, I don't think we can change the class that is saved to MultiControlNet - see comment here: . This would disable a use case that is sometimes needed IMO
2.) I'd maybe suggest not introducing a new concept (ControlNetCondition) to the user and instead just using it for internal handling. E.g. I'd only allow the following use case:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", 
                                                   torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", 
                                                   torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
	"example/a-sd15-variant-model", torch_dtype=torch.float16,
	controlnet=[
		controlnet_pose, 
		controlnet_canny
	],
).to("cuda")

image = pipe(prompt='...',
            image=[pose_image, canny_image],
            controlnet_conditioning_scale=[1, 1.2],
        ).images

I think this is a bit easier to understand for the user and also keeps our public API a bit leaner.

3.) I think we can pass the conditioning scale logic actually directly into the forward pass of ControlNetModel as this would simplify some code and also makes sense IMO.

Would this be ok for you? Does that make sense?
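
Point 3 can be pictured as follows: instead of the pipeline scaling the residuals after calling the ControlNet, the forward pass applies the conditioning scale to every down-block residual and to the mid-block residual before returning them. The sketch below uses floats in place of tensors; `apply_conditioning_scale` is a hypothetical name, and the real forward signature may differ.

```python
from typing import List, Tuple


def apply_conditioning_scale(
    down_block_res_samples: List[float],
    mid_block_res_sample: float,
    conditioning_scale: float = 1.0,
) -> Tuple[List[float], float]:
    """Scale all down-block residuals and the mid-block residual by one factor."""
    scaled_down = [sample * conditioning_scale for sample in down_block_res_samples]
    scaled_mid = mid_block_res_sample * conditioning_scale
    return scaled_down, scaled_mid


scaled_down, scaled_mid = apply_conditioning_scale([1.0, 2.0], 4.0, conditioning_scale=0.5)
print(scaled_down, scaled_mid)
```

With a default of 1.0, callers that never pass a scale see unchanged residuals.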

takuma104 · 2023-03-10T18:52:07Z

After making various corrections, ControlNetCondition became unnecessary, so I deleted it altogether. I think the code diff has been considerably reduced. All that's left is to fix docstrings and tests, I guess.

patrickvonplaten · 2023-03-13T15:42:16Z

src/diffusers/models/controlnet.py

@@ -492,6 +493,10 @@ def forward(

        mid_block_res_sample = self.controlnet_mid_block(sample)

+        # 6. scaling


patrickvonplaten · 2023-03-13T15:43:34Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

+            return controlnet_conditioning_scale
+
+    # override DiffusionPipeline
+    def save_pretrained(


That's very clean - great!

patrickvonplaten

Made sure that all the slow tests work - just updated the precision a bit :-)

Amazing work here @takuma104! From my side this is all good to merge. @williamberman could you maybe take a quick look as well ?

@takuma104, also if you'd like you could maybe create a quick space under your namespace: https://huggingface.co/takuma104 that shows how multi-controlnet works in action. It should be really simple to set up the space:

1.) Duplicate space from here: https://huggingface.co/spaces/diffusers/controlnet-openpose
2.) Change code slightly so that two controlnets are used (maybe canny + open-pose?)
3.) We can def assign you a free GPU for this space and try to advertise it a bit

takuma104 · 2023-03-13T18:00:00Z

@patrickvonplaten
Thanks! I created a space easily with copy and paste (it's easier than I thought).
https://huggingface.co/spaces/takuma104/multi-controlnet

I think it's a bit difficult to automatically generate the combination of pose and canny, so we will ask the user to prepare them. If users can refer to examples, it should be sufficient as a demo.

The prompt is a little vague with only "best quality, extremely detailed," so I am providing some guidance. I'm still not entirely satisfied with the output of the demo, so I'll try creating some control images that will likely yield better results with SD1.5 tomorrow.

MultiControlNet -> MultiControlNetModel - Matches existing naming a bit closer
MultiControlNetModel inherit from model utils class - Don't have to re-write fp16 test
Skip tests that save multi controlnet pipeline - Clearer than changing test body
Don't auto-batch the number of input images to the number of controlnets. We generally like to require the user to pass the expected number of inputs. This simplifies the processing code a bit more
Use existing image pre-processing code a bit more. We can rely on the existing image pre-processing code and keep the inference loop a bit simpler.
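
The "don't auto-batch" decision amounts to rejecting inputs whose count does not match the number of ControlNets, rather than repeating a single image to fill the gap. A minimal sketch of such a check, with `check_image_count` as a hypothetical helper and the error messages invented for illustration:

```python
from typing import Any, Sequence


def check_image_count(images: Any, num_controlnets: int) -> None:
    """Require exactly one conditioning input per ControlNet."""
    if not isinstance(images, (list, tuple)):
        raise TypeError("with multiple ControlNets, `image` must be a list or tuple")
    if len(images) != num_controlnets:
        raise ValueError(
            f"got {len(images)} conditioning images for {num_controlnets} ControlNets"
        )


# Strings stand in for real images in this sketch.
check_image_count(["pose.png", "canny.png"], num_controlnets=2)  # passes silently

try:
    check_image_count(["pose.png"], num_controlnets=2)
except ValueError as err:
    print(err)
```

Failing early keeps the inference loop free of special cases for mismatched inputs.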

williamberman · 2023-03-13T19:45:42Z

Looks great! pushed a few nits mainly around re-using the existing pre-processing code in the inference loop a bit more a257ed5

patrickvonplaten · 2023-03-13T20:18:27Z

Great job @takuma104 !

takuma104 · 2023-03-15T18:15:33Z

@patrickvonplaten Thank you for merging! As for my space for a demo, I finally think it has reached a certain level of quality, so I would like to release it. I would appreciate it if you could help me with promotion, etc. Since there have been many cases where the generated images were distorted with the vanilla SD model, I have replaced it with an anime model called dreamlike-art/dreamlike-anime-1.0.

patrickvonplaten · 2023-03-16T15:12:38Z

Sounds good, I shared it on our discord :-) I think reddit is worth trying as well! https://www.reddit.com/r/StableDiffusion/

remorses · 2023-04-13T18:47:14Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

-    def prepare_image(self, image, width, height, batch_size, num_images_per_prompt, device, dtype):
+    def prepare_image(
+        self, image, width, height, batch_size, num_images_per_prompt, device, dtype, do_classifier_free_guidance
+    ):


do_classifier_free_guidance should have a default value to not break existing code that depends on StableDiffusionControlNetPipeline (like StableDiffusionControlNetInpaintPipeline)

Indeed! Since this PR is already closed, could you please open a new PR for it?
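
A sketch of the fix suggested here: giving the new parameter a default keeps older call sites that omit it working. The default of False and the duplication behaviour are assumptions for illustration, and the function below is a simplified stand-in with a list in place of a tensor, not the pipeline's actual method.

```python
def prepare_image(
    image,
    width,
    height,
    batch_size,
    num_images_per_prompt,
    device,
    dtype,
    do_classifier_free_guidance=False,  # assumed default so legacy callers still work
):
    """Simplified stand-in: duplicate the input when guidance is enabled."""
    prepared = list(image)
    if do_classifier_free_guidance:
        # One copy for the unconditional branch, one for the conditional branch.
        prepared = prepared * 2
    return prepared


# Legacy call site that does not know about the new argument.
legacy = prepare_image(["img"], 512, 512, 1, 1, "cpu", "float32")
# Updated call site that opts in explicitly.
guided = prepare_image(["img"], 512, 512, 1, 1, "cpu", "float32", do_classifier_free_guidance=True)
print(legacy, guided)
```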

…huggingface#2627)
* support for List[ControlNetModel] on init()
* Add to support for multiple ControlNetCondition
* rename conditioning_scale to scale
* scaling bugfix
* Manually merge `MultiControlNet` huggingface#2621
  Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
  Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
  Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
  Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
  Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
  Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
  Co-authored-by: Patrick von Platen <[email protected]>
* cleanups - don't expose ControlNetCondition - move scaling to ControlNetModel
* make style error correct
* remove ControlNetCondition to reduce code diff
* refactoring image/cond_scale
* add explain for `images`
* Add docstrings
* all fast-test passed
* Add a slow test
* nit
* Apply suggestions from code review
* small precision fix
* nits
  MultiControlNet -> MultiControlNetModel - Matches existing naming a bit closer
  MultiControlNetModel inherit from model utils class - Don't have to re-write fp16 test
  Skip tests that save multi controlnet pipeline - Clearer than changing test body
  Don't auto-batch the number of input images to the number of controlnets. We generally like to require the user to pass the expected number of inputs. This simplifies the processing code a bit more
  Use existing image pre-processing code a bit more. We can rely on the existing image pre-processing code and keep the inference loop a bit simpler.
---------
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: William Berman <[email protected]>

takuma104 and others added 6 commits March 9, 2023 00:40

support for List[ControlNetModel] on init()

52b7c69

Add to support for multiple ControlNetCondition

95e4a12

rename conditioning_scale to scale

3c2fd98

scaling bugfix

aae2fc1

Manually merge MultiControlNet huggingface#2621

b14b304

Co-authored-by: Patrick von Platen <[email protected]>

Merge branch 'huggingface:main' into multi-controlnet-ext

6b38c3a

takuma104 changed the title ~~Add support Multi-Controlnet to StableDiffusionControlNetPipeline~~ Add support Multi-ControlNet to StableDiffusionControlNetPipeline Mar 9, 2023

takuma104 changed the title ~~Add support Multi-ControlNet to StableDiffusionControlNetPipeline~~ Add support for Multi-ControlNet to StableDiffusionControlNetPipeline Mar 9, 2023

patrickvonplaten reviewed Mar 10, 2023


patrickvonplaten reviewed Mar 10, 2023
