Add Multi-ControlNet pipeline #2556


Closed · takuma104 opened this issue Mar 5, 2023 · 27 comments

@takuma104 (Contributor) commented Mar 5, 2023:

Discussed in #2331. I created a PoC that supports multiple ControlNets, called Multi-ControlNet, based on the StableDiffusionControlNetPipeline. Any feedback is appreciated.
https://github.com/takuma104/diffusers/tree/multi_controlnet

The idea of using multiple ControlNets and adding their outputs together was proposed in #2331, but at the time the name Multi-ControlNet was not yet common. For other implementations, I'm referring to @Mikubill's pioneering work in this field, sd-webui-controlnet.

Currently, I'm considering opening a PR as a community pipeline and have placed the files in examples/community.

The difference from pipeline_stable_diffusion_controlnet.py is as follows.
takuma104/diffusers@1b0f135...multi_controlnet

Modification points:

  • Created a new ControlNetProcessor class, one instance of which is specified per ControlNet; image preprocessing was also moved here.
  • Made it so that controlnet is no longer specified in the pipeline constructor.
  • Made it possible to specify multiple ControlNetProcessors in the pipeline's __call__() method (with no limit on the number).

Usage Example:

  • Please refer to the main() function in the PoC .py file for details.
import torch
from diffusers import ControlNetModel
from diffusers.utils import load_image
# StableDiffusionMultiControlNetPipeline and ControlNetProcessor are defined in
# the PoC file stable_diffusion_multi_controlnet.py linked above
from stable_diffusion_multi_controlnet import (
    StableDiffusionMultiControlNetPipeline,
    ControlNetProcessor,
)

pipe = StableDiffusionMultiControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", safety_checker=None, torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16).to("cuda")

canny_left = load_image("https://huggingface.co/takuma104/controlnet_dev/resolve/main/vermeer_left.png")
pose_right = load_image("https://huggingface.co/takuma104/controlnet_dev/resolve/main/pose_right.png")

image = pipe(
        prompt="best quality, extremely detailed",
        negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
        processors=[
            ControlNetProcessor(controlnet_canny, canny_left),
            ControlNetProcessor(controlnet_pose, pose_right),
        ],
        generator=torch.Generator(device="cpu").manual_seed(0),
        num_inference_steps=30,
        width=512,
        height=512,
).images[0]
image.save("canny_left_right.png")

Generate Example:

[Table: Control Image 1 / Control Image 2 / Generated — example images not shown]
@mikegarts (Contributor) commented:
Actually this one looks even better than the StableDiffusionControlNetPipeline you recently merged, imho.

@becausecurious commented:
Is ControlNetProcessor.conditioning_scale the same as the ControlNet weight in the AUTOMATIC1111 implementation of Multi-ControlNet?

@takuma104 (Contributor, Author) commented Mar 6, 2023:

@mikegarts At the very end of that PR, there was a major API change: ControlNet files became an independent distribution rather than being bundled with the Stable Diffusion pipeline files. When I saw it, I wished I had thought of it myself, but unfortunately the idea didn't occur to me at the time.

@becausecurious conditioning_scale is also among the arguments of StableDiffusionControlNetPipeline's __call__() method, and I just ported it over. It probably works the same way as in A1111 (more precisely, the Mikubill extension).

@takuma104 (Contributor, Author) commented:
I replaced the generated samples with more meaningful controls.
Code: https://huggingface.co/takuma104/controlnet_dev/blob/main/multi_controlnet/README.md

@patrickvonplaten (Contributor) commented:
Hey @takuma104,

Thanks a lot for the write-up - that's great!
Generally, I think the design is nice; it's just that we cannot rename things anymore because that would break backwards compatibility, so we have to stick with controlnet and image.

But your overall design looks very nice. I think we should allow the user to just pass a list of controlnets, i.e. changing:

controlnet: Union[ControlNetModel, List[ControlNetModel]]

at init. (The only problem is that such a pipeline could not be saved and loaded at the moment, but this is not very important and can be changed down the road.)
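For illustration, a minimal sketch of what that init might look like (hypothetical at this point in the thread; the model variables follow the earlier example):

import torch
from diffusers import StableDiffusionControlNetPipeline

# controlnet_canny / controlnet_pose loaded as in the example above
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_pose],
    torch_dtype=torch.float16,
).to("cuda")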

Then we can just pass a list of images:

image = pipe(
        prompt="best quality, extremely detailed",
        negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
        image=[canny_left, pose_right],
        generator=torch.Generator(device="cpu").manual_seed(0),
        num_inference_steps=30,
        width=512,
        height=512,
).images[0]

instead of passing the whole model to pipe. I'm not a big fan of passing the whole model to the pipeline's __call__ function, as it makes everything hard to debug and read (we'd essentially be passing a function to a function, which is not very PyTorch-y).

Given that our image processors can easily process a list of images, I don't think we have to change any code really.

cc @williamberman could you also maybe help here?

@takuma104 (Contributor, Author) commented Mar 7, 2023:

Hi @patrickvonplaten, thanks for your feedback! I hadn't thought of that approach. Indeed, with it we can extend StableDiffusionControlNetPipeline to Multi-ControlNet without breaking compatibility. That's better than adding another community pipeline for no reason.

I think it should be fine for most use cases. If there is a weakness, it might be the following:

  • conditioning_scale also needs to accept a list (the default of 1.0 shared across all ControlNets is fine, but I would like to be able to specify them separately as well); see the sketch after this list.
  • If we add more parameters in the future, they will need to accept lists as well, which may become difficult for users to handle.
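For instance, the call might then look like this (a hypothetical sketch at this point in the discussion; accepting a list for controlnet_conditioning_scale is how this was later implemented):

image = pipe(
    prompt="best quality, extremely detailed",
    image=[canny_left, pose_right],
    controlnet_conditioning_scale=[1.0, 0.8],  # one scale per ControlNet
    num_inference_steps=30,
).images[0]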

Regarding parameters: since I was originally thinking of this as a community pipeline, I also considered adding some of the parameters supported by the Mikubill extension. I haven't implemented the two below yet, and I don't fully understand how effective they are, so this is just for your information (a rough sketch follows the list).

  • Guess Mode: a mode that allows the prompt input to be skipped.
  • Guidance Start & End: specify the timestep range over which to run the ControlNetModel inference.
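For reference, later diffusers releases exposed similar options on the ControlNet pipelines; a rough sketch of how they surface in a call:

image = pipe(
    prompt="",                    # Guess Mode allows skipping the prompt
    image=[canny_left],
    guess_mode=True,
    control_guidance_start=0.0,   # fraction of steps at which ControlNet starts
    control_guidance_end=0.8,     # ...and at which it stops being applied
).images[0]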

First, I'll try to create a new diff for this approach.

@takuma104 (Contributor, Author) commented:
I tried it a bit, but allowing a list of modules as an argument to DiffusionPipeline.register_modules() seems like it would be quite a wide-ranging change, and I don't think it's a very good idea. Instead, should we make it acceptable not to register the module with register_modules()?
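(For context, the approach eventually adopted in patrickvonplaten's draft #2621 below sidesteps this: the list is wrapped in a single module, so register_modules() still receives one object. A minimal sketch of the idea, not the actual diffusers implementation:)

import torch.nn as nn

class MultiControlNetModel(nn.Module):
    # Wraps several ControlNets so the pipeline sees a single module.
    def __init__(self, controlnets):
        super().__init__()
        self.nets = nn.ModuleList(controlnets)

    def forward(self, sample, timestep, encoder_hidden_states, conds, scales, **kwargs):
        # Run each ControlNet on its own condition image and sum the
        # per-block residuals (the core Multi-ControlNet idea).
        down_sum, mid_sum = None, None
        for net, cond, scale in zip(self.nets, conds, scales):
            down, mid = net(sample, timestep, encoder_hidden_states, cond,
                            conditioning_scale=scale, return_dict=False, **kwargs)
            if down_sum is None:
                down_sum, mid_sum = list(down), mid
            else:
                down_sum = [a + b for a, b in zip(down_sum, down)]
                mid_sum = mid_sum + mid
        return down_sum, mid_sum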

@takuma104 (Contributor, Author) commented Mar 8, 2023:

I created a proposal for making StableDiffusionControlNetPipeline compatible with Multi-ControlNet.
main...takuma104:diffusers:multi-controlnet-ext

Adding multiple controlnets at init() was very frustrating because I had to use a very hacky method, and it feels like there may still be some pitfalls.

The fast tests for the previous version and the additional Multi-ControlNet tests pass. However, some of the PipelineTesterMixin tests for the additions do not, so they need to be rewritten. The generation tests are not yet sufficient either, but the generated results are consistent with those created previously.

To improve the extensibility of the conditions, I prepared a class called ControlNetCondition for the parameters (reusing what remained of ControlNetProcessor). The advantage is that you can specify the image and scale as a pair, as in the following example.

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, ControlNetCondition

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny",
                                                   torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose",
                                                  torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "andite/anything-v4.0", safety_checker=None, torch_dtype=torch.float16,
    controlnet=[
        controlnet_pose,
        controlnet_canny,
    ],
).to("cuda")

# pose_image and canny_image are control images (PIL) prepared beforehand
image = pipe(prompt='...',
             controlnet_conditions=[
                 ControlNetCondition(pose_image),
                 ControlNetCondition(canny_image, scale=1.2),
             ],
        ).images[0]
image.save("output.png")

@patrickvonplaten (Contributor) commented:
Hey @takuma104,

Thanks a lot for playing around with the proposed design. It's indeed not as straightforward. I've made a draft here that I think would be a nice design and solve our use case quite cleanly:
#2621

Would you maybe like to copy the PR and try to get one ready with this design? I think this should work quite nicely :-)

@patrickvonplaten (Contributor) commented:
Also happy to introduce a scale parameter to the ControlNet in a follow-up PR :-)

@patrickvonplaten (Contributor) commented:
Also cc @williamberman, can you help @takuma104 here?

@takuma104 (Contributor, Author) commented:
@patrickvonplaten Wow, this is amazing! I didn't think of this method. I feel like a weight has been lifted off my shoulders. I'll manually merge and then open the draft PR.

@hosseinsarshar commented:
@takuma104 loved your idea and the implementation.
Quick question: I wonder why you didn't stick to your original design, which separated the ControlNet processors from the base SD model? With that, one could simply change the pipeline on the fly at inference time instead of tying the ControlNet processors to the SD model at load time. Did you face performance issues with the original design, or was the change driven by a design decision?

@takuma104 (Contributor, Author) commented:
Hi @classicboyir, originally I was considering creating a new community pipeline, so my thinking was relatively flexible. However, since we ended up modifying the main pipeline, we maintained compatibility with the already-released API while adding Multi-ControlNet support. It was not a performance issue.

I agree it would be useful to change the controlnet(s) on an existing pipe. How about adding a new method like the following?

pipe.set_controlnet([new_some_controlnet1, new_some_controlnet2])
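A hypothetical implementation sketch of such a method (set_controlnet is not an existing diffusers API; MultiControlNetModel is the wrapper discussed below):

def set_controlnet(self, controlnet):
    # Accept a single ControlNetModel or a list of them; wrap lists so the
    # rest of the pipeline can keep treating self.controlnet as one module.
    if isinstance(controlnet, (list, tuple)):
        controlnet = MultiControlNetModel(list(controlnet))
    self.controlnet = controlnet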

@hosseinsarshar commented Apr 20, 2023:

Makes sense, thanks @takuma104.

This still seems to require updating the main pipe object, whereas your original design made it super easy to bring ControlNet models into the pipeline at inference time in a fully stateless manner.

pipe.set_controlnet([new_some_controlnet1, new_some_controlnet2])

@patrickvonplaten, I understand that this might create backward-compatibility issues, but it would improve the usability of ControlNet models. In the current design, you have to deal with the (sometimes) bulky load operation every time you want to use a different ControlNet model in the pipeline, but with @takuma104's original design you could change it with no penalty, in a stateless way. I wonder if both designs could coexist at first, so that existing pipelines don't break yet flexibility is preserved; later, the previous design could be retired over new releases.

@patrickvonplaten (Contributor) commented:
Hey @classicboyir,

It should still be quite easy to change the controlnet at inference time, as follows:

from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_controlnet import MultiControlNetModel

pipe.controlnet = MultiControlNetModel([new_some_controlnet1, new_some_controlnet2])

Does this work for your use case?

@patrickvonplaten (Contributor) commented:
Can you tell me which use case is difficult to support right now?

@hosseinsarshar commented:
@patrickvonplaten thanks for the note. I was not aware of this. This covers the use case I described above.

@andysingal commented Jul 24, 2023:

> [quotes the original post in full]

How can several tensors be converted at the same time using ControlNet in a diffusers script? Assume we have ControlNet extensions as in https://www.kaggle.com/datasets/azharzen/controlnet-ext. What is the best practice? @takuma104 @patrickvonplaten @williamberman @sayakpaul

@henbucuoshanghai commented:
StableDiffusionMultiControlNetPipeline? Where is this class defined?

@wilfrediscoming commented Sep 20, 2023:

> [quotes @patrickvonplaten's earlier reply about assigning a MultiControlNetModel at inference time]

If I'm using two ControlNets by default, like this:

multi_controlnet = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11e_sd15_canny", torch_dtype=torch.float16), # for shuffle
]

Then for some pipelines where I want to use only one, I'll do

pipe.controlnet = multi_controlnet[0]

But then how do I switch back to the original state (i.e., using both ControlNets)?

I tried the following, and it doesn't work:

pipe.controlnet = multi_controlnet

@wilfrediscoming commented:
MultiControlNetModel([new_some_controlnet1, new_some_controlnet2])

I must be out of my mind... MultiControlNetModel([new_some_controlnet1, new_some_controlnet2]) already solves the problem. Thanks a lot!
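Putting it together, switching between one ControlNet and both might look like this (a sketch, assuming the MultiControlNetModel import shown earlier in the thread):

pipe.controlnet = multi_controlnet[0]                     # use only the first ControlNet
pipe.controlnet = MultiControlNetModel(multi_controlnet)  # restore both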

@tval2 commented Nov 11, 2023:

Maybe a dumb question, but I can't seem to find good tutorials on Multi-ControlNet. I'm trying to train a model that takes in two images and a prompt:

  1. a template base image (e.g. a photo of a room in someone's house with a painting on the wall)
  2. a photo of a painting someone made (e.g. not a famous one like a Van Gogh, just someone's painting)
  3. an optional text prompt describing the second image; it may not be necessary, but I'm curious what people here say

And I want to place image 2 in image 1, replacing the painting on the wall with the new one. Is this the right forum/model to use? I thought maybe creating a custom dataset and then simply feeding in two image controls would do the job, but I could really use some experts' guidance here.

@AngelTs commented Feb 9, 2024:

> [quotes the original post in full]

ModuleNotFoundError: No module named 'stable_diffusion_multi_controlnet'

@t00350320 commented:
Hi all, how should a Multi-ControlNet workflow be designed when there are two people with different pose styles and text prompts? Thanks very much!

@Qiuhao-Wu commented:
How can I train a Multi-ControlNet on my own datasets?

@asomoza (Member) commented Sep 25, 2024:

Hi, this issue was about something else, though. I think you're referring to a multi-condition ControlNet; we don't have a training script for that, but you can refer to the original paper and repository: Uni-ControlNet.
