Add Multi-ControlNet pipeline #2556


Closed · takuma104 opened this issue Mar 5, 2023 · 27 comments

@takuma104 (Contributor) commented Mar 5, 2023:

Discussed in #2331. I created a PoC that supports multiple ControlNets, called Multi-ControlNet, based on the StableDiffusionControlNetPipeline. Any feedback is appreciated.
https://github.com/takuma104/diffusers/tree/multi_controlnet

The idea of using multiple ControlNets and adding their outputs together was proposed in #2331, but at the time the name Multi-ControlNet was not yet common. For other implementations, I'm referring to @Mikubill's pioneering work in this field, sd-webui-controlnet.

Currently, I'm considering opening a PR as a community pipeline and have placed the files in examples/community.

The difference from pipeline_stable_diffusion_controlnet.py is as follows.
takuma104/diffusers@1b0f135...multi_controlnet

Modification points:

  • Created a new ControlNetProcessor class, one instance of which is specified per ControlNet; image preprocessing was also moved here.
  • Made it so that controlnet is no longer specified in the pipeline constructor.
  • Made it possible to specify multiple ControlNetProcessors in the pipeline's __call__() method (with no limit on the number).

Usage Example:

  • Please refer to the main() function in the PoC .py file for details.
import torch
from diffusers import ControlNetModel
from diffusers.utils import load_image
# StableDiffusionMultiControlNetPipeline and ControlNetProcessor are defined in
# the PoC file stable_diffusion_multi_controlnet.py linked above
from stable_diffusion_multi_controlnet import (
    StableDiffusionMultiControlNetPipeline,
    ControlNetProcessor,
)

pipe = StableDiffusionMultiControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", safety_checker=None, torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16).to("cuda")

canny_left = load_image("https://huggingface.co/takuma104/controlnet_dev/resolve/main/vermeer_left.png")
pose_right = load_image("https://huggingface.co/takuma104/controlnet_dev/resolve/main/pose_right.png")

image = pipe(
        prompt="best quality, extremely detailed",
        negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
        processors=[
            ControlNetProcessor(controlnet_canny, canny_left),
            ControlNetProcessor(controlnet_pose, pose_right),
        ],
        generator=torch.Generator(device="cpu").manual_seed(0),
        num_inference_steps=30,
        width=512,
        height=512,
).images[0]
image.save("canny_left_right.png")

Generate Example:

[Table: Control Image 1 / Control Image 2 / Generated — example images not shown]
@mikegarts (Contributor) commented:
Actually this one looks even better than the StableDiffusionControlNetPipeline you recently merged, imho.

@becausecurious commented:
Is ControlNetProcessor.conditioning_scale the same as the ControlNet weight in the AUTOMATIC1111 implementation of Multi-ControlNet?

@takuma104 (Contributor, Author) commented Mar 6, 2023:

@mikegarts At the very end of that PR, there was a major API change: ControlNet files became an independent distribution rather than being bundled with the Stable Diffusion pipeline files. When I saw it, I wished I had thought of it myself, but unfortunately the idea didn't occur to me at the time.

@becausecurious conditioning_scale is also among the arguments of StableDiffusionControlNetPipeline's __call__() method, and I just ported it over. It probably works the same way as in A1111 (more precisely, the Mikubill extension).

@takuma104 (Contributor, Author) commented:
I replaced the generated samples with more meaningful controls.
Code: https://huggingface.co/takuma104/controlnet_dev/blob/main/multi_controlnet/README.md

@patrickvonplaten (Contributor) commented:
Hey @takuma104,

Thanks a lot for the write-up - that's great!
Generally, I think the design is nice; it's just that we cannot rename things anymore because that would break backwards compatibility, so we have to stick with controlnet and image.

But your overall design looks very nice. I think we should allow the user to just pass a list of controlnets, i.e. changing:

controlnet: Union[ControlNetModel, List[ControlNetModel]]

at init. (The only problem is that such a pipeline could not be saved and loaded at the moment, but this is not very important and can be changed down the road.)
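For illustration, a minimal sketch of what that init might look like (hypothetical at this point in the thread; the model variables follow the earlier example):

import torch
from diffusers import StableDiffusionControlNetPipeline

# controlnet_canny / controlnet_pose loaded as in the example above
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_pose],
    torch_dtype=torch.float16,
).to("cuda")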

Then we can just pass a list of images:

image = pipe(
        prompt="best quality, extremely detailed",
        negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
        image=[canny_left, pose_right],
        generator=torch.Generator(device="cpu").manual_seed(0),
        num_inference_steps=30,
        width=512,
        height=512,
).images[0]

instead of passing the whole model to pipe. I'm not a big fan of passing the whole model to the pipeline's __call__ function, as it makes everything hard to debug and read (we'd essentially be passing a function to a function, which is not very PyTorch-y).

Given that our image processors can easily process a list of images, I don't think we have to change any code really.

cc @williamberman could you also maybe help here?

@takuma104 (Contributor, Author) commented Mar 7, 2023:

Hi @patrickvonplaten, thanks for your feedback! I hadn't thought of that approach. Indeed, with it we can extend StableDiffusionControlNetPipeline to Multi-ControlNet without breaking compatibility. That's better than adding another community pipeline for no reason.

I think it should be fine for most use cases. If there is a weakness, it might be the following:

  • conditioning_scale also needs to accept a list (the default of 1.0 shared across all ControlNets is fine, but I would like to be able to specify them separately as well); see the sketch after this list.
  • If we add more parameters in the future, they will need to accept lists as well, which may become difficult for users to handle.
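For instance, the call might then look like this (a hypothetical sketch at this point in the discussion; accepting a list for controlnet_conditioning_scale is how this was later implemented):

image = pipe(
    prompt="best quality, extremely detailed",
    image=[canny_left, pose_right],
    controlnet_conditioning_scale=[1.0, 0.8],  # one scale per ControlNet
    num_inference_steps=30,
).images[0]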

Regarding parameters: since I was originally thinking of this as a community pipeline, I also considered adding some of the parameters supported by the Mikubill extension. I haven't implemented the two below yet, and I don't fully understand how effective they are, so this is just for your information (a rough sketch follows the list).

  • Guess Mode: a mode that allows the prompt input to be skipped.
  • Guidance Start & End: specify the timestep range over which to run the ControlNetModel inference.
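For reference, later diffusers releases exposed similar options on the ControlNet pipelines; a rough sketch of how they surface in a call:

image = pipe(
    prompt="",                    # Guess Mode allows skipping the prompt
    image=[canny_left],
    guess_mode=True,
    control_guidance_start=0.0,   # fraction of steps at which ControlNet starts
    control_guidance_end=0.8,     # ...and at which it stops being applied
).images[0]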

First, I'll try to create a new diff for this approach.

@takuma104 (Contributor, Author) commented:
I tried it a bit, but allowing a list of modules as an argument to DiffusionPipeline.register_modules() seems like it would be quite a wide-ranging change, and I don't think it's a very good idea. Instead, should we make it acceptable not to register the module with register_modules()?
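(For context, the approach eventually adopted in patrickvonplaten's draft #2621 below sidesteps this: the list is wrapped in a single module, so register_modules() still receives one object. A minimal sketch of the idea, not the actual diffusers implementation:)

import torch.nn as nn

class MultiControlNetModel(nn.Module):
    # Wraps several ControlNets so the pipeline sees a single module.
    def __init__(self, controlnets):
        super().__init__()
        self.nets = nn.ModuleList(controlnets)

    def forward(self, sample, timestep, encoder_hidden_states, conds, scales, **kwargs):
        # Run each ControlNet on its own condition image and sum the
        # per-block residuals (the core Multi-ControlNet idea).
        down_sum, mid_sum = None, None
        for net, cond, scale in zip(self.nets, conds, scales):
            down, mid = net(sample, timestep, encoder_hidden_states, cond,
                            conditioning_scale=scale, return_dict=False, **kwargs)
            if down_sum is None:
                down_sum, mid_sum = list(down), mid
            else:
                down_sum = [a + b for a, b in zip(down_sum, down)]
                mid_sum = mid_sum + mid
        return down_sum, mid_sum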

@takuma104 (Contributor, Author) commented Mar 8, 2023:

I created a proposal for making StableDiffusionControlNetPipeline compatible with Multi-ControlNet.
main...takuma104:diffusers:multi-controlnet-ext

Adding multiple controlnets at init() was very frustrating because I had to use a very hacky method, and it feels like there may still be some pitfalls.

The fast tests for the previous version and the additional Multi-ControlNet tests pass. However, some of the PipelineTesterMixin tests for the additions do not, so they need to be rewritten. The generation tests are not yet sufficient either, but the generated results are consistent with those created previously.

To improve the extensibility of the conditions, I prepared a class called ControlNetCondition for the parameters (reusing what remained of ControlNetProcessor). The advantage is that you can specify the image and scale as a pair, as in the following example.

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, ControlNetCondition

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny",
                                                   torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose",
                                                  torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "andite/anything-v4.0", safety_checker=None, torch_dtype=torch.float16,
    controlnet=[
        controlnet_pose,
        controlnet_canny,
    ],
).to("cuda")

# pose_image and canny_image are control images (PIL) prepared beforehand
image = pipe(prompt='...',
             controlnet_conditions=[
                 ControlNetCondition(pose_image),
                 ControlNetCondition(canny_image, scale=1.2),
             ],
        ).images[0]
image.save("output.png")

@patrickvonplaten (Contributor) commented:
Hey @takuma104,

Thanks a lot for playing around with the proposed design. It's indeed not as straightforward. I've made a draft here that I think would be a nice design and solve our use case quite cleanly:
#2621

Would you maybe like to copy the PR and try to get one ready with this design? I think this should work quite nicely :-)

@patrickvonplaten (Contributor) commented:
Also happy to introduce a scale parameter to the ControlNet in a follow-up PR :-)

@patrickvonplaten (Contributor) commented:
Also cc @williamberman, can you help @takuma104 here?

@takuma104 (Contributor, Author) commented:
@patrickvonplaten Wow, this is amazing! I didn't think of this method. I feel like a weight has been lifted off my shoulders. I'll manually merge and then open the draft PR.

@hosseinsarshar commented:
@takuma104 loved your idea and the implementation.
Quick question: I wonder why you didn't stick to your original design, which separated the ControlNet processors from the base SD model? With that, one could simply change the pipeline on the fly at inference time instead of tying the ControlNet processors to the SD model at load time. Did you face performance issues with the original design, or was the change driven by a design decision?

@takuma104 (Contributor, Author) commented:
Hi @classicboyir, originally I was considering creating a new community pipeline, so my thinking was relatively flexible. However, since we ended up modifying the main pipeline, we maintained compatibility with the already-released API while adding Multi-ControlNet support. It was not a performance issue.

I agree it would be useful to change the controlnet(s) on an existing pipe. How about adding a new method like the following?

pipe.set_controlnet([new_some_controlnet1, new_some_controlnet2])
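A hypothetical implementation sketch of such a method (set_controlnet is not an existing diffusers API; MultiControlNetModel is the wrapper discussed below):

def set_controlnet(self, controlnet):
    # Accept a single ControlNetModel or a list of them; wrap lists so the
    # rest of the pipeline can keep treating self.controlnet as one module.
    if isinstance(controlnet, (list, tuple)):
        controlnet = MultiControlNetModel(list(controlnet))
    self.controlnet = controlnet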

@hosseinsarshar commented Apr 20, 2023:

Makes sense, thanks @takuma104.

This still seems to require updating the main pipe object, whereas your original design made it super easy to bring ControlNet models into the pipeline at inference time in a fully stateless manner.

pipe.set_controlnet([new_some_controlnet1, new_some_controlnet2])

@patrickvonplaten, I understand that this might create backward-compatibility issues, but it would improve the usability of ControlNet models. In the current design, you have to deal with the (sometimes) bulky load operation every time you want to use a different ControlNet model in the pipeline, but with @takuma104's original design you could change it with no penalty, in a stateless way. I wonder if both designs could coexist at first, so that existing pipelines don't break yet flexibility is preserved; later, the previous design could be retired over new releases.

@patrickvonplaten (Contributor) commented:
Hey @classicboyir,

It should still be quite easy to change the controlnet at inference time, as follows:

from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_controlnet import MultiControlNetModel

pipe.controlnet = MultiControlNetModel([new_some_controlnet1, new_some_controlnet2])

Does this work for your use case?

@patrickvonplaten (Contributor) commented:
Can you tell me which use case is difficult to support right now?

@hosseinsarshar commented:
@patrickvonplaten thanks for the note. I was not aware of this. This covers the use case I described above.

@andysingal commented Jul 24, 2023:

> [quotes the original post in full]

How can several tensors be converted at the same time using ControlNet in a diffusers script? Assume we have ControlNet extensions as in https://www.kaggle.com/datasets/azharzen/controlnet-ext. What is the best practice? @takuma104 @patrickvonplaten @williamberman @sayakpaul

@henbucuoshanghai commented:
StableDiffusionMultiControlNetPipeline? Where is this class defined?

@wilfrediscoming commented Sep 20, 2023:

> [quotes @patrickvonplaten's earlier reply about assigning a MultiControlNetModel at inference time]

If I'm using two ControlNets by default, like this:

multi_controlnet = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11e_sd15_canny", torch_dtype=torch.float16), # for shuffle
]

Then for some pipelines where I want to use only one, I'll do

pipe.controlnet = multi_controlnet[0]

But then how do I switch back to the original state (i.e., using both ControlNets)?

I tried the following, and it doesn't work:

pipe.controlnet = multi_controlnet

@wilfrediscoming commented:
MultiControlNetModel([new_some_controlnet1, new_some_controlnet2])

I must be out of my mind... MultiControlNetModel([new_some_controlnet1, new_some_controlnet2]) already solves the problem. Thanks a lot!
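Putting it together, switching between one ControlNet and both might look like this (a sketch, assuming the MultiControlNetModel import shown earlier in the thread):

pipe.controlnet = multi_controlnet[0]                     # use only the first ControlNet
pipe.controlnet = MultiControlNetModel(multi_controlnet)  # restore both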

@tval2 commented Nov 11, 2023:

Maybe a dumb question, but I can't seem to find good tutorials on Multi-ControlNet. I'm trying to train a model that takes in two images and a prompt:

  1. a template base image (e.g. a photo of a room in someone's house with a painting on the wall)
  2. a photo of a painting someone made (e.g. not a famous one like a Van Gogh, just someone's painting)
  3. an optional text prompt describing the second image; it may not be necessary, but I'm curious what people here say

And I want to place image 2 in image 1, replacing the painting on the wall with the new one. Is this the right forum/model to use? I thought maybe creating a custom dataset and then simply feeding in two image controls would do the job, but I could really use some experts' guidance here.

@AngelTs commented Feb 9, 2024:

> [quotes the original post in full]

ModuleNotFoundError: No module named 'stable_diffusion_multi_controlnet'

@t00350320 commented:
Hi all, how should a Multi-ControlNet workflow be designed when there are two people with different pose styles and text prompts? Thanks very much!

@Qiuhao-Wu commented:
How can I train a Multi-ControlNet on my own datasets?

@asomoza (Member) commented Sep 25, 2024:

Hi, this issue was about something else, though. I think you're referring to a multi-condition ControlNet; we don't have a training script for that, but you can refer to the original paper and repository: Uni-ControlNet.
