Add support for Multi-ControlNet to StableDiffusionControlNetPipeline #2627
Conversation
@patrickvonplaten @williamberman
The documentation is not available anymore as the PR was closed or merged.
@takuma104 Here is the implementation of the
@HimariO Great! Thanks for letting me know! Indeed, if I modify the MultiAdapter for ControlNet, it could be used as MultiControlNet. Is this code part of PR #2555? Once it gets merged, we can use the Sideload mechanism as well. I'll wait and see how the review process for #2555 goes. By the way, congratulations on getting out of Draft!
return_dict=False,
)

down_block_res_samples = [
I like the clean-up here, but since we have to pay attention to this: https://github.com/huggingface/diffusers/pull/2627/files#r1132300008, maybe we should put all this directly in the ControlNetModel forward call? Also see: https://github.com/huggingface/diffusers/pull/2627/files#r1132302783
That's a nice idea! I'll write it in that direction.
Fixed in d1acef4
)

# scaling
down_samples = [sample * cond.scale for sample in down_samples]
This could be removed if we put the scale directly in the original ConditionUnet
Fixed in d1acef4
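The change agreed on in this thread (moving the scaling out of the pipeline and into the model's forward pass) can be illustrated with a toy numpy version. The function name and shapes here are purely illustrative, not the actual diffusers code:

```python
import numpy as np

def controlnet_forward_sketch(down_block_res_samples, mid_block_res_sample, conditioning_scale=1.0):
    # Scale every residual before returning it, so the pipeline no longer
    # has to multiply by the conditioning scale itself.
    down_block_res_samples = [s * conditioning_scale for s in down_block_res_samples]
    mid_block_res_sample = mid_block_res_sample * conditioning_scale
    return down_block_res_samples, mid_block_res_sample

down, mid = controlnet_forward_sketch(
    [np.ones(2), np.ones(2)], np.ones(2), conditioning_scale=0.5
)
```

With this in place, the pipeline's `down_samples = [sample * cond.scale ...]` line above becomes unnecessary.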
@@ -470,67 +659,6 @@ def check_inputs(
                f" {negative_prompt_embeds.shape}."
            )

        image_is_pil = isinstance(image, PIL.Image.Image)
Refactor looks good to me
@@ -143,6 +145,220 @@ def test_inference_batch_single_identical(self):
        self._test_inference_batch_single_identical(expected_max_diff=2e-3)


class StableDiffusionMultiControlNetPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
nice tests!
# override PipelineTesterMixin
@unittest.skip(reason="Not implemented")
def test_save_load_optional_components(self):
    pass
Could we maybe add one slow test that verifies that using two controlnets on a real example (SDv1-5) works well? I think the test can look similar to this one, just with two controlnets:

assert np.abs(expected_image - image).max() < 5e-3
I added it in this commit. I saved the results with np.save and temporarily placed them in my Huggingface-hub. I hope you can copy the .npy file to the appropriate location.
By the way, it seems that all the other slow tests are failing in my environment (Ubuntu, RTX3090). This is the case even on the latest version of the main branch, so this modification is unlikely to be related. The short test summary follows:
================================================= short test summary info =================================================
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_canny - AssertionError: assert 0.01935801 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_depth - AssertionError: assert 0.42120677 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_hed - AssertionError: assert 0.054451108 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_mlsd - AssertionError: assert 0.03511852 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_normal - AssertionError: assert 0.011472225 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_openpose - AssertionError: assert 0.024894118 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_scribble - AssertionError: assert 0.5503108 < 0.005
FAILED tests/pipelines/stable_diffusion/test_stable_diffusion_controlnet.py::StableDiffusionControlNetPipelineSlowTests::test_seg - AssertionError: assert 0.15960121 < 0.005
============================ 8 failed, 26 passed, 5 skipped, 13 warnings in 239.19s (0:03:59) =============================
@@ -594,6 +700,7 @@ def __call__(
        callback_steps: int = 1,
        cross_attention_kwargs: Optional[Dict[str, Any]] = None,
        controlnet_conditioning_scale: float = 1.0,
        controlnet_conditions: Optional[List[ControlNetCondition]] = None,
controlnet_conditions: Optional[List[ControlNetCondition]] = None,

Would it be ok for you to just use controlnet_conditions for internal usage of the pipeline? I'm a bit worried about exposing a very new design module such as ControlNetCondition in the main __init__.py, and I also think the user shouldn't have to learn about a new concept when using multi-controlnet; just passing multiple images and controlnet_conditioning_scale should be good enough :-)
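The "just images plus scales" API suggested here implies some internal normalization: one conditioning image per controlnet, with a scalar scale broadcast to all of them. A hypothetical sketch (the function name and error messages are invented for illustration, not the actual pipeline code):

```python
from typing import List, Sequence, Tuple, Union

def normalize_controlnet_args(
    images: Sequence,
    conditioning_scale: Union[float, List[float]],
    num_controlnets: int,
) -> Tuple[list, List[float]]:
    # One conditioning image per controlnet is required.
    if len(images) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} conditioning images, got {len(images)}"
        )
    # A single scalar scale is broadcast to every controlnet.
    if isinstance(conditioning_scale, (int, float)):
        conditioning_scale = [float(conditioning_scale)] * num_controlnets
    if len(conditioning_scale) != num_controlnets:
        raise ValueError("one conditioning scale per controlnet is required")
    return list(images), conditioning_scale
```

This keeps the public `__call__` signature unchanged for the single-controlnet case while supporting lists for the multi-controlnet case.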
Ok. The benefits of ControlNetCondition may become apparent as more parameter extensions are added, but for now I'll keep it for internal use only and not expose it.
Fixed in d1acef4
@@ -671,13 +778,20 @@ def __call__(
                list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work"
                (nsfw) content, according to the `safety_checker`.
        """

        # TODO: add conversion image array to ControlNetConditions
        if controlnet_conditions is None:
if controlnet_conditions is None:

Let's maybe always convert the image to controlnet conditionings internally for the checks, but not expose them to the user.
Fixed in d1acef4
Hey @takuma104,
This PR already looks to be in a great state! A few things I think we need to handle:
1.) For single controlnet use cases, I don't think we can change the class that is saved to MultiControlNet - see comment here. This would disable a use case that is sometimes needed IMO.
2.) I'd maybe suggest not introducing a new concept (ControlNetCondition) to the user and instead just using it for internal handling. E.g. I'd only allow the following use case:
```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model",
    torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny],
).to("cuda")

image = pipe(
    prompt="...",
    image=[pose_image, canny_image],
    controlnet_conditioning_scale=[1, 1.2],
).images
```
I think this is a bit easier to understand for the user and also keeps our public API a bit leaner.
3.) I think we can pass the conditioning scale logic directly into the forward pass of ControlNetModel, as this would simplify some code and also makes sense IMO.
Would this be ok for you? Does that make sense?
…sion_controlnet.py Co-authored-by: Patrick von Platen <[email protected]>
…sion_controlnet.py Co-authored-by: Patrick von Platen <[email protected]>
…sion_controlnet.py Co-authored-by: Patrick von Platen <[email protected]>
After making various corrections,
@@ -492,6 +493,10 @@ def forward(

        mid_block_res_sample = self.controlnet_mid_block(sample)

        # 6. scaling
great!
    return controlnet_conditioning_scale

# override DiffusionPipeline
def save_pretrained(
That's very clean - great!
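The save_pretrained override being praised here can be illustrated with a toy container that saves each sub-model into its own numbered subfolder. The class names and folder naming scheme below are assumptions for illustration only, not necessarily what diffusers actually does:

```python
import os
import tempfile

class MultiControlNetSketch:
    """Toy container holding several sub-models."""

    def __init__(self, nets):
        self.nets = nets

    def save_pretrained(self, save_directory):
        # Save each sub-model into its own numbered subfolder so that
        # loading can later enumerate them in order.
        for i, net in enumerate(self.nets):
            subfolder = os.path.join(save_directory, f"controlnet_{i}")
            os.makedirs(subfolder, exist_ok=True)
            net.save_pretrained(subfolder)

class DummyNet:
    """Stand-in for a real model with a save_pretrained method."""

    def save_pretrained(self, path):
        with open(os.path.join(path, "config.json"), "w") as f:
            f.write("{}")

with tempfile.TemporaryDirectory() as d:
    MultiControlNetSketch([DummyNet(), DummyNet()]).save_pretrained(d)
    saved = sorted(os.listdir(d))
```

The appeal of this pattern is that each sub-model reuses its own serialization logic unchanged; the container only manages the directory layout.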
Made sure that all the slow tests work - just updated the precision a bit :-)
Amazing work here @takuma104! From my side this is all good to merge. @williamberman could you maybe take a quick look as well ?
@takuma104, also if you'd like you could maybe create a quick space under your namespace: https://huggingface.co/takuma104 that shows how multi-controlnet works in action. It should be really simple to set up the space:
- 1.) Duplicate space from here: https://huggingface.co/spaces/diffusers/controlnet-openpose
- 2.) Change code slightly so that two controlnets are used (maybe canny + open-pose?)
- 3.) We can def assign you a free GPU for this space and try to advertise it a bit
@patrickvonplaten I think it's a bit difficult to automatically generate the combination of pose and canny, so we will ask the user to prepare them. If users can refer to examples, it should be sufficient as a demo. The prompt is a little vague with only "best quality, extremely detailed," so I am providing some guidance. I'm still not entirely satisfied with the output of the demo, so tomorrow I'll try creating some control images that will likely yield better results with SD1.5.
- MultiControlNet -> MultiControlNetModel: matches existing naming a bit closer
- MultiControlNetModel inherits from the model utils class: don't have to re-write the fp16 test
- Skip tests that save the multi-controlnet pipeline: clearer than changing the test body
- Don't auto-batch the number of input images to the number of controlnets: we generally like to require the user to pass the expected number of inputs, which simplifies the processing code a bit more
- Use the existing image pre-processing code a bit more: we can rely on it and keep the inference loop a bit simpler
Looks great! Pushed a few nits, mainly around re-using the existing pre-processing code in the inference loop a bit more: a257ed5
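Conceptually, a multi-controlnet wrapper runs each controlnet on its own conditioning image and sums the scaled residuals before handing them to the UNet. A toy numpy sketch of that idea (illustrative names and callables, not the merged diffusers code):

```python
import numpy as np

def multi_controlnet_forward(controlnets, images, scales):
    # Run each controlnet on its own conditioning image, scale its
    # residuals, and accumulate them element-wise.
    total_down, total_mid = None, None
    for net, image, scale in zip(controlnets, images, scales):
        down, mid = net(image)
        down = [d * scale for d in down]
        mid = mid * scale
        if total_down is None:
            total_down, total_mid = down, mid
        else:
            total_down = [a + b for a, b in zip(total_down, down)]
            total_mid = total_mid + mid
    return total_down, total_mid

# Toy "controlnets" that just echo their input as residuals.
nets = [lambda x: ([x, x], x), lambda x: ([x, x], x)]
down, mid = multi_controlnet_forward(
    nets, [np.ones(2), np.ones(2)], [1.0, 0.5]
)
```

Because the wrapper exposes the same (down residuals, mid residual) shape as a single controlnet, the pipeline's denoising loop doesn't need to know how many controlnets are active.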
Great job @takuma104!
@patrickvonplaten Thank you for merging! As for my demo space, I think it has finally reached a certain level of quality, so I would like to release it. I would appreciate your help with promotion, etc. Since there were many cases where the generated images were distorted with the vanilla SD model, I have replaced it with an anime model called dreamlike-art/dreamlike-anime-1.0.
Sounds good, I shared it on our discord :-) I think reddit is worth trying as well! https://www.reddit.com/r/StableDiffusion/
def prepare_image(self, image, width, height, batch_size, num_images_per_prompt, device, dtype):

def prepare_image(
    self, image, width, height, batch_size, num_images_per_prompt, device, dtype, do_classifier_free_guidance
):
do_classifier_free_guidance should have a default value to not break existing code that depends on StableDiffusionControlNetPipeline (like StableDiffusionControlNetInpaintPipeline).
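A minimal sketch of the backward-compatible signature being requested, using a simplified stand-in for prepare_image (the real method does image preprocessing on tensors; this toy version only demonstrates the defaulted flag and the classifier-free-guidance duplication):

```python
def prepare_image(image, width, height, batch_size, do_classifier_free_guidance=False):
    # Simplified stand-in: replicate the conditioning image for the batch.
    # width/height are ignored in this toy version.
    batch = [image] * batch_size
    if do_classifier_free_guidance:
        # Duplicate the conditioning for the unconditional half of the batch.
        batch = batch + batch
    return batch

old_style = prepare_image("img", 64, 64, 2)  # old callers keep working
new_style = prepare_image("img", 64, 64, 2, do_classifier_free_guidance=True)
```

With the default in place, subclasses that still call the method with the old positional arguments are unaffected.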
Indeed! Since this PR is already closed, could you please open a new PR for it?
sure
…huggingface#2627)

* support for List[ControlNetModel] on init()
* Add to support for multiple ControlNetCondition
* rename conditioning_scale to scale
* scaling bugfix
* Manually merge `MultiControlNet` huggingface#2621
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py (x6, co-authored by Patrick von Platen)
* cleanups - don't expose ControlNetCondition - move scaling to ControlNetModel
* make style error correct
* remove ControlNetCondition to reduce code diff
* refactoring image/cond_scale
* add explain for `images`
* Add docstrings
* all fast-test passed
* Add a slow test
* nit
* Apply suggestions from code review
* small precision fix
* nits: MultiControlNet -> MultiControlNetModel (matches existing naming a bit closer); MultiControlNetModel inherits from the model utils class (don't have to re-write the fp16 test); skip tests that save the multi-controlnet pipeline (clearer than changing the test body); don't auto-batch the number of input images to the number of controlnets (we generally require the user to pass the expected number of inputs, which simplifies the processing code); use the existing image pre-processing code a bit more and keep the inference loop a bit simpler

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: William Berman <[email protected]>
Discussed in #2556. This PR makes StableDiffusionControlNetPipeline compatible with multiple ControlNets (Multi-ControlNet). Currently, one ControlNet conditions one UNet; this PR extends that so multiple ControlNets can condition a single UNet, enabling more advanced control over image generation. For example, as shown in the code example, you can use canny and openpose simultaneously.

Modification points:
- Accept a ControlNetModel array for the controlnet argument on init().
- Created a new ControlNetCondition class, with one instance specified per ControlNet; conditional image preprocessing was also moved to this class.

Usage Example:

Generate Example:

TODO:
- `image` argument

@patrickvonplaten @williamberman