[Community] reference only control #3435

Merged

Conversation

@okotaku (Contributor) commented May 15, 2023

Refer to Mikubill/sd-webui-controlnet#1236

Reference image: [image]

Output image with reference_attn=True and reference_adain=False: [image]

Output image with reference_attn=False and reference_adain=True: [image]

Output image with reference_attn=True and reference_adain=True: [image]
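
For readers coming to this thread later, a minimal usage sketch of the community pipeline added in this PR, loaded through diffusers' custom_pipeline mechanism as discussed further down the thread (the model id, scheduler, prompt, and reference image URL are illustrative, not taken from this PR):

import torch
from diffusers import DiffusionPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# Load the community pipeline from examples/community/stable_diffusion_reference.py
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="stable_diffusion_reference",
    safety_checker=None,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

ref_image = load_image("https://example.com/reference.png")  # hypothetical reference image URL

result = pipe(
    ref_image=ref_image,
    prompt="1girl, masterpiece, best quality",
    num_inference_steps=20,
    reference_attn=True,    # inject reference features into self-attention
    reference_adain=False,  # apply AdaIN with the reference's group-norm statistics
).images[0]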

@HuggingFaceDocBuilderDev commented May 15, 2023

The documentation is not available anymore as the PR was closed or merged.

@jfischoff commented May 16, 2023

Should we include the group norm modification in this PR as well: https://github.com/Mikubill/sd-webui-controlnet/pull/1278/files#diff-8c8d004eed5a3078434f6fbde15c178e472565ebfcb3119f308f9292c8eb7514R458 ?

Right now using the group norm or reference_adain+attn gives the best results for reference only. I would definitely like to see this get added either in this PR or a subsequent one.

@patrickvonplaten (Contributor) left a comment

Works for me! Thanks!

@patrickvonplaten (Contributor) commented

I think there is one merge conflict that we need to resolve, and then we can get this one merged :-)

@kadirnar (Contributor) commented

Will you add controlnet support? How can we use it with controlnet? @okotaku

@wangdong-ivymobile commented
It doesn't work for me with EulerAncestralDiscreteScheduler.

@Laidawang commented
When I run the example I get the error:
AttributeError: 'StableDiffusionReferencePipeline' object has no attribute 'image_processor'

@wangdong-ivymobile commented

> When I run the example I get the error: AttributeError: 'StableDiffusionReferencePipeline' object has no attribute 'image_processor'

You can't use the version from PyPI; you need to pull the master branch and install from local (that, or wait for the next release).

@Laidawang commented

> When I run the example I get the error: AttributeError: 'StableDiffusionReferencePipeline' object has no attribute 'image_processor'
>
> You can't use the version from PyPI; you need to pull the master branch and install from local (that, or wait for the next release).

Thanks, I see.

@okotaku (Contributor, Author) commented May 18, 2023

@kadirnar I will add the controlnet version after this PR is merged.
@patrickvonplaten I added reference adain. Please review it too.

@okotaku (Contributor, Author) commented May 18, 2023

@wangdong-ivymobile Thank you for your report. I fixed this bug in the latest commit.

@jfischoff commented

Just FYI, @lllyasviel fixed a bug recently: Mikubill/sd-webui-controlnet#1309

@kadirnar (Contributor) commented May 18, 2023

> @kadirnar I will add the controlnet version after this PR is merged.
> @patrickvonplaten I added reference adain. Please review it too.

This is great 💯 Can we add an inpaint feature like in this repo? [image]
https://github.com/Mikubill/sd-webui-controlnet

@okotaku (Contributor, Author) commented May 18, 2023

@jfischoff Thank you for your suggestion. I fixed the style fidelity rule in the latest commit; it is based on Mikubill/sd-webui-controlnet#1309.

@jfischoff commented

@okotaku I really appreciate the work you are doing!

@wangdong-ivymobile commented

I tried to use this with multiple reference images (with a slight modification), and the result is very bad. I wonder if there are some tricks to make it better?
What I did, basically, is instead of using the latent of a single image, I concatenated all the reference image latents together.
What I wish to achieve is to have different reference images contribute differently to the result image, e.g. the first setting up the tone and theme, the second adding some style and details, etc.

@wangdong-ivymobile commented

Also, I tried to use multi-ControlNet with reference in the WebUI; the result is also not good, but different. I wonder if maybe the mechanism is different?

@jfischoff commented May 19, 2023

I'm getting slightly different results with this pipeline vs. Automatic1111.

This pipeline: [image]

Automatic1111: [image: "A colorful Rococo painting of two female Tom Cruises dancing, with white dresses. Painting by Jean-Antoine Watteau"]

Looking at the code, I notice the implementation is different between the original and this PR. I'm not sure it is responsible for the differences, but I was curious why some of the changes were made.

For instance, the original code uses the style blend to control adding sequences to the bank:

if outer.attention_auto_machine == AutoMachine.Write:
    if outer.attention_auto_machine_weight > self.attn_weight:
        self.bank.append(self_attention_context.detach().clone())
        self.style_cfgs.append(outer.current_style_fidelity)
if outer.attention_auto_machine == AutoMachine.Read:
    if len(self.bank) > 0:
        style_cfg = sum(self.style_cfgs) / float(len(self.style_cfgs))
        self_attn1_uc = self.attn1(x_norm1, context=torch.cat([self_attention_context] + self.bank, dim=1))
        self_attn1_c = self_attn1_uc.clone()
        if len(outer.current_uc_indices) > 0 and style_cfg > 1e-5:
            self_attn1_c[outer.current_uc_indices] = self.attn1(
                x_norm1[outer.current_uc_indices],
                context=self_attention_context[outer.current_uc_indices])
        self_attn1 = style_cfg * self_attn1_c + (1.0 - style_cfg) * self_attn1_uc
    self.bank = []
    self.style_cfgs = []
if self_attn1 is None:
    self_attn1 = self.attn1(x_norm1, context=self_attention_context)

While this PR does the check during read mode:

if MODE == "write":
    self.bank.append(norm_hidden_states.detach().clone())
    attn_output = self.attn1(
        norm_hidden_states,
        encoder_hidden_states=encoder_hidden_states if self.only_cross_attention else None,
        attention_mask=attention_mask,
        **cross_attention_kwargs,
    )
if MODE == "read":
    if attention_auto_machine_weight > self.attn_weight:
        attn_output_uc = self.attn1(
            norm_hidden_states,
            encoder_hidden_states=torch.cat([norm_hidden_states] + self.bank, dim=1),
            # attention_mask=attention_mask,
            **cross_attention_kwargs,
        )
        attn_output_c = attn_output_uc.clone()
        if do_classifier_free_guidance and style_fidelity > 0:
            attn_output_c[uc_mask] = self.attn1(
                norm_hidden_states[uc_mask],
                encoder_hidden_states=norm_hidden_states[uc_mask],
                **cross_attention_kwargs,
            )
        attn_output = style_fidelity * attn_output_c + (1.0 - style_fidelity) * attn_output_uc
        self.bank.clear()
    else:
        attn_output = self.attn1(
            norm_hidden_states,
            encoder_hidden_states=encoder_hidden_states if self.only_cross_attention else None,
            attention_mask=attention_mask,
            **cross_attention_kwargs,
        )

I haven't looked closely enough to see if there are other differences, nor do I know if this is why my images are different. I'm mostly just curious.

latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

# ref only part
noise = torch.randn_like(ref_image_latents)
A Contributor commented on this hunk:

I believe this should be

noise = randn_tensor(ref_image_latents.shape, generator=generator, device=ref_image_latents.device, dtype=ref_image_latents.dtype)

from the utils import, to ensure the generation is deterministic.
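
For context, a minimal sketch of the suggested deterministic noise draw (the import path shown is the one in recent diffusers releases and may differ by version; the latent tensor here is a stand-in, not the pipeline's own variable):

import torch
from diffusers.utils.torch_utils import randn_tensor  # older releases: from diffusers.utils import randn_tensor

ref_image_latents = torch.zeros(1, 4, 64, 64)  # stand-in for the pipeline's encoded reference latents
generator = torch.Generator().manual_seed(0)

# Unlike torch.randn_like, randn_tensor threads the user-supplied generator through,
# so the same seed reproduces the same noise (and hence the same output image).
noise = randn_tensor(
    ref_image_latents.shape,
    generator=generator,
    device=ref_image_latents.device,
    dtype=ref_image_latents.dtype,
)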

@jfischoff commented

@okotaku That makes sense. Thanks for answering my question.

@patrickvonplaten (Contributor) commented

Cool, let's merge this one!

@patrickvonplaten patrickvonplaten merged commit c4359d6 into huggingface:main May 22, 2023
dg845 pushed a commit to dg845/diffusers that referenced this pull request May 23, 2023
* add reference only control

* add reference only control

* add reference only control

* fix lint

* fix lint

* reference adain

* bugfix EulerAncestralDiscreteScheduler

* fix style fidelity rule

* fix default output size

* del unused line

* fix deterministic

@learningyan commented

Hi. I get: TypeError: Transformer2DModel.forward() got an unexpected keyword argument 'attention_mask', and also an unexpected keyword argument 'encoder_attention_mask'.
How do I solve it? @okotaku

@okotaku (Contributor, Author) commented May 26, 2023

@learningyan I fixed the bug in #3508.
Please pull the latest main branch.

@learningyan commented May 26, 2023

@okotaku
I have pulled the latest main branch and still hit this error at lines 501 and 502 of examples/community/stable_diffusion_controlnet_reference.py.

When I comment out that code, it runs, but the quality of the generated images is poor, so I'm not sure commenting it out is correct.

Reference image: [image]

Inference code:

pipe = StableDiffusionReferencePipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    safety_checker=None,
    torch_dtype=torch.float16,
).to('cuda:0')

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

result_img = pipe(
    ref_image=input_image,
    prompt="1girl, masterpiece, best quality",
    negative_prompt="lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry",
    num_inference_steps=20,
    num_images_per_prompt=1,
    reference_attn=True,
    reference_adain=True,
).images[0]
result_img.save('reference.png')

Result image: [image]

The results in sd-webui-controlnet (see Mikubill/sd-webui-controlnet#1236): [image]

So how can I get a similar result to sd-webui-controlnet? Thanks a lot.

@okotaku (Contributor, Author) commented May 26, 2023

@learningyan

> I have pulled the latest main branch and still hit this error at lines 501 and 502 of examples/community/stable_diffusion_controlnet_reference.py.

You need to update diffusers too.

git pull
pip install .

> the quality of the generated images is poor.

You can change the base model. For example:

pipe = StableDiffusionReferencePipeline.from_pretrained(
    "andite/anything-v4.0",
    safety_checker=None,
    torch_dtype=torch.float16,
).to('cuda:0')

And this result uses reference attention only:

result_img = pipe(
    ref_image=input_image,
    prompt="1girl, masterpiece, best quality",
    negative_prompt="lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry",
    num_inference_steps=20,
    num_images_per_prompt=1,
    reference_attn=True,
    reference_adain=False,
    style_fidelity=0.5,  # you can set style_fidelity=1.0
).images[0]

See Mikubill/sd-webui-controlnet#1236 (comment): when examining the details more closely, it appears that the actual model being used there is anythingv3 with a custom model named animevae.pt, etc.

@vijishmadhavan commented

Why am I getting a kind of white tint on images when using controlnet reference only? [images]

pipe = StableDiffusionControlNetReferencePipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16,
).to('cuda:0')

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

image1 = Image.fromarray(image)
image = np.array(image)
low_threshold = 100
high_threshold = 200
canny = cv2.Canny(image, low_threshold, high_threshold)
canny = canny[:, :, None]
canny = np.concatenate([canny, canny, canny], axis=2)
canny_image = Image.fromarray(canny)

result_img = pipe(
    ref_image=image1,
    prompt=prompt,
    negative_prompt="lowres, two head, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry",
    image=canny_image,
    num_inference_steps=20,
    controlnet_conditioning_scale=0.8,
    reference_attn=True,
    reference_adain=True,
    style_fidelity=0.5,
).images[0]

@miiiz commented Jun 6, 2023

I just pulled the latest diffusers and installed it following the important note, but I get this error (StableDiffusionReferencePipeline works fine):

AttributeError: 'StableDiffusionControlNetReferencePipeline' object has no attribute '_default_height_width'

@blx0102 commented Jun 6, 2023

@miiiz You can copy the code of that function from the reference file into the controlnet reference file.

@okotaku (Contributor, Author) commented Jun 9, 2023

@miiiz @blx0102 Thank you for your feedback. I updated the code in #3723.

@alexisrolland (Contributor) commented

Hi there, this thread is a bit long and confusing. Is there any documentation somewhere that describes the classes StableDiffusionReferencePipeline and StableDiffusionControlNetReferencePipeline?

@gasvn commented Jun 17, 2023

@okotaku Thanks for your diffusers implementation. Based on your code, I achieved a cross-image region drag based on the reference scheme: 1. Use the inpaint controlnet to extract the inpainted region features from another image. 2. Use the segment-anything controlnet to keep a reasonable pose.

Code is in https://github.com/sail-sg/EditAnything [image]

@okotaku (Contributor, Author) commented Jun 22, 2023

@gasvn Really cool project! Thank you for your contribution.

@ethankongee commented

Two questions here:

  1. I see StableDiffusionControlNetReferencePipeline and StableDiffusionReferencePipeline. What are the differences?
  2. I'm unable to import StableDiffusionControlNetReferencePipeline directly from the diffusers module. From the doc here, it looks like I should use the custom_pipeline kwarg?

@patrickvonplaten (Contributor) commented

Yes, you need to use the custom_pipeline kwarg :-)

@djj0s3 commented Jul 31, 2023

> Two questions here:
>
>   1. I see StableDiffusionControlNetReferencePipeline and StableDiffusionReferencePipeline. What are the differences?
>   2. I'm unable to import StableDiffusionControlNetReferencePipeline directly from the diffusers module. From the doc here, it looks like I should use the custom_pipeline kwarg?

For anyone coming to this later, you'll definitely need custom_pipeline like so:

controlnet_reference_pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_ID,
    cache_dir=MODEL_CACHE_DIR,
    custom_pipeline="stable_diffusion_controlnet_reference",
    controlnet=CONTROLNET_PIPE,
    safety_checker=None,
    torch_dtype=torch.float16,
)

@amrakm commented Jul 31, 2023

@djj0s3 Does that mean you can't use reference-only with other ControlNets using StableDiffusionMultiControlNetPipeline?

@djj0s3 commented Aug 5, 2023

@amrakm Are you trying to use reference-only as a controlnet to pass into another pipe with other controlnets? Interesting idea! One option you can try is to import this class directly and use it as StableDiffusionControlNetReferencePipeline.from_pretrained(...). Since it is written as a subclass of StableDiffusionControlNetPipeline, you should be able to accomplish what you're trying to do. I'm coincidentally loading it this way instead of using custom_pipeline, since I had to tweak the class to get a couple of other things I'm doing working with it. I haven't tried what you're doing, but that approach should work for you.
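
A minimal sketch of what the direct-import approach described above might look like (the local file copy, model ids, and ControlNet checkpoint are assumptions for illustration, not code from this PR):

# Assumes you copied examples/community/stable_diffusion_controlnet_reference.py
# next to this script; the class is not importable from the installed diffusers package.
import torch
from diffusers import ControlNetModel

from stable_diffusion_controlnet_reference import StableDiffusionControlNetReferencePipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# Because the class subclasses StableDiffusionControlNetPipeline, from_pretrained
# accepts the same arguments as the regular controlnet pipeline.
pipe = StableDiffusionControlNetReferencePipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16,
)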

@Akmpfen commented Sep 5, 2023

I see the code has reference_attn and reference_adain, but how do I use reference_only?

@dotieuthien (Contributor) commented Sep 24, 2023

> Why am I getting a kind of white tint on images when using controlnet reference only? [images and code as in the comment above]

I still face the same issue. Could you please give me any advice? @okotaku

@chyohoo commented Nov 1, 2023

> Why am I getting a kind of white tint on images when using controlnet reference only? [images and code as in the comment above]
>
> I still face the same issue. Could you please give me any advice? @okotaku

Normalize the reference image into [-1, 1], not [0, 1].
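
For anyone hitting the same washed-out results, a minimal sketch of the normalization being suggested (the helper name is illustrative, not part of the pipeline): rescale a [0, 1] image tensor to [-1, 1] before it is encoded by the VAE.

import torch

def to_vae_range(image: torch.Tensor) -> torch.Tensor:
    """Rescale an image tensor from [0, 1] to [-1, 1], the input range SD VAEs expect."""
    return 2.0 * image - 1.0

# e.g. apply this to ref_image before prepare_ref_latents / vae.encode:
# ref_image = to_vae_range(ref_image)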

@patrickvonplaten (Contributor) commented

@DN6 There is still lots of interest, I think. I wonder if we should circle back to a naive diffusers integration as proposed in #4257.

@animebing commented

@okotaku @patrickvonplaten The ref_image in stable_diffusion_controlnet_reference.py is not normalized to [-1, 1], because the control_image_processor in StableDiffusionControlNetPipeline does not do this, but this normalization is needed in prepare_ref_latents when using the VAE.

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024