Implement .swap() against diffusers 0.12 #2385

Conversation

@damian0815 (Contributor) commented Jan 21, 2023

Re-implementation of .swap() for diffusers 0.12's new CrossAttnProcessor API.

Requires diffusers 0.12: pip install https://github.com/huggingface/diffusers

Currently only tested/working on Mac CPU (invoke.py --always_use_cpu).

Todo:

  • Sliced version
  • Test on Linux and Windows
  • Correctly reinstate the proper CrossAttnProcessor after the swap finishes: it should automatically fall back to e.g. xformers if that is what was in place before the .swap() (see the sketch after this list)
  • Remove errors about missing monkeypatching
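
For orientation, here is a minimal sketch of the processor hand-off pattern that the new diffusers 0.12 API enables; it is not the PR's actual code. SwapCrossAttnProcessor and install_swap_processors are hypothetical names, and the sketch assumes the 0.12 set_attn_processor / attn_processors interface and the CrossAttnProcessor call signature.

```python
# A minimal sketch of the diffusers 0.12 processor hand-off this PR builds on;
# not the PR's actual code. SwapCrossAttnProcessor and install_swap_processors
# are hypothetical names.
from diffusers.models.cross_attention import CrossAttnProcessor


class SwapCrossAttnProcessor:
    """Hypothetical stand-in: delegates to the stock processor, but this is the
    hook point where a .swap() implementation would intervene in the attention."""

    def __init__(self):
        self._default = CrossAttnProcessor()

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        # A real implementation would record or substitute cross-attention maps here.
        return self._default(attn, hidden_states, encoder_hidden_states, attention_mask)


def install_swap_processors(unet):
    """Install the custom processor on every attention module, remembering what
    was there before (e.g. xformers) so it can be reinstated afterwards."""
    previous_processors = dict(unet.attn_processors)
    unet.set_attn_processor(SwapCrossAttnProcessor())
    return previous_processors
```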

@damian0815 force-pushed the diffusers_cross_attention_control_reimplementation branch from c2183b6 to 313b206 on January 22, 2023 at 17:13
@damian0815 (Contributor, Author)

MPS .swap is non-functional until kulinseth/pytorch#222 is merged

@damian0815 (Contributor, Author)

OK, I think this is good. Can I get some testing support on Windows and Linux, please?

  • Does it work with and without xformers?
  • If you're running xformers: does InvokeAI correctly return to using xformers after doing a .swap()?

@damian0815 marked this pull request as ready for review on January 25, 2023 at 22:06
@lstein (Collaborator) commented Jan 26, 2023

It seems to be working on Linux. How much change is expected in the non-swapped portions of the image? Here's a typical test:

mother and daughter having lunch
[image]

mother and daughter.swap(son) having lunch
[image]

mother and daughter.swap(son) having lunch.swap(dinner)
[image]

Memory usage was quite good: no apparent increase when swap is activated. Works with xformers as well.

@keturn (Contributor) commented Jan 28, 2023

> mother and daughter.swap(son) having lunch.swap(dinner)

Oh, do we get to use more than one operation now? The previous implementation was limited to one, I thought.

@lstein (Collaborator) commented Jan 28, 2023 via email

@keturn (Contributor) commented Jan 29, 2023

Testing on Linux, results seem poor. The image with swap is almost (though not quite) identical to the replacement on its own.

Starting with --no-xformers does not seem to improve matters.

a photo of the trunk of a car filled with soccer gear
[image: soccer gear]

a photo of the trunk of a car filled with (picnic supplies).swap(soccer gear)
[image: picnic supplies SWAP soccer gear]

a photo of the trunk of a car filled with picnic supplies
[image: picnic supplies]

Using SD 1.5, DDIM, 25 steps.

@damian0815 (Contributor, Author)

@keturn can you try adding , t_start=0.1 or 0.2 inside the swap? I.e. a photo of the trunk of a car filled with (picnic supplies).swap(soccer gear, t_start=0.1)

@JPPhoto (Contributor) commented Jan 30, 2023

@damian0815 Perhaps t_start should default to something like 0.2 so there's a visual difference?

@keturn (Contributor) commented Jan 30, 2023

Interesting. Setting t_start, even as low as 0.05 (of 25 steps), is enough to retain the shape of the "picnic" car instead of looking totally like the replacement prompt.

(picnic supplies).swap(soccer gear, t_start=0.05)
[image: t_start=0.05]

I guess this will probably all be more comprehensible once we get the attention map visualizations back, huh?

@keturn (Contributor) commented Jan 30, 2023

Same goes for the test prompt I got from hipsterusername a while back:

silhouette of a dancing (elvis).swap(frog)

With default settings [t_start=0], it is practically indistinguishable from the replacement prompt silhouette of a dancing frog. Effective values of t_start seem to be low but non-zero, which translates to a couple of steps, I assume.
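
To make the fraction-to-steps arithmetic concrete, here is a tiny illustration; swap_active_steps is a hypothetical helper, and the exact rounding and the before/after semantics in the PR may differ.

```python
# Hypothetical illustration of how fractional t_start / t_end values could map
# onto discrete denoising steps; the PR's actual rounding and semantics may differ.
def swap_active_steps(num_steps: int, t_start: float = 0.0, t_end: float = 1.0) -> range:
    """Step indices on which the .swap() edit would be applied."""
    first = int(round(num_steps * t_start))
    last = int(round(num_steps * t_end))
    return range(first, last)


# 25 steps at t_start=0.05: the edit would kick in from step 1, leaving the very
# first step to run un-edited and establish the overall composition.
print(list(swap_active_steps(25, t_start=0.05)))  # [1, 2, ..., 24]
```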

@keturn (Contributor) commented Jan 30, 2023

> Oh, do we get to use more than one operation now? The previous implementation was limited to one, I thought.

This warning still pops up in the log: "warning: cross-attention control options are not working properly for >1 edit"

But using multiple swaps definitely does do stuff. Is it warning us that you can have multiple edits but not independent values of t_start/t_end for them?

@damian0815 (Contributor, Author) commented Jan 30, 2023

Yep, t_start should default to something >= 1 step; probably 1 step would be fine. I wonder if t_start=0 should simply mean "after 1 step" then, since there's no longer an s_start.

@keturn

> But using multiple swaps definitely does do stuff. Is it warning us that you can have multiple edits but not independent values of t_start/t_end for them?

Yes, that's the warning. I do want to eventually address that; it should be clearer how to do so now that I've broken compel off as a separate lib (which I'd like to convert to an import instead of having local source).

…github.com:damian0815/InvokeAI into diffusers_cross_attention_control_reimplementation
@damian0815 (Contributor, Author)

OK, no: skipping the first step is a bad idea. I'll just make the default 0.1.

@damian0815 (Contributor, Author) commented Jan 30, 2023

Better now:

a photo of the trunk of a car filled with picnic supplies
[image: Screen Shot 2023-01-30 at 15 32 46]

a photo of the trunk of a car filled with soccer gear
[image: Screen Shot 2023-01-30 at 15 32 57]

a photo of the trunk of a car filled with (picnic supplies).swap(soccer gear)
[image: Screen Shot 2023-01-30 at 15 39 09]

@damian0815 (Contributor, Author)

I took the liberty of ticking the "Test on Linux and Windows" checkbox; I think we should be good to go on this.

@damian0815 (Contributor, Author)

@keturn I also took the liberty of "resolving" the concerns you raised re: the naming of the remove_cross_attention_control function (which is now called restore_default_cross_attention).

@keturn (Contributor) left a comment

I've now experimented with some high-RAM operations before and after running the swap to confirm that it does indeed put the memory-efficient attention settings back correctly, and that's working well for me both with xformers and without. ✔️

There are still minor details I'm unclear on (like why you can pass None to restore_default_cross_attention), but overall this is a huge improvement to the stability of the cross-attention code with diffusers 0.12 and I think it's good to merge. 👍
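
For readers following along, here is one plausible shape for the save/restore pattern being discussed, assuming the diffusers 0.12 set_attn_processor API; it mirrors the renamed helper in spirit only, and the None handling shown is an assumption rather than InvokeAI's actual behaviour.

```python
# Hypothetical sketch only, assuming the diffusers 0.12 set_attn_processor API;
# this is not InvokeAI's actual restore_default_cross_attention.
from diffusers.models.cross_attention import CrossAttnProcessor


def restore_default_cross_attention(unet, restore_processors=None):
    """Reinstate whatever processors were recorded before the .swap();
    passing None falls back to the stock CrossAttnProcessor."""
    if restore_processors is not None:
        unet.set_attn_processor(restore_processors)  # e.g. the saved xformers processors
    else:
        unet.set_attn_processor(CrossAttnProcessor())
```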
