Commit ebd946f

Author: matt3o
Commit message: add multiple attention masks
Parent: d6cfa11

File tree: 2 files changed, +40 -4 lines


IPAdapterPlus.py (35 additions & 3 deletions)

@@ -241,6 +241,7 @@ def set_new_condition(self, weight, ipadapter, cond, uncond, dtype, number, weig
     def __call__(self, n, context_attn2, value_attn2, extra_options):
         org_dtype = n.dtype
         cond_or_uncond = extra_options["cond_or_uncond"]
+
         with torch.autocast(device_type=self.device, dtype=self.dtype):
             q = n
             k = context_attn2
@@ -287,9 +288,40 @@ def __call__(self, n, context_attn2, value_attn2, extra_options):

                     if mask_h*mask_w == qs:
                         break
+
+                # check if using AnimateDiff and sliding context window
+                if (mask.shape[0] > 1 and hasattr(cond_or_uncond, "params") and cond_or_uncond.params["sub_idxs"] is not None):
+                    # if mask length matches or exceeds full_length, just get sub_idx masks, resize, and continue
+                    if mask.shape[0] >= cond_or_uncond.params["full_length"]:
+                        mask_downsample = torch.Tensor(mask[cond_or_uncond.params["sub_idxs"]])
+                        mask_downsample = F.interpolate(mask_downsample.unsqueeze(1), size=(mask_h, mask_w), mode="bicubic").squeeze(1)
+                    # otherwise, need to do more to get proper sub_idxs masks
+                    else:
+                        # first, resize to needed attention size (to save on needed memory for other operations)
+                        mask_downsample = F.interpolate(mask.unsqueeze(1), size=(mask_h, mask_w), mode="bicubic").squeeze(1)
+                        # check if mask length matches full_length - if not, make it match
+                        if mask_downsample.shape[0] < cond_or_uncond.params["full_length"]:
+                            mask_downsample = torch.cat((mask_downsample, mask_downsample[-1:].repeat((cond_or_uncond.params["full_length"]-mask_downsample.shape[0], 1, 1))), dim=0)
+                        # if we have too many remove the excess (should not happen, but just in case)
+                        if mask_downsample.shape[0] > cond_or_uncond.params["full_length"]:
+                            mask_downsample = mask_downsample[:cond_or_uncond.params["full_length"]]
+                        # now, select sub_idxs masks
+                        mask_downsample = mask_downsample[cond_or_uncond.params["sub_idxs"]]
+                # otherwise, perform usual mask interpolation
+                else:
+                    mask_downsample = F.interpolate(mask.unsqueeze(1), size=(mask_h, mask_w), mode="bicubic").squeeze(1)
+
+                # if we don't have enough masks repeat the last one until we reach the right size
+                if mask_downsample.shape[0] < batch_prompt:
+                    mask_downsample = torch.cat((mask_downsample, mask_downsample[-1:, :, :].repeat((batch_prompt-mask_downsample.shape[0], 1, 1))), dim=0)
+                # if we have too many remove the exceeding
+                elif mask_downsample.shape[0] > batch_prompt:
+                    mask_downsample = mask_downsample[:batch_prompt, :, :]

-                mask_downsample = F.interpolate(mask.unsqueeze(0).unsqueeze(0), size=(mask_h, mask_w), mode="bilinear").squeeze(0)
-                mask_downsample = mask_downsample.view(1, -1, 1).repeat(out.shape[0], 1, out.shape[2])
+                # repeat the masks
+                mask_downsample = mask_downsample.repeat(len(cond_or_uncond), 1, 1)
+                mask_downsample = mask_downsample.view(mask_downsample.shape[0], -1, 1).repeat(1, 1, out.shape[2])
+
                 out_ip = out_ip * mask_downsample

             out = out + out_ip
@@ -410,7 +442,7 @@ def apply_ipadapter(self, ipadapter, model, weight, clip_vision=None, image=None
         work_model = model.clone()

         if attn_mask is not None:
-            attn_mask = attn_mask.squeeze().to(self.device)
+            attn_mask = attn_mask.to(self.device)

         patch_kwargs = {
             "number": 0,
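The size-normalization step in the diff above pads a short mask batch by repeating the last mask until `batch_prompt` frames are reached, and truncates an oversized batch. The same logic, sketched in isolation with plain Python lists instead of torch tensors (the helper name `pad_or_truncate` is hypothetical, not part of the commit):

```python
def pad_or_truncate(masks, target_len):
    """Normalize a batch of masks to target_len frames, mirroring the
    diff above: repeat the last mask if there are too few, drop the
    excess if there are too many."""
    if len(masks) < target_len:
        # repeat the last mask until we reach the right size
        masks = masks + [masks[-1]] * (target_len - len(masks))
    elif len(masks) > target_len:
        # remove the exceeding masks
        masks = masks[:target_len]
    return masks
```

For example, two masks normalized to a four-frame batch become `["m0", "m1", "m1", "m1"]`; the commit does the equivalent with `torch.cat` and `Tensor.repeat` so the result stays a single tensor.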

README.md (5 additions & 1 deletion)

@@ -5,6 +5,8 @@ IPAdapter implementation that follows the ComfyUI way of doing things. The code

 ## Important updates

+**2023/11/24**: Support for multiple attention masks.
+
 **2023/11/23**: Small but important update: the new default location for the IPAdapter models is `ComfyUI/models/ipadapter`. **No panic**: the legacy `ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus/models` location still works and nothing will break.

 **2023/11/08**: Added [attention masking](#attention-masking).
@@ -119,7 +121,7 @@ IPAdapter offers an interesting model for a kind of "face swap" effect. [The wor

 **Note:** there's a new `full-face` model available that's arguably better.

-### Masking
+### Masking (Inpainting)

 The most effective way to apply the IPAdapter to a region is by an [inpainting workflow](./examples/IPAdapter_inpaint.json). Remeber to use a specific checkpoint for inpainting otherwise it won't work. Even if you are inpainting a face I find that the *IPAdapter-Plus* (not the *face* one), works best.

@@ -167,6 +169,8 @@ In the picture below I use two reference images masked one on the left and the o

 <img src="./examples/masking.jpg" width="512" alt="masking" />

+It is also possible to send a batch of masks that will be applied to a batch of latents, one per frame. The size should be the same but if needed some normalization will be performed to avoid errors. This feature also supports (experimentally) AnimateDiff including context sliding.
+
 In the examples directory you'll find a couple of masking workflows: [simple](examples/IPAdapter_mask.json) and [two masks](examples/IPAdapter_2_masks.json).

 ## Troubleshooting
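The AnimateDiff sliding-context support mentioned in the README addition reduces to selecting the masks for the current window via `sub_idxs`, padding with the last mask when fewer masks than `full_length` frames are supplied. A minimal sketch with plain lists (the helper `select_window_masks` is hypothetical; the actual code in IPAdapterPlus.py operates on torch tensors and also resizes the masks to the attention resolution):

```python
def select_window_masks(masks, sub_idxs, full_length):
    """Pick the masks for one sliding-context window.

    Mirrors the logic added in IPAdapterPlus.py: if enough masks are
    supplied, index them directly; otherwise pad to full_length by
    repeating the last mask, then index."""
    if len(masks) >= full_length:
        return [masks[i] for i in sub_idxs]
    # pad by repeating the last mask until full_length is reached
    padded = masks + [masks[-1]] * (full_length - len(masks))
    return [padded[i] for i in sub_idxs]
```

With two masks, a four-frame `full_length`, and a window over frames 1-3, the last mask is repeated to cover the missing frames before the window is sliced out.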
