
Commit bd78f63

Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (#3463)
Release large tensors in attention (as soon as they're no longer required). Reduces peak VRAM by nearly 2 GB for 1024x1024 (even after slicing), and the savings scale up with image size.
1 parent: 3ebd2d1

File tree

1 file changed (+3, -0 lines)


Diff for: src/diffusers/models/attention_processor.py

@@ -344,11 +344,14 @@ def get_attention_scores(self, query, key, attention_mask=None):
             beta=beta,
             alpha=self.scale,
         )
+        del baddbmm_input
 
         if self.upcast_softmax:
             attention_scores = attention_scores.float()
 
         attention_probs = attention_scores.softmax(dim=-1)
+        del attention_scores
+
         attention_probs = attention_probs.to(dtype)
 
         return attention_probs
