Purpose of grid reshaping in 2D sinusoidal positional embeddings #11203
Unanswered · jinhong-ni asked this question in Q&A
Replies: 0 comments
Hi all,

I'm a bit confused about this line of code: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py#L282.

Specifically, `grid_size` is the tuple consisting of the height `H` and width `W` of the image. The `grid` computed in L280 should have shape `2*H*W`, and L282 reshapes it into `2*1*W*H`. The `W*H` dimensions are later flattened to match the dimensions of the latent.

However, if you continue to `PatchEmbed` (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py#L549), you will notice that the latent, with shape `BCHW`, is flattened into `B(H*W)C`. This flattening does not seem to match the `grid` from L282. Does this reordering mess up the ordering of dimensions when flattening, in case `H` and `W` are not equal?

I appreciate all responses and assistance in advance.