
Commit 83cbc8d

Youshaa Murhij and carljparker authored
Update transformer_tutorial.py | Resolving issue #1778 (#2402)
* Update transformer_tutorial.py: add description for positional encoding calculation for Transformers
* Update Positional Encoding description in transformer_tutorial.py
* Update transformer_tutorial.py

Co-authored-by: Carl Parker <[email protected]>
1 parent 769cff9 commit 83cbc8d

1 file changed: 9 additions, 0 deletions

Diff for: beginner_source/transformer_tutorial.py

@@ -103,6 +103,15 @@ def generate_square_subsequent_mask(sz: int) -> Tensor:
 # positional encodings have the same dimension as the embeddings so that
 # the two can be summed. Here, we use ``sine`` and ``cosine`` functions of
 # different frequencies.
+# The ``div_term`` in the code is calculated as
+# ``torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))``.
+# This calculation is based on the original Transformer paper’s formulation
+# for positional encoding. The purpose of this calculation is to create
+# a range of values that decrease exponentially.
+# This allows the model to learn to attend to positions based on their relative distances.
+# The ``math.log(10000.0)`` term in the exponent represents the maximum effective
+# input length (in this case, ``10000``). Dividing this term by ``d_model`` scales
+# the values to be within a reasonable range for the exponential function.
 #

 class PositionalEncoding(nn.Module):
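
For reference, here is a minimal sketch of the computation the added comments describe: ``div_term`` holds exponentially decreasing frequencies, and the sine and cosine of ``position * div_term`` fill the even and odd embedding dimensions. This standalone function mirrors the tutorial's ``PositionalEncoding`` module but simplifies the tensor layout to ``(max_len, d_model)``; the default ``d_model`` and ``max_len`` values here are illustrative, not taken from the tutorial.

import math
import torch

def sinusoidal_positional_encoding(d_model: int = 512, max_len: int = 5000) -> torch.Tensor:
    # Positions 0 .. max_len-1 as a column vector: shape (max_len, 1)
    position = torch.arange(max_len).unsqueeze(1)
    # div_term: one exponentially decreasing frequency per pair of dimensions
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions: cosine
    return pe  # shape (max_len, d_model)

pe = sinusoidal_positional_encoding(d_model=16, max_len=8)
print(pe.shape)  # torch.Size([8, 16])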
