1 file changed: +0 −9 lines changed

@@ -103,15 +103,6 @@ def generate_square_subsequent_mask(sz: int) -> Tensor:
# positional encodings have the same dimension as the embeddings so that
# the two can be summed. Here, we use ``sine`` and ``cosine`` functions of
# different frequencies.
- # The ``div_term`` in the code is calculated as
- # ``torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))``.
- # This calculation is based on the original Transformer paper’s formulation
- # for positional encoding. The purpose of this calculation is to create
- # a range of values that decrease exponentially.
- # This allows the model to learn to attend to positions based on their relative distances.
- # The ``math.log(10000.0)`` term in the exponent represents the maximum effective
- # input length (in this case, ``10000``). Dividing this term by ``d_model`` scales
- # the values to be within a reasonable range for the exponential function.
#

class PositionalEncoding(nn.Module):
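For reference, a minimal sketch (adapted from the surrounding tutorial, not verified against this exact revision) of the ``PositionalEncoding`` module in which the ``div_term`` described by the removed comments is used:

import math
import torch
from torch import nn, Tensor

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)
        # Exponentially decreasing frequencies, as the removed comments describe.
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)  # even embedding dimensions
        pe[:, 0, 1::2] = torch.cos(position * div_term)  # odd embedding dimensions
        self.register_buffer('pe', pe)

    def forward(self, x: Tensor) -> Tensor:
        # x: [seq_len, batch_size, embedding_dim]
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)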