
Commit 3b20fe6

j3soon and Svetlana Karslioglu authored
Fix log-softmax unused issue (#2420)
Fixes: #800
Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent a58279c commit 3b20fe6

File tree

1 file changed: +9 -2 lines changed


Diff for: beginner_source/transformer_tutorial.py

@@ -38,8 +38,15 @@
 # of the word (see the next paragraph for more details). The
 # ``nn.TransformerEncoder`` consists of multiple layers of
 # `nn.TransformerEncoderLayer <https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html>`__.
-# To produce a probability distribution over output words, the output of
-# the ``nn.TransformerEncoder`` model is passed through a linear layer.
+# Along with the input sequence, a square attention mask is required because the
+# self-attention layers in ``nn.TransformerDecoder`` are only allowed to attend
+# the earlier positions in the sequence. For the language modeling task, any
+# tokens on the future positions should be masked. To produce a probability
+# distribution over output words, the output of the ``nn.TransformerEncoder``
+# model is passed through a linear layer to output unnormalized logits.
+# The log-softmax function isn't applied here due to the later use of
+# `CrossEntropyLoss <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html>`__,
+# which requires the inputs to be unnormalized logits.
 #

 import math
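
For context, a minimal sketch, not part of the commit or the tutorial source, of the two points the updated comment makes: building the square causal attention mask and passing the unnormalized logits straight to CrossEntropyLoss. The model sizes, tensor shapes, and names such as decoder_head are illustrative assumptions.

# Minimal sketch, assuming a toy vocabulary and model size; names such as
# decoder_head, seq_len, and ntoken are illustrative, not from the tutorial.
import torch
import torch.nn as nn

seq_len, batch_size, ntoken, d_model = 10, 2, 100, 16

# Square attention mask: position i may attend only to positions <= i
# (future positions get -inf, so softmax assigns them zero weight).
causal_mask = torch.triu(
    torch.full((seq_len, seq_len), float("-inf")), diagonal=1
)

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
decoder_head = nn.Linear(d_model, ntoken)  # linear layer producing logits

src = torch.rand(seq_len, batch_size, d_model)  # (seq, batch, feature)
logits = decoder_head(encoder(src, mask=causal_mask))  # no log-softmax here

# CrossEntropyLoss applies log-softmax internally, so it expects raw logits.
targets = torch.randint(ntoken, (seq_len, batch_size))
loss = nn.CrossEntropyLoss()(logits.reshape(-1, ntoken), targets.reshape(-1))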
