Update transformer_tutorial.py | Resolving issue #1778 #2402
Conversation
Add description for positional encoding calculation for Transformers
# input length (in this case, ``10000``). Dividing this term by ``d_model`` scales
# the values to be within a reasonable range for the exponential function.
# The negative sign in front of the logarithm ensures that the values decrease exponentially.
# The reason for writing ``math.log(10000.0)`` instead of ``4`` in the code is to make it clear |
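For context, here is a minimal sketch of the computation these lines describe (the `d_model` value of 512 is only an illustrative assumption, not something fixed by the snippet):

```python
import math
import torch

d_model = 512  # illustrative embedding size, not fixed by the snippet above

# Scaling the exponent by -log(10000.0) / d_model keeps the exp() inputs in a
# modest range, so the resulting values fall between 1 and roughly 1/10000 and
# decrease exponentially as the dimension index grows.
div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))

print(div_term[0].item())   # 1.0 -> highest-frequency term
print(div_term[-1].item())  # ~1.0e-4, close to 1/10000 -> lowest-frequency term
```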
I don't understand this comment. math.log(10000.0) is 9.2, not 4.
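For reference, a quick check of the numbers (the value 4 would only appear with a base-10 logarithm, since 10000 = 10**4):

```python
import math

print(math.log(10000.0))    # 9.210340371976184 (natural log)
print(math.log10(10000.0))  # 4.0 (base-10 log)
print(4 * math.log(10.0))   # 9.210340371976184, same as the first line
```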
Sorry, I removed the redundant description.
Wow. Nice explanation. Thanks.
# for positional encoding. The purpose of this calculation is to create
# a range of values that decrease exponentially.
# This allows the model to learn to attend to positions based on their relative distances.
# The ``math.log(10000.0)`` term in the exponent represents the maximum effective |
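For concreteness, a small self-contained sketch of how this exponent enters the sinusoidal encoding (the `d_model` and `max_len` values are illustrative, and this follows the standard sine/cosine scheme rather than quoting the tutorial file exactly):

```python
import math
import torch

d_model, max_len = 512, 5000  # illustrative sizes, not taken from the snippet above

position = torch.arange(max_len).unsqueeze(1)
div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))

pe = torch.zeros(max_len, d_model)
pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions

# The wavelengths form a geometric progression from 2*pi up to roughly
# 10000 * 2*pi, so 10000 bounds the longest wavelength; the maximum input
# length itself is still controlled by max_len.
print(2 * math.pi / div_term[0].item())   # ~6.28, shortest wavelength
print(2 * math.pi / div_term[-1].item())  # ~6.1e4, close to 10000 * 2*pi (~62832)
```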
I'm not sure this is correct; the maximum input length is max_len, not 10000. Am I missing something?
The purpose of this value is to spread the frequencies of the sine and cosine functions over a wide range. This is important because it helps to ensure that the positional encodings are unique for each position in the sequence. Right?
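A small, self-contained check in that spirit (the sizes are arbitrary, and the sine/cosine halves are simply concatenated because the interleaving order does not affect uniqueness):

```python
import math
import torch

d_model, num_positions = 64, 1000  # arbitrary illustrative sizes

position = torch.arange(num_positions).unsqueeze(1)
div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
pe = torch.cat([torch.sin(position * div_term), torch.cos(position * div_term)], dim=1)

# Every position gets a distinct combination of sine/cosine values,
# so no two rows of pe coincide for these sizes.
print(torch.unique(pe, dim=0).shape[0] == num_positions)  # True
```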
I think I need to update this, too.
Also, could you please make the description shorter and use simpler language? Thank you!
@YoushaaMurhij please submit a PR updating the description as suggested. Tag @NicolasHug and @kit1980 to review your update.
Add description for positional encoding calculation for Transformers
Fixes #1778
Description
Add an explanation of how the positional encoding for Transformers is calculated by taking the logarithm first and then applying the exponential function.
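As a hedged sketch of that equivalence (the `d_model` value is illustrative): both forms compute 1 / 10000^(i / d_model), and the log-then-exp variant is the one commonly used in this kind of code.

```python
import math
import torch

d_model = 512  # illustrative embedding size
i = torch.arange(0, d_model, 2)

# Log-then-exp form, as in the tutorial's div_term:
via_log = torch.exp(i * (-math.log(10000.0) / d_model))
# Direct power form it is equivalent to:
direct = 1.0 / torch.pow(torch.tensor(10000.0), i / d_model)

print(torch.allclose(via_log, direct))  # True (up to floating-point error)
```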
Checklist
cc @suraj813