Update transformer_tutorial.py | Resolving issue #1778 #2402

Merged
merged 4 commits into from
Jun 2, 2023
Conversation

@YoushaaMurhij (Contributor) commented Jun 1, 2023

Add description for positional encoding calculation for Transformers

Fixes #1778

Description

Add explanation for calculating positional encoding for Transformers by taking the log first and then applying the exponential function.

Checklist

  • The issue being fixed is referenced in the description (see above: "Fixes [Help Wanted] Why take the log function and then apply exp? #1778")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unrelated issues are included in this pull request.

cc @suraj813

 Add description for positional encoding calculation for Transformers
@YoushaaMurhij YoushaaMurhij changed the title Update transformer_tutorial.py Resolving issue #1778 Update transformer_tutorial.py | Resolving issue #1778 Jun 1, 2023
@netlify netlify bot commented Jun 1, 2023

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: ed8c29d
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/647a246d3aa7870008689752
😎 Deploy Preview: https://deploy-preview-2402--pytorch-tutorials-preview.netlify.app

@github-actions github-actions bot added question intro docathon-h1-2023 A label for the docathon in H1 2023 easy and removed cla signed labels Jun 1, 2023
```python
# input length (in this case, ``10000``). Dividing this term by ``d_model`` scales
# the values to be within a reasonable range for the exponential function.
# The negative sign in front of the logarithm ensures that the values decrease exponentially.
# The reason for writing ``math.log(10000.0)`` instead of ``4`` in the code is to make it clear
```
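The identity the snippet describes can be checked in plain Python, independent of the tutorial's tensor code: `exp(2i * (-log 10000) / d_model)` is mathematically the same as the paper's `1 / 10000^(2i / d_model)`. The log-then-exp form simply rewrites the power as an exponential (a minimal sketch; variable names here mirror the tutorial's `div_term` but the loop-based code is illustrative only):

```python
import math

d_model = 512

# Tutorial-style: take the log of the base first, then exponentiate.
div_term = [math.exp(i * (-math.log(10000.0) / d_model))
            for i in range(0, d_model, 2)]

# Paper-style ("Attention Is All You Need"): 1 / 10000^(2i / d_model).
direct = [1.0 / (10000.0 ** (i / d_model))
          for i in range(0, d_model, 2)]

# Both formulations produce the same geometric progression of frequencies,
# from 1.0 down to just above 1/10000.
assert all(math.isclose(a, b, rel_tol=1e-9)
           for a, b in zip(div_term, direct))
```

Writing it with `exp` and `log` matches how the quantity appears in the original paper's exponent and keeps the computation as a single elementwise multiply followed by `exp`, which vectorizes cleanly in PyTorch.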
Member commented:
I don't understand this comment. math.log(10000.0) is 9.2, not 4.

Contributor Author commented:
Sorry, I removed the redundant description.

@carljparker carljparker left a comment:

Wow. Nice explanation. Thanks.

@carljparker carljparker merged commit 83cbc8d into pytorch:main Jun 2, 2023
```python
# for positional encoding. The purpose of this calculation is to create
# a range of values that decrease exponentially.
# This allows the model to learn to attend to positions based on their relative distances.
# The ``math.log(10000.0)`` term in the exponent represents the maximum effective
```
Member commented:
I'm not sure this is correct, the maximum input length is max_len, not 10000. Am I missing something?

Contributor Author commented:
The purpose of this value is to make the frequencies of the sine and cosine functions very large. This is important because it helps to ensure that the positional encodings are unique for each position in the sequence. Right?
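For context on what the full encoding looks like, here is a minimal pure-Python sketch of the standard sinusoidal scheme from "Attention Is All You Need" (the `positional_encoding` helper is hypothetical, written for illustration, and is not the tutorial's vectorized PyTorch code):

```python
import math

def positional_encoding(max_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Same log-then-exp rewrite as in the tutorial's div_term.
            angle = pos * math.exp(-math.log(10000.0) * i / d_model)
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(max_len=50, d_model=8)
# Each dimension pair oscillates at a different frequency, so every
# position gets a distinct pattern across the d_model dimensions.
```

Because the wavelengths form a geometric progression from 2π up to 10000·2π, nearby positions get similar encodings while distant ones diverge, which is what lets the model reason about relative distance.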

Contributor Author commented:
I think I need to update this, too.

Contributor commented:
Also, can you please make the description shorter and use simpler language? Thank you!

@svekars (Contributor) commented Jun 2, 2023

@YoushaaMurhij please submit a PR updating the description as suggested. Tag @NicolasHug and @kit1980 to review your update.

svekars pushed a commit that referenced this pull request Jun 2, 2023
@svekars svekars removed question intro docathon-h1-2023 A label for the docathon in H1 2023 easy labels Jun 2, 2023

Successfully merging this pull request may close these issues.

[Help Wanted] Why take the log function and then apply exp?
5 participants