🐛 Bug
Gradient clipping by norm is applied before AMP's unscale, so the clip is computed on the still-scaled gradients and the final, unscaled gradients end up far smaller than the clip value.
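For reference, native AMP expects the opposite order: unscale first, then clip, then step. A minimal plain-PyTorch sketch of the correct sequence (the linear model, data, and hyperparameters are placeholders):

```python
import torch

model = torch.nn.Linear(32, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(5):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(8, 32, device="cuda")).sum()
    scaler.scale(loss).backward()

    # 1. Unscale first, so the gradients are back in their true range.
    scaler.unscale_(optimizer)
    # 2. Only then clip: the norm is computed on the real gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    # 3. scaler.step() notices unscale_ was already called and skips re-unscaling.
    scaler.step(optimizer)
    scaler.update()
```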
To Reproduce
This happens with mixed precision (`precision=16`) and gradient clipping enabled (`gradient_clip_val=1.0`):
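A minimal sketch of the kind of script that triggers it (the `BoringModel`, tensor shapes, and `max_steps` are illustrative, not taken from the Colab; needs a GPU for native AMP):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return self.layer(x).sum()

    def training_step_end(self, loss):
        # Print the total norm of whatever gradients are currently attached
        # (presumably those left from the previous optimizer step). With
        # precision=16 this is far below the clip value; with precision=32
        # it stays close to 1.0.
        norms = [p.grad.norm() for p in self.parameters() if p.grad is not None]
        if norms:
            print("grad norm:", torch.stack(norms).norm().item())
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

train_data = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
trainer = pl.Trainer(gpus=1, precision=16, gradient_clip_val=1.0, max_steps=5)
trainer.fit(BoringModel(), train_data)
```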
The gradient norm at `training_step_end` is super small! My interpretation is that the gradient clipping happens before AMP's unscale, so the unscale makes the gradient much smaller than the clip norm.

Colab reproduction: https://colab.research.google.com/drive/1CXMo2JP_JmwG_YNrTwTG5S0pAsr41pGC?usp=sharing
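The observed magnitude is consistent with that interpretation: clipping the *scaled* gradients to 1.0 and unscaling afterwards should leave a norm of about 1/65536 ≈ 1.5e-5 under `GradScaler`'s default initial scale. A toy calculation (the scale and tensor size are arbitrary stand-ins):

```python
import torch

scale = 65536.0                    # GradScaler's default init_scale (2**16)
grad = torch.randn(1000) * scale   # a gradient as AMP's scaled backward produces it
grad *= 1.0 / grad.norm()          # clip-to-norm-1.0 applied to the scaled gradient
grad /= scale                      # unscale afterwards -- the buggy order
print(grad.norm().item())          # ~1.5e-05 instead of ~1.0
```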
Expected behavior
Gradient norms should be close to 1 (the clip value). This is verifiable by setting `precision=32` and rerunning.

Environment