
Gradient clip norm is called before AMP's unscale leading to wrong gradients #9330


Closed
phizaz opened this issue Sep 5, 2021 · 3 comments · Fixed by #9606
Labels: bug (Something isn't working) · help wanted (Open to be worked on) · priority: 0 (High priority task)

Comments


phizaz commented Sep 5, 2021

🐛 Bug

Gradient clip norm is called before AMP's unscale leading to wrong gradients.

To Reproduce

This happens with:

trainer = pl.Trainer(
    gpus=1,
    precision=16,
    gradient_clip_val=1,
)

The gradient norm observed at training_step_end is far smaller than the clip value. My interpretation is that gradient clipping runs before AMP's unscale, so the subsequent unscale shrinks the gradients well below the clip norm.

https://colab.research.google.com/drive/1CXMo2JP_JmwG_YNrTwTG5S0pAsr41pGC?usp=sharing
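
For reference, native PyTorch AMP expects gradients to be unscaled before clipping; a minimal sketch of that ordering (the tiny model, optimizer, and batch below are illustrative, not taken from the Colab):

import torch
from torch import nn

# Sketch of the ordering native PyTorch AMP documents: unscale the gradients
# first so clip_grad_norm_ sees the true gradients rather than the loss-scaled ones.
model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 10, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).sum()

scaler.scale(loss).backward()
scaler.unscale_(optimizer)                               # 1. unscale first
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # 2. then clip
scaler.step(optimizer)
scaler.update()

If clipping instead runs on the still-scaled gradients, the later unscale divides them by the loss scale and the resulting norm ends up far below gradient_clip_val, which matches the observation above.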

Expected behavior

Gradient norms should be close to 1 (the clip value). This can be verified by setting precision=32 and rerunning.
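
One way to check this (a sketch; total_grad_norm is a hypothetical helper, not part of Lightning or the Colab) is to compute the total L2 norm over the parameter gradients and compare it against gradient_clip_val:

import torch

def total_grad_norm(module: torch.nn.Module) -> torch.Tensor:
    # L2 norm over all parameter gradients of `module`.
    grads = [p.grad.detach().flatten()
             for p in module.parameters() if p.grad is not None]
    return torch.cat(grads).norm(2) if grads else torch.tensor(0.0)

# Called from a LightningModule hook, e.g.:
#     def on_after_backward(self):
#         self.log("total_grad_norm", total_grad_norm(self))
# With precision=32 the logged value stays near the clip value; with
# precision=16 on the affected versions it was reported to come out far smaller.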

Environment

* CUDA:
	- GPU:
		- Tesla K80
	- available:         True
	- version:           10.2
* Packages:
	- numpy:             1.19.5
	- pyTorch_debug:     False
	- pyTorch_version:   1.9.0+cu102
	- pytorch-lightning: 1.4.5
	- tqdm:              4.62.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- 
	- processor:         x86_64
	- python:            3.7.11
	- version:           #1 SMP Sat Jun 5 09:50:34 PDT 2021
cowwoc (Contributor) commented Oct 3, 2021

I believe this is fixed in release 1.4.9: https://github.com/PyTorchLightning/pytorch-lightning/releases/tag/1.4.9

Can this issue be closed?

carmocca (Contributor) commented Oct 6, 2021

Keeping it open, as the fix is not in master yet.


bergen commented Oct 12, 2021

Has this issue been fixed for TPUs? I am seeing a discrepancy between single-core and multi-core TPU training on version 1.4.9.

@tchaton tchaton added this to the v1.5 milestone Oct 25, 2021