sys.getrefcount() of a LightningModule #12622

Closed
MGheini opened this issue Apr 5, 2022 · 1 comment · Fixed by #12897
MGheini commented Apr 5, 2022

🐛 Bug

For a newly created LightningModule, sys.getrefcount() returns 3 (vs. 2 for a conventional torch Module). In my particular case, this results in memory issues: even if I explicitly do del <LightningModule>, the memory is not released, and I believe the extra reference is the root of the issue.

I had originally discussed this problem on Slack. As discussed, I'm creating an issue for it and pinging @carmocca.

To Reproduce

Input:

import sys
import torch
import pytorch_lightning as pl

a = torch.nn.Module()
b = pl.LightningModule()
print(sys.getrefcount(b))
print(sys.getrefcount(a))

Output:

3
2

Expected behavior

Output:

2
2

Environment

  • PyTorch Lightning Version: 1.5.9
  • PyTorch Version: 1.10.2+cu113
  • Python version: 3.8.12
  • OS: Linux
  • CUDA/cuDNN version: 11.6
  • GPU models and configuration: Quadro RTX 8000
  • How you installed PyTorch: pip

Additional context

Per @carmocca's advice on Slack, a workaround is to call b._load_state_dict_pre_hooks.clear() before deleting the LightningModule. This does seem to work, but I'm not sure why.
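A minimal sketch of why clearing the hooks may help, using only the standard library. FakeModule and _pre_hook are hypothetical stand-ins, not the real LightningModule or torch code. The assumption is that the hook dictionary holds a functools.partial bound to the module itself, so the module is kept alive by its own attribute until the dictionary is emptied.

```python
import functools
import gc
import weakref


def _pre_hook(module, state_dict):
    """Hypothetical hook body; it only needs to exist for the demo."""
    return state_dict


class FakeModule:
    """Hypothetical stand-in for a LightningModule; not the real class."""

    def __init__(self):
        self._load_state_dict_pre_hooks = {}
        # Assumption: the registered pre-hook is a partial bound to `self`,
        # so the dict owned by `self` holds a strong reference back to `self`.
        self._load_state_dict_pre_hooks[0] = functools.partial(_pre_hook, self)


def released_after_del(clear_hooks_first):
    """Return True if the object is freed as soon as `del` runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from freeing it behind our back
    try:
        b = FakeModule()
        probe = weakref.ref(b)  # observes the object without owning it
        if clear_hooks_first:
            b._load_state_dict_pre_hooks.clear()  # the workaround
        del b
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle left over from the demo


print(released_after_del(clear_hooks_first=True))
print(released_after_del(clear_hooks_first=False))
```

On CPython, the first call should report that the object was released immediately, while the second should report that it survived del and was only reclaimed later by the cycle collector.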

Thanks!

cc @carmocca @justusschock @awaelchli @Borda @ananthsub @ninginthecloud @jjenniferdai @rohitgr7

MGheini added the "needs triage" label on Apr 5, 2022
carmocca added the "3rd party", "bug", and "lightningmodule" labels and removed the "needs triage" label on Apr 5, 2022
carmocca self-assigned this on Apr 5, 2022

carmocca commented Apr 6, 2022

This is caused by this call in the __init__: https://github.com/PyTorchLightning/pytorch-lightning/blob/de085f30f05512862d970e1675536b4c70199d27/pytorch_lightning/core/lightning.py#L115

Concretely, it is this line, where we call a sharded internal method: https://github.com/PyTorchLightning/pytorch-lightning/blob/de085f30f05512862d970e1675536b4c70199d27/pytorch_lightning/core/lightning.py#L2026

That method keeps an owning reference to self: https://github.com/pytorch/pytorch/blob/76e9730d0273058297f6adce76e73ac7e3db25f0/torch/nn/modules/module.py#L1406
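To illustrate the mechanism without torch, here is a rough standard-library sketch. Plain and WithSelfHook are hypothetical stand-ins for torch.nn.Module and LightningModule, and the _hooks attribute is made up for the demo. It assumes the extra reference comes from a functools.partial that binds the instance and is then stored on that same instance.

```python
import functools
import sys


def _hook(module, state_dict):
    """Hypothetical hook body; never called in this demo."""
    return state_dict


class Plain:
    """Stand-in for a module that registers no hooks."""

    def __init__(self):
        self._hooks = {}


class WithSelfHook(Plain):
    """Stand-in for a module that registers a hook bound to itself."""

    def __init__(self):
        super().__init__()
        # The partial stores a strong reference to `self`, and the dict that
        # holds the partial is owned by `self`: a reference cycle.
        self._hooks[0] = functools.partial(_hook, self)


a = Plain()
b = WithSelfHook()
plain_count = sys.getrefcount(a)
hooked_count = sys.getrefcount(b)
# The absolute numbers depend on the interpreter; the difference is the point.
print(hooked_count - plain_count)
```

The absolute counts include the temporary reference created by the getrefcount() call itself, so the difference of one between the two objects is the part that matches the report.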

There are multiple ways to fix it, but I wonder if the proper fix should be done upstream with weakref.proxy(self) to avoid the owning reference.
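A hedged sketch of what the weakref.proxy(self) idea could look like, again with a hypothetical stand-in class (Module, _hook, and the use_weakref flag are made up for the demo, and this is not the upstream torch implementation). The hook is bound to a proxy instead of the instance, so the stored partial no longer owns the module.

```python
import functools
import gc
import weakref


def _hook(module, state_dict):
    """Hypothetical hook body; `module` may be a proxy to the real object."""
    return state_dict


class Module:
    """Hypothetical stand-in; `use_weakref` only exists for this comparison."""

    def __init__(self, use_weakref):
        self._hooks = {}
        # With a proxy, the partial does not hold an owning reference.
        target = weakref.proxy(self) if use_weakref else self
        self._hooks[0] = functools.partial(_hook, target)


def freed_on_del(use_weakref):
    """Return True if the object is freed as soon as `del` runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cycle collector for the duration of the check
    try:
        m = Module(use_weakref)
        probe = weakref.ref(m)
        del m
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()


print(freed_on_del(use_weakref=True))
print(freed_on_del(use_weakref=False))
```

One trade-off of this approach is that calling the hook after the module has been destroyed raises ReferenceError on any use of the proxy, which seems acceptable here since the hooks live on the module itself.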

As a workaround, we can clear the _load_state_dict_pre_hooks dictionary on __del__.

cc @yifuwang as the author of the sharded integration (#8944) and @pritamdamania87 as the author of the relevant piece of code upstream (pytorch/pytorch#62070)

carmocca added this to the 1.6.x milestone on Apr 6, 2022
carmocca assigned krshrimali and unassigned carmocca on Apr 12, 2022
carmocca moved this to Todo in Frameworks Planning on Apr 12, 2022
carmocca assigned otaj and unassigned krshrimali on Apr 26, 2022
carmocca moved this from Todo to In Review in Frameworks Planning on Apr 27, 2022
carmocca moved this from In Review to Blocked in Frameworks Planning on May 3, 2022
Repository owner moved this from Blocked to Done in Frameworks Planning May 13, 2022