reset train dataloader after OOM #10243
Conversation
Should there be a similar reset here for power scaling:
https://github.com/PyTorchLightning/pytorch-lightning/blob/ad32132c204ac9ee6003586e6ee44e2befa6db79/pytorch_lightning/tuner/batch_size_scaling.py#L161-L162
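For illustration only (this is not Lightning's actual power-scaling code), a small standalone sketch of what this comment is asking about: in power scaling the train dataloader would likewise need to be rebuilt after the batch size changes, on the back-off path after an OOM as well as on the doubling path. The inner `for _batch in dataloader` loop is a hypothetical stand-in for one tuner trial.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def power_scale_batch_size(dataset: TensorDataset, init_size: int = 2, max_trials: int = 10) -> int:
    new_size = init_size
    dataloader = DataLoader(dataset, batch_size=new_size, shuffle=True)
    for _ in range(max_trials):
        try:
            for _batch in dataloader:  # stands in for one tuner trial
                pass
        except RuntimeError as exc:
            if "out of memory" not in str(exc):
                raise
            torch.cuda.empty_cache()
            new_size = max(1, new_size // 2)  # failed: back off to the previous size
            # The question raised above: this rebuild is also needed on the OOM path,
            # otherwise whatever consumes the dataloader next still sees the old size.
            dataloader = DataLoader(dataset, batch_size=new_size, shuffle=True)
            break
        new_size *= 2  # succeeded: double and try again
        dataloader = DataLoader(dataset, batch_size=new_size, shuffle=True)
    return new_size
```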
Is this coming in the next release?
@mogwai This PR still needs to address the review comments, add a test, etc. You can work on a new PR to finish it if you are interested in contributing!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
What does this PR do?
Fixes #9625
The binsearch mode of batch size scaling wasn't resetting the dataloader batch size after an OOM error. Calling the reset function for the train dataloader fixes that.
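For context, a minimal self-contained sketch of the failure mode (not the actual tuner code): a binary search over the batch size only works if the train dataloader is rebuilt after every adjustment, including the one made when the OOM is caught. `build_train_dataloader` and `run_one_trial` are hypothetical stand-ins for the trainer's train-dataloader reset and a single tuner trial.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_train_dataloader(dataset: TensorDataset, batch_size: int) -> DataLoader:
    # Stand-in for the trainer's train-dataloader reset: the only way a changed
    # batch size takes effect is by constructing a fresh DataLoader with it.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)


def run_one_trial(dataloader: DataLoader) -> None:
    # Stand-in for one tuner trial; imagine this raising
    # RuntimeError("CUDA out of memory") once the batch size gets too large.
    for _batch in dataloader:
        pass


def binsearch_batch_size(dataset: TensorDataset, init_size: int = 2, max_trials: int = 10) -> int:
    low, high = 1, None
    new_size = init_size
    dataloader = build_train_dataloader(dataset, new_size)
    for _ in range(max_trials):
        try:
            run_one_trial(dataloader)
            low = new_size
            # Success: double, or take the midpoint once an upper bound is known.
            new_size = (low + high) // 2 if high is not None else new_size * 2
        except RuntimeError as exc:
            if "out of memory" not in str(exc):
                raise
            torch.cuda.empty_cache()
            # Failure: remember the bound and shrink towards the last good size.
            high = new_size
            new_size = (low + high) // 2
        # This rebuild is the essence of the fix: without it the dataloader keeps
        # the batch size it was created with, so the adjusted value is never used.
        dataloader = build_train_dataloader(dataset, new_size)
        if high is not None and high - low <= 1:
            break
    return low


if __name__ == "__main__":
    # Nothing here actually OOMs; the point is only the control flow around the rebuild.
    print(binsearch_batch_size(TensorDataset(torch.randn(256, 8))))
```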
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃