reset train dataloader after OOM #10243
Conversation
Should there be a similar reset here for power scaling:
https://github.com/PyTorchLightning/pytorch-lightning/blob/ad32132c204ac9ee6003586e6ee44e2befa6db79/pytorch_lightning/tuner/batch_size_scaling.py#L161-L162
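For illustration only (this is not Lightning's actual power-scaling code), a small standalone sketch of what this comment is asking about: in power scaling the train dataloader would likewise need to be rebuilt after the batch size changes, on the back-off path after an OOM as well as on the doubling path. The inner `for _batch in dataloader` loop is a hypothetical stand-in for one tuner trial.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def power_scale_batch_size(dataset: TensorDataset, init_size: int = 2, max_trials: int = 10) -> int:
    new_size = init_size
    dataloader = DataLoader(dataset, batch_size=new_size, shuffle=True)
    for _ in range(max_trials):
        try:
            for _batch in dataloader:  # stands in for one tuner trial
                pass
        except RuntimeError as exc:
            if "out of memory" not in str(exc):
                raise
            torch.cuda.empty_cache()
            new_size = max(1, new_size // 2)  # failed: back off to the previous size
            # The question raised above: this rebuild is also needed on the OOM path,
            # otherwise whatever consumes the dataloader next still sees the old size.
            dataloader = DataLoader(dataset, batch_size=new_size, shuffle=True)
            break
        new_size *= 2  # succeeded: double and try again
        dataloader = DataLoader(dataset, batch_size=new_size, shuffle=True)
    return new_size
```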
Is this coming in the next release?
@mogwai This PR still needs to address the review comments, add a test, etc. You can work on a new PR to finish it if you are interested in contributing!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
What does this PR do?
Fixes #9625
The binsearch mode of batch size scaling wasn't resetting the dataloader batch size after an OOM error. Calling the reset function for the train dataloader fixes that.
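For context, a minimal self-contained sketch of the failure mode (not the actual tuner code): a binary search over the batch size only works if the train dataloader is rebuilt after every adjustment, including the one made when the OOM is caught. `build_train_dataloader` and `run_one_trial` are hypothetical stand-ins for the trainer's train-dataloader reset and a single tuner trial.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_train_dataloader(dataset: TensorDataset, batch_size: int) -> DataLoader:
    # Stand-in for the trainer's train-dataloader reset: the only way a changed
    # batch size takes effect is by constructing a fresh DataLoader with it.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)


def run_one_trial(dataloader: DataLoader) -> None:
    # Stand-in for one tuner trial; imagine this raising
    # RuntimeError("CUDA out of memory") once the batch size gets too large.
    for _batch in dataloader:
        pass


def binsearch_batch_size(dataset: TensorDataset, init_size: int = 2, max_trials: int = 10) -> int:
    low, high = 1, None
    new_size = init_size
    dataloader = build_train_dataloader(dataset, new_size)
    for _ in range(max_trials):
        try:
            run_one_trial(dataloader)
            low = new_size
            # Success: double, or take the midpoint once an upper bound is known.
            new_size = (low + high) // 2 if high is not None else new_size * 2
        except RuntimeError as exc:
            if "out of memory" not in str(exc):
                raise
            torch.cuda.empty_cache()
            # Failure: remember the bound and shrink towards the last good size.
            high = new_size
            new_size = (low + high) // 2
        # This rebuild is the essence of the fix: without it the dataloader keeps
        # the batch size it was created with, so the adjusted value is never used.
        dataloader = build_train_dataloader(dataset, new_size)
        if high is not None and high - low <= 1:
            break
    return low


if __name__ == "__main__":
    # Nothing here actually OOMs; the point is only the control flow around the rebuild.
    print(binsearch_batch_size(TensorDataset(torch.randn(256, 8))))
```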
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃