-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Fix CLI race condition saving the config #11199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11199 +/- ##
========================================
- Coverage 88% 88% -0%
========================================
Files 177 177
Lines 16764 16557 -207
========================================
- Hits 14782 14587 -195
+ Misses 1982 1970 -12 |
for more information, see https://pre-commit.ci
Co-authored-by: Roger Shieh <[email protected]>
pytorch_lightning/utilities/cli.py
Outdated
|
||
if not self.overwrite: | ||
# check if the file exists on rank 0 | ||
file_exists = fs.isfile(config_path) if is_global_zero else False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carmocca in ModelCheckpoint we have the same code except that we use fs.exists
. Should we make it consistent?
This special way of checking file existence with broadcast could even become a function, as we may reuse it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but for a follow-up.
Do you think it should be part of checkpoint io?
What does this PR do?
Fixes #11158
Test plan
No test as it's caused by a race condition. Testing it would require adding delays which are slow and flaky.
Does your PR introduce any breaking changes? If yes, please list them.
None
Before submitting
PR review
cc @Borda @carmocca @mauvilsa