-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Optimize non-empty directory warning check in model checkpoint callback #9615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewers:
This PR is an optimization when using remote filesystems with lightning. this warning is currently called on every rank, which creates duplicate, bursty outbound network calls. in reality, we only needed to check this on global rank zero (assuming we have a common dirpath across nodes).
when training with larger configurations (e.g. 16 nodes x 8 gpus), we ran into quota limitations for our jobs. this will benefit anyone saving checkpoints using remote filesystems such as S3/GCS/azure storage
for more information, see https://pre-commit.ci
__resolve_ckpt_dir and global_zero only
ananthsub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Co-authored-by: ananthsub <[email protected]>
Co-authored-by: ananthsub <[email protected]>
for more information, see https://pre-commit.ci
__resolve_ckpt_dir and global_zero only
Codecov Report
@@ Coverage Diff @@
## master #9615 +/- ##
=======================================
- Coverage 93% 89% -4%
=======================================
Files 180 179 -1
Lines 15116 15308 +192
=======================================
- Hits 14043 13587 -456
- Misses 1073 1721 +648 |
What does this PR do?
Along the lines of #9074, consolidate more fs calls from checkpoint callback. For non-empty directory warning:
__resolve_ckpt_dirDoes your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃