Remove `is_global_zero` check in `training_epoch_loop` #12134
Conversation
justusschock
left a comment
This might break third-party loggers whose integration is not part of the PL codebase, who did not decorate the `save` function but still implement it. Also: is it documented anywhere that this is required?
tchaton
left a comment
LGTM! But I still see value in @justusschock's comment. This would have to be explicit in the documentation.
awaelchli
left a comment
OK with me as an intermediate step for #11676 (it would address @justusschock's comment and shift the responsibility to the `log` method directly).
@justusschock I agree, this is a breaking change. @daniellepintz could you update the changelog to document the behavior change here? The motivation is the same as #8589. In this case, users may want to have a logger per process to publish metrics. This can be very useful in distributed training for detecting outliers. Currently, the loop is hardcoded to only flush the metrics on rank 0. In v1.7 this behavior will change.

I updated the changelog!
What does this PR do?
This check in `training_epoch_loop` is unnecessary, so we can remove it. This is because `save` on the `LightningLoggerBase` API is not implemented:
https://github.com/PyTorchLightning/pytorch-lightning/blob/b29b07e9788311326bca4779d70e89eb36bfc57f/pytorch_lightning/loggers/base.py#L173-L174
And `save` is only overridden for the CSV and TensorBoard loggers, and both implementations are decorated with `@rank_zero_only`:
https://github.com/PyTorchLightning/pytorch-lightning/blob/b29b07e9788311326bca4779d70e89eb36bfc57f/pytorch_lightning/loggers/csv_logs.py#L204-L207
https://github.com/PyTorchLightning/pytorch-lightning/blob/b29b07e9788311326bca4779d70e89eb36bfc57f/pytorch_lightning/loggers/tensorboard.py#L254-L264
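To illustrate why the caller-side rank check is redundant: if `save` is a no-op in the base class and rank-guarded in every subclass that overrides it, the loop can call it unconditionally. The sketch below is not PyTorch Lightning's actual implementation; the `RANK` environment variable lookup and the `DemoLogger` class are illustrative assumptions standing in for the real decorator and loggers.

```python
import os
from functools import wraps

def rank_zero_only(fn):
    """Hypothetical sketch of a rank-zero guard: the wrapped function
    only runs on rank 0 and is a no-op everywhere else."""
    @wraps(fn)
    def wrapped(*args, **kwargs):
        # Assumed rank lookup for this demo; real code would query
        # the distributed environment.
        if int(os.environ.get("RANK", 0)) == 0:
            return fn(*args, **kwargs)
        return None  # no-op on non-zero ranks
    return wrapped

class DemoLogger:
    """Illustrative logger whose save() is rank-guarded internally."""
    def __init__(self):
        self.saved = 0

    @rank_zero_only
    def save(self):
        self.saved += 1

# The training loop can now call save() unconditionally, with no
# `if trainer.is_global_zero:` check of its own:
logger = DemoLogger()
os.environ["RANK"] = "1"   # simulate a non-zero rank
logger.save()              # no-op
os.environ["RANK"] = "0"   # simulate rank zero
logger.save()              # actually runs
print(logger.saved)        # → 1
```

Because the guard lives inside `save` itself, removing the check in the loop only changes behavior for loggers that implement `save` without the decorator, which is exactly the breaking case discussed above.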
Fixes #12127
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃