Fix for multiple callbacks #6197
Codecov Report
@@           Coverage Diff           @@
##           master   #6197   +/-   ##
=======================================
- Coverage      93%     93%    -0%
=======================================
  Files         159     159
  Lines       11378   11375     -3
=======================================
- Hits        10623   10591    -32
- Misses        755     784    +29
tchaton
left a comment
Great fix!
def on_train_end(self) -> None:
    assert self.trainer.current_epoch == self.expected_end_epoch, 'Early Stopping Failed'
I would drop this and instead check in the test that the trainer's epoch is as expected, so there is no random inference.
That's what I originally did, but because of how DDP Spawn works, the local trainer's current epoch doesn't seem to be kept in sync, which is fair (since it's only kept in sync during training on the processes). This is why I had to move the check to on_train_end, because that hook runs within the spawn process!
So can we have both?
We could, but I'd need to separate out the tests. I don't think it's really worth it because it would be a lot of duplication.
* Fix for multiple callbacks
* Add CHANGELOG.md
* Remove old params
* Skip tests on windows using ddp
* Change name of the variable to not clash with should stop, which is separate
* Apply suggestions from code review
* Fix params

Co-authored-by: Jirka Borovec <[email protected]>
What does this PR do?
Fixes #6194
We recently modified the behaviour of the early stopping callback in the accelerator refactor, which led to the bug mentioned above. The cause was that the callback defaulted the stop value to False, even when other callbacks could already have updated this value to True.
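As a rough illustration of the failure mode described above, here is a minimal, self-contained sketch. It does not use the Lightning API: `ToyTrainer`, `StopNow`, `OverwritingStopper`, `CombiningStopper`, and `run_epoch` are hypothetical names invented for this example, and the actual change in the PR may differ in detail. The sketch only shows why overwriting a shared stop flag with a local default of False can discard a stop request from another callback, and how combining the values avoids that.

```python
from dataclasses import dataclass


@dataclass
class ToyTrainer:
    """Hypothetical stand-in for a trainer; NOT the Lightning Trainer."""
    should_stop: bool = False


class StopNow:
    """Hypothetical callback that always requests a stop."""

    def on_epoch_end(self, trainer: ToyTrainer) -> None:
        trainer.should_stop = True


class OverwritingStopper:
    """Buggy pattern: start from False and overwrite the shared flag."""

    def __init__(self, wants_stop: bool) -> None:
        self.wants_stop = wants_stop

    def on_epoch_end(self, trainer: ToyTrainer) -> None:
        should_stop = False  # local default
        if self.wants_stop:
            should_stop = True
        # Overwrites whatever an earlier callback decided.
        trainer.should_stop = should_stop


class CombiningStopper(OverwritingStopper):
    """Fixed pattern: keep an earlier True instead of resetting it."""

    def on_epoch_end(self, trainer: ToyTrainer) -> None:
        trainer.should_stop = trainer.should_stop or self.wants_stop


def run_epoch(callbacks) -> bool:
    """Run each callback once, in order, and report the final flag."""
    trainer = ToyTrainer()
    for callback in callbacks:
        callback.on_epoch_end(trainer)
    return trainer.should_stop


# An earlier callback asks to stop; the later one has no reason to stop.
buggy = run_epoch([StopNow(), OverwritingStopper(wants_stop=False)])
fixed = run_epoch([StopNow(), CombiningStopper(wants_stop=False)])
print(buggy, fixed)
```

In the buggy ordering the stop request from the first callback is lost, while the combining variant preserves it regardless of callback order.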
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃