[WIP] Add Loop.stop()
#8604
Codecov Report
@@           Coverage Diff            @@
##           master   #8604     +/-   ##
========================================
- Coverage      92%     44%    -49%
========================================
  Files         218     169     -49
  Lines       14407   13965    -442
========================================
- Hits        13305    6092   -7213
- Misses       1102    7873   +6771
should_stop, reason = self._evalute_stopping_criteria(current)

# stop every ddp process if any world process decides to stop
should_stop = trainer.training_type_plugin.reduce_boolean_decision(should_stop)
Should we move the accelerator reduction into should_stop on the loops?
Possibly. It would clean up the callbacks using it.
# loops/base.py
def stop(self, should_stop: bool = True) -> bool:
    # keep the reduced decision so every process stores the same value
    should_stop = self.trainer.training_type_plugin.reduce_boolean_decision(should_stop)
    self.should_stop = should_stop
    ...
    return should_stop
Thoughts? @justusschock @awaelchli
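A minimal, self-contained sketch of the proposal above, not the PR's actual implementation. `FakePlugin` and the `SimpleNamespace` trainer are hypothetical stand-ins: the fake reduction just returns the local decision, as a single process would, while a distributed plugin would synchronize the value across processes at that point.

```python
from types import SimpleNamespace


class FakePlugin:
    """Hypothetical stand-in for the training type plugin (single process)."""

    def reduce_boolean_decision(self, decision: bool) -> bool:
        # A distributed plugin would synchronize the decision across
        # processes here; with one process there is nothing to reduce.
        return decision


class Loop:
    """Hypothetical minimal loop that owns its own stop flag."""

    def __init__(self, trainer) -> None:
        self.trainer = trainer
        self.should_stop = False

    def stop(self, should_stop: bool = True) -> bool:
        # Reduce first so that every process stores the same decision.
        should_stop = self.trainer.training_type_plugin.reduce_boolean_decision(should_stop)
        self.should_stop = should_stop
        return should_stop


# Stand-in trainer exposing only the attribute the loop needs.
trainer = SimpleNamespace(training_type_plugin=FakePlugin())
loop = Loop(trainer)
```

With this shape, a callback would only call `loop.stop(...)` and would no longer need to perform the reduction itself.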
@carmocca really like the design :)
if is_last_batch and is_infinite_dataset:
    return True

if self.trainer.should_stop:
as discussed, we might actually want to get rid of this completely.
it's not well justified why we would want to run validation at the point of stopping.
Sounds good to me!
9548f73 to 1f0c08a
should_stop = trainer.training_type_plugin.reduce_boolean_decision(should_stop)
trainer.should_stop = trainer.should_stop or should_stop
if should_stop:
    trainer._active_loop.stop()
this starts to leak the loop internals to the callback. i don't think people should reach into the trainer's internals like this.
returning a signal from the callback hook that the trainer interprets could be an alternative that avoids this
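One possible shape for the signal-based alternative, as a rough sketch only. The names `Signal`, `Callback.on_batch_end`, `StopAfter`, and `run_loop` are all hypothetical and are not part of the library's API; the point is that the callback returns a value and the loop decides what to do with it.

```python
from enum import Enum, auto
from typing import List, Optional


class Signal(Enum):
    """Hypothetical signal that a callback hook may return."""

    STOP = auto()


class Callback:
    def on_batch_end(self, step: int) -> Optional[Signal]:
        # By default a callback has nothing to say.
        return None


class StopAfter(Callback):
    """Hypothetical callback that asks to stop after `max_steps` steps."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps

    def on_batch_end(self, step: int) -> Optional[Signal]:
        return Signal.STOP if step + 1 >= self.max_steps else None


def run_loop(callbacks: List[Callback], num_batches: int) -> int:
    """Run up to `num_batches` steps and return how many were executed."""
    steps_run = 0
    for step in range(num_batches):
        steps_run += 1
        # The loop, not the callback, interprets the returned signals.
        signals = [cb.on_batch_end(step) for cb in callbacks]
        if Signal.STOP in signals:
            break
    return steps_run
```

Here the callback never touches the trainer or its loops, which is the property the comment above is asking for.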
would making _active_loop public or adding def stop() to properties as a utility work for you?
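For illustration, the utility option could look roughly like the sketch below. These `Trainer` and `Loop` classes are simplified, hypothetical stand-ins rather than the real ones; the only idea shown is a public `stop()` that delegates to whichever loop is currently active.

```python
from typing import Optional


class Loop:
    """Hypothetical minimal loop with its own stop flag."""

    def __init__(self) -> None:
        self.should_stop = False

    def stop(self) -> None:
        self.should_stop = True


class Trainer:
    """Hypothetical minimal trainer that tracks the active loop."""

    def __init__(self) -> None:
        self._active_loop: Optional[Loop] = None

    def stop(self) -> None:
        # Public utility so callbacks never reach into `_active_loop`.
        if self._active_loop is not None:
            self._active_loop.stop()
```

A callback would then call `trainer.stop()` and stay unaware of which loop is running.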
Note: this feature would have avoided the need to add a
I'm not continuing this work for now as the future of the loops is unclear.
What does this PR do?
Add Loop.stop() to stop a loop and replace trainer.should_stop.
Necessary for #8578
Does your PR introduce any breaking changes? If yes, please list them.
TODO
Before submitting
PR review