Skip to content

Conversation

@carmocca
Copy link
Contributor

@carmocca carmocca commented Jul 28, 2021

What does this PR do?

Add Loop.stop() to stop a loop and replace trainer.should_stop

Necessary for #8578

Does your PR introduce any breaking changes? If yes, please list them.

TODO

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@carmocca carmocca added feature Is an improvement or enhancement refactor labels Jul 28, 2021
@carmocca carmocca added this to the v1.5 milestone Jul 28, 2021
@carmocca carmocca self-assigned this Jul 28, 2021
@codecov
Copy link

codecov bot commented Jul 28, 2021

Codecov Report

Merging #8604 (9548f73) into master (aadd2a9) will decrease coverage by 49%.
The diff coverage is 21%.

@@           Coverage Diff            @@
##           master   #8604     +/-   ##
========================================
- Coverage      92%     44%    -49%     
========================================
  Files         218     169     -49     
  Lines       14407   13965    -442     
========================================
- Hits        13305    6092   -7213     
- Misses       1102    7873   +6771     

should_stop, reason = self._evalute_stopping_criteria(current)

# stop every ddp process if any world process decides to stop
should_stop = trainer.training_type_plugin.reduce_boolean_decision(should_stop)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move accelerator reduction within should_stop from the loops ?

Copy link
Contributor Author

@carmocca carmocca Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, It would clean the callbacks using it.

# loops/base.py
def stop(should_stop: bool = True) -> bool:
    self.trainer.training_type_plugin.reduce_boolean_decision(should_stop)
    self.should_stop = should_stop
    ...
    return should_stop

Thoughts? @justusschock @awaelchli

@tchaton
Copy link
Contributor

tchaton commented Jul 29, 2021

@carmocca really like the design :)

if is_last_batch and is_infinite_dataset:
return True

if self.trainer.should_stop:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, we might actually want to get rid of this completely.
it's not well justified why we would want to run validation at the point of stopping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me !

should_stop = trainer.training_type_plugin.reduce_boolean_decision(should_stop)
trainer.should_stop = trainer.should_stop or should_stop
if should_stop:
trainer._active_loop.stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this starts to leak the loop internals to the callback. i don't think people should reach into the trainer's internals like this.

returning a signal from the callback hook that the trainer interprets could be an alternative that avoids this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would making _active_loop public or adding def stop() to properties as a utility work for you?

@awaelchli awaelchli modified the milestones: v1.5, v1.6 Nov 1, 2021
@carmocca carmocca mentioned this pull request Feb 14, 2022
8 tasks
@carmocca carmocca removed this from the 1.6 milestone Mar 28, 2022
@carmocca
Copy link
Contributor Author

carmocca commented Sep 28, 2022

Note: this feature would have avoided the need to add a TunerExitException in #11089 to stop the loops. Additionally being public loops API.

I'm not continuing this work for now as the future of the loops is unclear.

@carmocca carmocca closed this Dec 14, 2022
@carmocca carmocca deleted the feat/loop-stop branch December 14, 2022 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Is an improvement or enhancement refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants