Description
🐛 Bug
When EarlyStopping triggers but `min_epochs` is not yet reached, it seems that only the first batch of each subsequent epoch is used.
TL;DR: With 100 samples and a batch size of 25 (= 4 batches), `training_step()` should always be called exactly 4 times per epoch. Once EarlyStopping has triggered before `min_epochs` is reached, it is only called once per epoch.
Please reproduce using the BoringModel
To Reproduce
I used the original BoringModel template and changed:
- `num_samples = 100`
- `batch_size = 25` in train, val and test
- Added `print('train, batch=', batch_idx)` to `self.training_step()`
- Added EarlyStopping and `min_epochs` when initializing the trainer
Resulting in:
https://colab.research.google.com/drive/11tlIU9NusGPeXJLUKA52ECuhIXOCH_4k?usp=sharing
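For reference, a minimal sketch of these changes, assuming the standard BoringModel structure. It is not the exact Colab code; in particular the logged metric name `valid_loss` and the `min_epochs`/`max_epochs` values are assumptions:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import EarlyStopping


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.data = torch.randn(length, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        print('train, batch=', batch_idx)  # expected to print 0..3 in every epoch
        loss = self.layer(batch).sum()
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        self.log("valid_loss", loss)  # assumed metric name for EarlyStopping

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


num_samples, batch_size = 100, 25  # => 4 training batches per epoch
train = DataLoader(RandomDataset(32, num_samples), batch_size=batch_size)
val = DataLoader(RandomDataset(32, num_samples), batch_size=batch_size)

trainer = Trainer(
    min_epochs=10,                                     # assumed value
    max_epochs=20,                                     # assumed value
    callbacks=[EarlyStopping(monitor="valid_loss")],
)
trainer.fit(BoringModel(), train, val)
```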
I added the relevant output during training as an image here:
[image: training-step output]
Expected behavior
In my BoringModel, `self.training_step()` must be called 4 times in each epoch as long as `min_epochs` is not reached. Otherwise, I suspect that the remaining 3 batches are not used to update the model parameters.
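To make the expectation explicit, here is a small sketch of a hypothetical callback (not part of the issue, assuming the 1.2.x callback hook signatures) that counts how many training batches are processed per epoch; with 100 samples and `batch_size=25` the printed count should be 4 for every epoch:

```python
from pytorch_lightning import Callback


class CountTrainingBatches(Callback):
    """Counts how many training batches are processed in each epoch."""

    def on_train_epoch_start(self, trainer, pl_module):
        self.calls = 0

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        self.calls += 1

    def on_epoch_end(self, trainer, pl_module):
        # Expected: 4 for every epoch; observed: 1 once EarlyStopping has triggered.
        print(f"epoch {trainer.current_epoch}: {self.calls} training batches")
```

Passing an instance of this callback via `Trainer(callbacks=[...])` alongside EarlyStopping would then print the per-epoch count.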
Environment
- CUDA:
    - GPU:
        - Tesla T4
    - available: True
    - version: 10.1
- Packages:
    - numpy: 1.19.5
    - pyTorch_debug: False
    - pyTorch_version: 1.8.0+cu101
    - pytorch-lightning: 1.2.5
    - tqdm: 4.41.1
- System:
    - OS: Linux
    - architecture:
        - 64bit
    - processor: x86_64
    - python: 3.7.10
    - version: #1 SMP Thu Jul 23 08:00:38 PDT 2020
Additional context
- I took a look into the source code during my analysis and found that in `pytorch_lightning/trainer/training_loop.py:TrainLoop.run_training_epoch()`, `self.trainer.should_stop` is `True` (and the loop breaks after the first batch) once EarlyStopping has triggered; see the simplified sketch below.
- As a result, it seems that once EarlyStopping triggers before `min_epochs` is reached, training always stops at `min_epochs`, even if the stopping criterion is no longer met by then. (But I did not build an MWE for that.)
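A paraphrased, self-contained simulation of the behaviour described above (this is not the actual Lightning 1.2.5 source, just my understanding of the control flow):

```python
# Once should_stop is True, the batch loop exits after the first batch,
# while min_epochs keeps the outer epoch loop running.
class FakeTrainer:
    should_stop = True   # EarlyStopping already triggered in an earlier epoch
    min_epochs = 10


def run_training_epoch(trainer, batches):
    for batch_idx, batch in enumerate(batches):
        print("train, batch=", batch_idx)  # stands in for training_step()
        if trainer.should_stop:
            break  # <- leaves 3 of the 4 batches unused


trainer = FakeTrainer()
for epoch in range(trainer.min_epochs):
    run_training_epoch(trainer, range(4))  # prints only "train, batch= 0"
```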
- PS: Is it intended that `BoringModel.forward()` is not called in the template, but `self.layer()` is used instead?
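For clarity, a small sketch of the two variants the PS refers to, assuming the template's BoringModel defines `forward()` as a pass-through to `self.layer`:

```python
import torch
from pytorch_lightning import LightningModule


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        out = self.layer(batch)   # as in the template: forward() is bypassed
        # out = self(batch)       # alternative: routes through forward()
        return {"loss": out.sum()}
```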