
Early stopping skips batches when min_epochs not reached #6699

@gunthergl

🐛 Bug

When EarlyStopping triggers but min_epochs has not been reached yet, it seems that only the first batch of each epoch is used.

TL;DR: With 100 samples and a batch size of 25 (= 4 batches), training_step() should always be called exactly 4 times per epoch. When EarlyStopping triggers before min_epochs is reached, it is only called once per epoch.
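The expected count follows from the dataloader arithmetic. A minimal sketch, assuming the default DataLoader behaviour where a partial final batch is kept (drop_last=False); the variable names and the min_epochs value are illustrative assumptions, not taken from the notebook:

```python
import math

num_samples = 100
batch_size = 25
min_epochs = 5  # hypothetical value, for illustration only

# With drop_last=False, a partial final batch would still count as a batch.
batches_per_epoch = math.ceil(num_samples / batch_size)

# Until min_epochs is reached, every epoch should run all of its batches.
expected_calls = [batches_per_epoch] * min_epochs

print(batches_per_epoch)  # 4
print(expected_calls)     # [4, 4, 4, 4, 4]
```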

Please reproduce using the BoringModel

To Reproduce

I used the original BoringModel template and changed:

  • num_samples = 100
  • batch_size=25 in train, val and test
  • Add print('train, batch=', batch_idx) to self.training_step()
  • Add EarlyStopping and min_epochs when initializing the trainer

Resulting in:
https://colab.research.google.com/drive/11tlIU9NusGPeXJLUKA52ECuhIXOCH_4k?usp=sharing

I attached the relevant training output as an image.

Expected behavior

In my BoringModel, self.training_step() should be called 4 times in every epoch until min_epochs is reached. Otherwise, I suspect that the remaining 3 batches are not used to update the model parameters.
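One way to check this expectation is to count the printed batch indices per epoch. A minimal sketch, assuming the captured stdout is available as a list of lines and that batch_idx restarts at 0 in every epoch; parse_batch_indices(), split_into_epochs() and short_epochs() are hypothetical helpers, not part of Lightning:

```python
from typing import List, Sequence, Tuple

PREFIX = "train, batch="  # matches print('train, batch=', batch_idx)


def parse_batch_indices(lines: Sequence[str]) -> List[int]:
    """Extract batch_idx values from printed lines, ignoring other output."""
    return [int(line[len(PREFIX):]) for line in lines if line.startswith(PREFIX)]


def split_into_epochs(batch_indices: Sequence[int]) -> List[List[int]]:
    """Group batch indices into epochs; a new epoch starts whenever batch_idx is 0."""
    epochs: List[List[int]] = []
    for idx in batch_indices:
        if idx == 0 or not epochs:
            epochs.append([])
        epochs[-1].append(idx)
    return epochs


def short_epochs(lines: Sequence[str], expected_batches: int,
                 min_epochs: int) -> List[Tuple[int, int]]:
    """Return (epoch, calls) for each epoch before min_epochs whose call count differs."""
    epochs = split_into_epochs(parse_batch_indices(lines))
    return [(epoch, len(batches))
            for epoch, batches in enumerate(epochs[:min_epochs])
            if len(batches) != expected_batches]
```

Passing the captured output (e.g. output.splitlines()) with expected_batches=4 should return an empty list when every epoch before min_epochs ran all of its batches.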

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.8.0+cu101
    • pytorch-lightning: 1.2.5
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

  • I looked into the source code during my analysis and found that in pytorch_lightning/trainer/training_loop.py:TrainLoop.run_training_epoch(), self.trainer.should_stop is True once EarlyStopping has triggered, so the loop breaks after the first batch.

  • As a result, it seems that once EarlyStopping triggers before min_epochs is reached, training always stops at min_epochs, even if the stopping criterion is not met at that point. (I have not verified this with an MWE.)

  • PS: Is it intended that the template calls self.layer() directly instead of BoringModel.forward()?
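The mechanism described above can be illustrated with a toy model of the loop. This is a simplified sketch, not Lightning's actual TrainLoop code: simulate() and its parameters are hypothetical, should_stop is modelled as a flag that stays True once set, and the outer loop only honours it after min_epochs epochs.

```python
from typing import List


def simulate(num_batches: int, min_epochs: int, max_epochs: int,
             stop_epoch: int, buggy: bool) -> List[int]:
    """Return the number of training_step calls in each epoch (toy model).

    stop_epoch: epoch at whose end early stopping first sets should_stop.
    buggy: if True, the batch loop breaks after the first batch whenever
    should_stop is already True (the behaviour reported in this issue).
    """
    should_stop = False
    calls_per_epoch = []
    for epoch in range(max_epochs):
        calls = 0
        for _batch_idx in range(num_batches):
            calls += 1  # stands in for training_step()
            if buggy and should_stop:
                break  # remaining batches of this epoch are skipped
        calls_per_epoch.append(calls)
        if epoch >= stop_epoch:
            should_stop = True  # the flag is never cleared afterwards
        # Stopping is only honoured once min_epochs epochs have run.
        if should_stop and epoch + 1 >= min_epochs:
            break
    return calls_per_epoch


print(simulate(4, min_epochs=5, max_epochs=10, stop_epoch=1, buggy=True))   # [4, 4, 1, 1, 1]
print(simulate(4, min_epochs=5, max_epochs=10, stop_epoch=1, buggy=False))  # [4, 4, 4, 4, 4]
```

In this toy model the buggy variant runs full epochs until early stopping triggers and then a single batch per epoch until min_epochs is reached. Because the flag is sticky, it also ends exactly at min_epochs, which is consistent with the suspicion in the second point above.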

Labels: bug, help wanted, priority: 1