
Trainer.fit() multiple times with max_steps #11425

@amin-nejad

🐛 Bug

I have come across what I consider to be a bug: when fit is called multiple times, the trainer continues training past max_epochs, but it does not continue past max_steps. For example, if max_epochs is set to 2, each fit call trains for another 2 epochs. With max_steps, only the first fit call does any training.

To Reproduce

Reproduced on Colab using the Boring Model. Call the Trainer.fit method multiple times on the same Trainer and observe that training happens on subsequent calls when max_epochs is specified, but not when max_steps is specified.

Expected behavior

Whatever behaviour is decided to be correct should be consistent, regardless of whether the number of iterations is specified in epochs or in steps. I personally think that multiple fit calls (which actually result in training) should be supported (related: #9636), so the behaviour for max_steps should be changed such that every fit call trains for another max_steps steps.
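The difference between the reported and the proposed max_steps semantics can be illustrated with a toy loop. This is only a sketch under stated assumptions: `ToyTrainer` and its `per_call_budget` flag are hypothetical names invented for this example, and nothing here reflects how Lightning's loops are actually implemented.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not Lightning code."""

    def __init__(self, max_steps, per_call_budget):
        self.max_steps = max_steps
        self.per_call_budget = per_call_budget
        self.global_step = 0  # persists across fit() calls

    def fit(self):
        # Reported behaviour: max_steps is a lifetime limit on global_step.
        # Proposed behaviour: max_steps is a budget for each fit() call.
        start = self.global_step if self.per_call_budget else 0
        target = start + self.max_steps
        while self.global_step < target:
            self.global_step += 1  # stands in for one optimizer step
        return self.global_step


reported = ToyTrainer(max_steps=3, per_call_budget=False)
reported_steps = [reported.fit(), reported.fit()]  # second call is a no-op

proposed = ToyTrainer(max_steps=3, per_call_budget=True)
proposed_steps = [proposed.fit(), proposed.fit()]  # second call trains 3 more
```

With the lifetime limit, the second call finds the counter already at the limit and does nothing. With the per-call budget, each call advances the counter by max_steps, which matches what the issue describes for max_epochs.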

Environment

  • CUDA:
    - GPU:
    - available: False
    - version: None
  • Packages:
    - numpy: 1.22.0
    - pyTorch_debug: False
    - pyTorch_version: 1.10.1
    - pytorch-lightning: 1.5.8
    - tqdm: 4.62.3
  • System:
    - OS: Darwin
    - architecture: 64bit
    - processor: i386
    - python: 3.8.12
    - version: Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:20 PDT 2021; root:xnu-7195.141.6~3/RELEASE_ARM64_T8101

Additional context

Also related to #7629 and #11426

cc @tchaton @rohitgr7 @carmocca @justusschock @ananthsub @ninginthecloud

Labels

bug, loops, priority: 0, priority: 1
