
Training will continue... but it does not #5717

@riklopfer

❓ Questions and Help

Related to #2644

I set min_steps so that the model would keep training through the warm-up period plus the early-stopping patience. Unfortunately, it does not appear to do that.

I see many log messages like this:

Epoch 17:  11%|█▏        | 4/35 [00:02<00:16,  1.86it/s, Trainer was signaled to stop but required minimum epochs (1) or minimum steps (288) has not been met. Training will continue...sion=0.542, train_recall=0.0718, train_f1=0.127]
INFO:lightning:Trainer was signaled to stop but required minimum epochs (1) or minimum steps (288) has not been met. Training will continue...

The CSV logs show the same behavior. Warm-up happens during the first 5 epochs. After that, training advances only one step per epoch (the step column goes 197, 198, 199, ...) instead of the 33 steps per epoch seen earlier.

val_loss,val_accuracy,val_precision,val_recall,val_f1,epoch,step
0.5087231397628784,0.0,0.0,0.0,0.0,0,32
0.36191996932029724,0.0,0.0,0.0,0.0,1,65
0.29924529790878296,0.0,0.0,0.0,0.0,2,98
0.2752218246459961,0.0,0.0,0.0,0.0,3,131
0.26732462644577026,0.0,0.0,0.0,0.0,4,164
0.2639540731906891,0.0,0.0,0.0,0.0,5,197
0.263753205537796,0.0,0.0,0.0,0.0,6,198
0.26352185010910034,0.0,0.0,0.0,0.0,7,199
0.2633569538593292,0.0,0.0,0.0,0.0,8,200
0.26324737071990967,0.0,0.0,0.0,0.0,9,201
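One way to make the step pattern explicit is to difference the logged step counters. This is a small plain-Python sketch that uses only the rows shown above; it prints how many training steps each epoch took according to the step column:

```python
import csv
import io

# Rows copied verbatim from the CSV log above.
CSV_LOG = """val_loss,val_accuracy,val_precision,val_recall,val_f1,epoch,step
0.5087231397628784,0.0,0.0,0.0,0.0,0,32
0.36191996932029724,0.0,0.0,0.0,0.0,1,65
0.29924529790878296,0.0,0.0,0.0,0.0,2,98
0.2752218246459961,0.0,0.0,0.0,0.0,3,131
0.26732462644577026,0.0,0.0,0.0,0.0,4,164
0.2639540731906891,0.0,0.0,0.0,0.0,5,197
0.263753205537796,0.0,0.0,0.0,0.0,6,198
0.26352185010910034,0.0,0.0,0.0,0.0,7,199
0.2633569538593292,0.0,0.0,0.0,0.0,8,200
0.26324737071990967,0.0,0.0,0.0,0.0,9,201
"""

rows = list(csv.DictReader(io.StringIO(CSV_LOG)))
epochs = [int(r["epoch"]) for r in rows]
steps = [int(r["step"]) for r in rows]

# Steps taken in each epoch (from epoch 1 on) = difference of consecutive step counters.
steps_per_epoch = {
    epoch: step - prev
    for epoch, step, prev in zip(epochs[1:], steps[1:], steps[:-1])
}

for epoch, n in steps_per_epoch.items():
    print(f"epoch {epoch}: {n} step(s)")
```

It reports 33 steps for each of epochs 1 to 5 and a single step for each epoch from 6 onward.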

Code

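    from pytorch_lightning.callbacks import EarlyStopping

    # min_steps = warm-up steps + `patience` epochs' worth of steps
    args.min_steps = wu_steps + steps_per_epoch * patience
    early_stop_callback = EarlyStopping(
        monitor='val_f1',  # stop when validation F1 stops improving
        min_delta=0.00,
        patience=5,
        verbose=True,
        mode='max'
    )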
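To see where the stop signal should fire, here is a simplified model of a mode='max' patience counter fed with the val_f1 values from the CSV. It is an approximation for illustration only, not Lightning's actual EarlyStopping code, and first_stop_epoch is a hypothetical helper. Because val_f1 stays at 0.0, the counter runs out at epoch 5 (step 197), which is still below the min_steps of 288 reported in the log. That matches the "Training will continue..." message and the point where the CSV switches to one step per epoch.

```python
import math

# Logged values from the CSV above, indexed by epoch (0-9).
VAL_F1 = [0.0] * 10
LOGGED_STEPS = [32, 65, 98, 131, 164, 197, 198, 199, 200, 201]

PATIENCE = 5     # patience=5 in the EarlyStopping config
MIN_DELTA = 0.0  # min_delta=0.00
MIN_STEPS = 288  # value printed in the "Training will continue..." message


def first_stop_epoch(scores, patience, min_delta):
    """Hypothetical helper: simplified mode='max' patience counter.

    Returns the first epoch at which `patience` consecutive epochs
    without improvement have been seen, or None if that never happens.
    """
    best = -math.inf
    wait = 0
    for epoch, score in enumerate(scores):
        if score - min_delta > best:  # strict improvement required
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


stop_epoch = first_stop_epoch(VAL_F1, PATIENCE, MIN_DELTA)
step_at_stop = LOGGED_STEPS[stop_epoch]
print(f"stop signaled after epoch {stop_epoch} (step {step_at_stop})")
print(f"min_steps reached: {step_at_stop >= MIN_STEPS}")
```

Under this model, training should have continued with full 33-step epochs until min_steps was met, rather than advancing a single step per epoch.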

What's your environment?

  • OS: Linux
  • Packaging: conda + pip
  • Version: pytorch-lightning==1.1.4

Labels: bug (Something isn't working), priority: 0 (High priority task)