
# RuntimeError: Early stopping conditioned on metric `val_loss` which is not available. Pass in or modify your EarlyStopping callback to use any of the following: #11534

@Drow999


### 🐛 Bug

Hello,
Does anybody have any idea why this early-stopping error is raised? I checked the value of `val_loss` and it is not zero. Issues #490 and #492 solved a similar problem, but those were for PyTorch Lightning 0.5; the version I am using is 1.4.5 (I also tried 1.5.8, with the same result).

### To Reproduce

    def validation_step(self, batch, batch_idx):
        print('batch size:', len(batch['pose_body']))
        drec = self(batch['pose_body'].view(-1, 36))

        loss = self._compute_loss(batch, drec)
        print(loss)
        val_loss = loss['unweighted_loss']['loss_total']
        print('val_loss', val_loss)
        #if self.renderer is not None and self.global_rank == 0 and batch_idx % 500==0 and np.random.rand()>0.5:
        #    out_fname = makepath(self.work_dir, 'renders/vald_rec_E{:03d}_It{:04d}_val_loss_{:.2f}.png'.format(self.current_epoch, batch_idx, val_loss.item()), isfile=True)
        #    self.renderer([batch, drec], out_fname = out_fname)
        #    dgen = self.vp_model.sample_poses(self.vp_ps.logging.num_bodies_to_display)
        #    out_fname = makepath(self.work_dir, 'renders/vald_gen_E{:03d}_I{:04d}.png'.format(self.current_epoch, batch_idx), isfile=True)
        #    self.renderer([dgen], out_fname = out_fname)
        progress_bar = {'v2v': val_loss}
        return {'val_loss': c2c(val_loss), 'progress_bar': progress_bar, 'log': progress_bar}

    def validation_epoch_end(self, outputs):
        metrics = {'val_loss': np.nanmean(np.concatenate([v['val_loss'] for v in outputs])) }
        print('metrics:', metrics)
        print('output:' , outputs)
        if self.global_rank == 0:

            self.text_logger('Epoch {}: {}'.format(self.current_epoch, ', '.join('{}:{:.2f}'.format(k, v) for k, v in metrics.items())))
            self.text_logger('lr is {}'.format([pg['lr'] for opt in self.trainer.optimizers for pg in opt.param_groups]))
        metrics = {k: torch.as_tensor(v) for k, v in metrics.items()}
        progress_bar = {'val_loss': metrics['val_loss']}
        return {'val_loss': metrics['val_loss'], 'progress_bar': progress_bar, 'log': metrics}
  early_stopping:
    monitor: val_loss
    min_delta: 0.0
    patience: 100
    verbose: True
    mode: min


### Environment

- PyTorch Lightning Version (e.g., 1.4.5):
- PyTorch Version (e.g., 1.7.1):
- Python version (e.g., 3.7):
- OS (e.g., Linux):
- CUDA/cuDNN version:10.1
- How you installed PyTorch (`conda`, `pip`, source):
