When returning a value from the "training_step" function, one must specify "loss", otherwise an error results #7750

@Dehde

Description

🐛 Bug

I want the training step to also return the probabilities predicted in that step, so that I can calculate the F1 score for the entire epoch. To do so, I let the training step return the predictions and the ground-truth values, following the instructions here: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#train-epoch-level-operations.

The code sample there includes "loss" in the returned dictionary, yet nowhere did I read that you have to include it. But you do; otherwise an exception will be thrown:

    Traceback (most recent call last):
      File "/Users/rob/PycharmProjects/bauteilerkennung/DeepLearningPart/bte/models/image.py", line 163, in <module>
        model.fit(train_ds, valid_ds)
      File "/Users/rob/PycharmProjects/bauteilerkennung/DeepLearningPart/bte/models/image.py", line 74, in fit
        self.trainer.fit(self.model, train_loader, val_loader)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
        self._run(model)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
        self.dispatch()
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
        self.accelerator.start_training(self)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
        self.training_type_plugin.start_training(trainer)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
        self._results = trainer.run_stage()
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
        return self.run_train()
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
        self.train_loop.run_training_epoch()
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 490, in run_training_epoch
        batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 731, in run_training_batch
        self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 432, in optimizer_step
        using_lbfgs=is_lbfgs,
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
        optimizer.step(closure=optimizer_closure)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
        self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
        trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
        self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
        self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
        optimizer.step(closure=lambda_closure, **kwargs)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
        return func(*args, **kwargs)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/torch/optim/adam.py", line 66, in step
        loss = closure()
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 726, in train_step_and_backward_closure
        split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 814, in training_step_and_backward
        result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
      File "/Users/rob/anaconda3/envs/bte/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 301, in training_step
        closure_loss = training_step_output.minimize / self.trainer.accumulate_grad_batches
    TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
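
The last two frames show the cause: Lightning stores the "loss" entry of the training_step output as training_step_output.minimize, and with that key missing the attribute is None, so the division by accumulate_grad_batches fails. A minimal sketch of the failing arithmetic (names taken from the traceback; the value of accumulate_grad_batches is assumed):

    minimize = None                # no "loss" key in the training_step output
    accumulate_grad_batches = 1    # assumed Trainer default
    closure_loss = minimize / accumulate_grad_batches
    # TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'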

The code that reproduces this bug:

    import pytorch_lightning as pl
    import torch
    import torch.nn.functional as F
    import torchmetrics

    class TransferLearner(pl.LightningModule):
        ...

        def training_step(self, train_batch, batch_idx):
            x, y = train_batch
            logits = self.forward(x)
            loss = self.cross_entropy_loss(logits, y)
            self.log('train_loss', loss)
            preds = F.softmax(logits, dim=1)
            # note: no "loss" key in the returned dict; this triggers the error above
            return {"preds": preds, "gt": y}

        def training_epoch_end(self, outputs):
            preds = torch.cat([output["preds"] for output in outputs])
            gt = torch.cat([output["gt"] for output in outputs])
            f1_score = torchmetrics.functional.f1(preds, gt, num_classes=self.num_classes)
            self.log("train/f1_score", f1_score)

The only change I needed to make was to add the loss to the returned dictionary:

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        logits = self.forward(x)
        loss = self.cross_entropy_loss(logits, y)
        self.log('train_loss', loss)
        preds = F.softmax(logits, dim=1)
        return {"loss": loss, "preds": preds, "gt": y}

    def training_epoch_end(self, outputs):
        preds = torch.cat([output["preds"] for output in outputs])
        gt = torch.cat([output["gt"] for output in outputs])
        f1_score = torchmetrics.functional.f1(preds, gt, num_classes=self.num_classes)
        self.log("train/f1_score", f1_score)

But I only stumbled upon this fix by chance; neither the docs nor the error message made it clear to me what the problem was.

The versions that I am using:

    pytorch_lightning.__version__  # '1.3.1'
    torch.__version__              # '1.7.1'

I hope this information suffices, otherwise please let me know what other information I should provide.
Thanks for this great repository by the way!!

Labels

bug (Something isn't working), docs (Documentation related), help wanted (Open to be worked on), refactor
