
Return a tensor in training_step leads to a list of dictionaries in training_epoch_end #9968

@BramVanroy

Description

When I run the following code in my model

    def training_step(self, batch, batch_idx):
        loss = self._step(batch, batch_idx)
        print(type(loss))
        self.log("train_loss", loss, sync_dist=True)
        return loss

    def training_epoch_end(self, outputs):
        print(outputs)
        self.log("train_ppl", self.calculate_ppl(outputs), sync_dist=True)

I see that the training step returns a tensor, so I would expect the aggregated values passed to training_epoch_end to be a list of tensors. But, as this code shows, they are not. Instead, outputs is a list of dictionaries that looks like this:

[{'loss': tensor(4.1520, device='cuda:0')}, {'loss': tensor(4.1750, device='cuda:0')}]

Annoyingly, the expected behavior does occur in the validation and testing loops. So I am not sure whether I am simply doing something wrong, whether training_step deliberately collects dictionaries for a specific reason, or whether this is actually a bug.
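Given the list-of-dicts shape reported above, a minimal workaround in training_epoch_end is to unpack the "loss" entry from each element before aggregating. The sketch below simulates the outputs list with plain floats instead of CUDA tensors, and uses exp of the mean loss as a hypothetical stand-in for calculate_ppl (the real helper is not shown in the issue):

```python
import math

# Simulated training_epoch_end outputs: in the training loop, the returned
# tensor appears wrapped in a dict under the "loss" key (as reported above).
# Plain floats stand in for tensors here.
outputs = [{"loss": 4.1520}, {"loss": 4.1750}]

# Accept either bare values or dicts, so the same helper also works for
# validation/test outputs, where bare tensors are passed through.
losses = [o["loss"] if isinstance(o, dict) else o for o in outputs]

mean_loss = sum(losses) / len(losses)
# Hypothetical perplexity: exp of the mean loss.
ppl = math.exp(mean_loss)
print(mean_loss)  # → 4.1635
```

With real tensors the same unpacking applies; only the aggregation would use torch.stack(losses).mean() instead of the plain-Python mean.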

PL version: 1.4.9
Torch: 1.9.1+cu111

cc @Borda @carmocca @justusschock @ananthsub @ninginthecloud


Labels

feature (an improvement or enhancement), loops (related to the Loop API), working as intended
