Ordering of hooks #8670

@mmgxa

🐛 Bug

In PL 1.4, the order of hooks has changed.

In PL 1.3.8, it was:

on_train_epoch_start
training_step
training_step
training_step
training_step
training_epoch_end
on_epoch_end
on_validation_epoch_start
validation_step
validation_step
validation_step
validation_step
validation_epoch_end
on_epoch_end

Now, in PL 1.4, it is:

on_train_epoch_start
training_step
training_step
training_step
training_step
on_validation_epoch_start
validation_step
validation_step
validation_step
validation_step
validation_epoch_end
on_epoch_end
training_epoch_end
on_epoch_end

That is, training_epoch_end now runs after validation_epoch_end instead of right after the last training_step. This doesn't make sense, since on_epoch_end is 'just next to it'. Also note how close together the two on_epoch_end calls are in PL 1.4: only training_epoch_end separates them.
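
The change in ordering can also be checked mechanically. Below is a minimal sketch in plain Python, with the two hook sequences copied from the listings above; the helper names `hooks_before` and `runs_before` are made up for illustration and are not part of PyTorch Lightning.

```python
# Hook sequences copied from the two listings in this report.
ORDER_1_3_8 = (
    ["on_train_epoch_start"]
    + ["training_step"] * 4
    + ["training_epoch_end", "on_epoch_end", "on_validation_epoch_start"]
    + ["validation_step"] * 4
    + ["validation_epoch_end", "on_epoch_end"]
)
ORDER_1_4_0 = (
    ["on_train_epoch_start"]
    + ["training_step"] * 4
    + ["on_validation_epoch_start"]
    + ["validation_step"] * 4
    + ["validation_epoch_end", "on_epoch_end", "training_epoch_end", "on_epoch_end"]
)


def hooks_before(order, hook):
    """Return the hooks that run before the first occurrence of `hook`."""
    return order[: order.index(hook)]


def runs_before(order, first, second):
    """True if `first` appears somewhere before the first `second`."""
    return first in hooks_before(order, second)


# Does training_epoch_end run before validation_epoch_end?
print("1.3.8:", runs_before(ORDER_1_3_8, "training_epoch_end", "validation_epoch_end"))
print("1.4.0:", runs_before(ORDER_1_4_0, "training_epoch_end", "validation_epoch_end"))
```

The same check could be run against sequences recorded from an actual training run, for example by appending the hook name to a list inside each hook.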

To Reproduce

You can use the following Colab notebooks:
https://colab.research.google.com/github/mmg10/pl_bug/blob/main/pl_bug_138.ipynb

https://colab.research.google.com/github/mmg10/pl_bug/blob/main/pl_bug_140.ipynb

Environment

PyTorch Lightning 1.3.8 and 1.4.0 respectively

Significance

In PL 1.3.8, we could get the average training loss across batches via:

def training_epoch_end(self, outputs):
    self.avg_train_loss = torch.stack([x['loss'] for x in outputs]).mean().item()

In PL 1.4 we no longer can, because training_epoch_end now runs only after validation has finished. Note that we can still run the following:

def validation_epoch_end(self, outputs):
    avg_valid_loss = torch.stack([x['loss'] for x in outputs]).mean().item()

since validation_epoch_end is still directly preceded by the last validation_step.
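
One possible way to make the average independent of when training_epoch_end fires is to accumulate the loss as each batch is processed. This is only a sketch of the idea, not a fix confirmed by the maintainers: the `RunningMean` class is hypothetical, and the loss values are made up for illustration.

```python
class RunningMean:
    """Accumulate a sum and a count so the mean can be read at any point."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # float() also accepts a single-element tensor.
        self.total += float(value)
        self.count += 1

    def compute(self):
        if self.count == 0:
            raise ValueError("no values accumulated yet")
        return self.total / self.count

    def reset(self):
        self.total = 0.0
        self.count = 0


# Stand-in for four training_step calls; the loss values are invented.
train_loss = RunningMean()
for batch_loss in [0.9, 0.7, 0.5, 0.3]:
    train_loss.update(batch_loss)

# The mean is available right after the last batch, whichever hook runs next.
avg_train_loss = train_loss.compute()
print(avg_train_loss)
```

In a LightningModule, the assumption would be to call `update` with the detached loss inside training_step and `reset` inside on_train_epoch_start, so that the value can be read from validation_epoch_end or on_epoch_end regardless of hook order.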
