Description
🐛 Bug
In PL 1.4, the order of hooks has changed.
In PL 1.3.8, the order was:
on_train_epoch_start
training_step
training_step
training_step
training_step
training_epoch_end
on_epoch_end
on_validation_epoch_start
validation_step
validation_step
validation_step
validation_step
validation_epoch_end
on_epoch_end
Now, in PL 1.4, it is:
on_train_epoch_start
training_step
training_step
training_step
training_step
on_validation_epoch_start
validation_step
validation_step
validation_step
validation_step
validation_epoch_end
on_epoch_end
training_epoch_end
on_epoch_end
i.e., training_epoch_end now runs after validation_epoch_end instead of after the last training_step, which doesn't make sense since on_epoch_end is right next to it. Also, note how close together the two on_epoch_end calls are in PL 1.4.
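The ordering can also be observed with a minimal module that prints each hook as it fires. The sketch below is not taken from the linked notebooks; the model, dataset, and Trainer settings are illustrative placeholders, and running it under 1.3.8 and 1.4.0 should print the two orderings above.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class HookOrderModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        print("training_step")
        x, y = batch
        return F.mse_loss(self(x), y)

    def validation_step(self, batch, batch_idx):
        print("validation_step")
        x, y = batch
        return {"loss": F.mse_loss(self(x), y)}

    def on_train_epoch_start(self):
        print("on_train_epoch_start")

    def on_validation_epoch_start(self):
        print("on_validation_epoch_start")

    def training_epoch_end(self, outputs):
        print("training_epoch_end")

    def validation_epoch_end(self, outputs):
        print("validation_epoch_end")

    def on_epoch_end(self):
        print("on_epoch_end")

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

ds = TensorDataset(torch.randn(16, 4), torch.randn(16, 1))
trainer = pl.Trainer(max_epochs=1, limit_train_batches=4, limit_val_batches=4,
                     num_sanity_val_steps=0, progress_bar_refresh_rate=0)
trainer.fit(HookOrderModel(), DataLoader(ds, batch_size=4), DataLoader(ds, batch_size=4))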
To Reproduce
You can use the following Colab links:
https://colab.research.google.com/github/mmg10/pl_bug/blob/main/pl_bug_138.ipynb
https://colab.research.google.com/github/mmg10/pl_bug/blob/main/pl_bug_140.ipynb
Environment
PyTorch Lightning 1.3.8 and 1.4.0, respectively.
Significance
In PL 1.3.8, we could compute the average training loss across batches via:
def training_epoch_end(self, outputs):
    self.avg_train_loss = torch.stack([x['loss'] for x in outputs]).mean().item()
but now we can't. Note that we can still run the following:
def validation_epoch_end(self, outputs):
    avg_valid_loss = torch.stack([x['loss'] for x in outputs]).mean().item()
since validation_epoch_end is still immediately preceded by the last validation_step.
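A possible workaround in 1.4 (not from this report; just a sketch that assumes validation_step returns a dict with a 'loss' key, as in the snippet above, and a hypothetical compute_loss helper) is to accumulate the per-batch training losses manually, since by the time validation_epoch_end fires all training batches of the epoch have already run:

import torch
import pytorch_lightning as pl

class ManualAvgMixin(pl.LightningModule):
    # only the hooks relevant to the workaround are shown;
    # forward, configure_optimizers, etc. are assumed to be defined elsewhere

    def on_train_epoch_start(self):
        self._train_losses = []  # reset the buffer at the start of each training epoch

    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper, not a Lightning API
        self._train_losses.append(loss.detach())
        return loss

    def validation_epoch_end(self, outputs):
        # in PL 1.4 the epoch's training batches have already run, so the buffer is complete
        self.avg_train_loss = torch.stack(self._train_losses).mean().item()
        self.avg_valid_loss = torch.stack([x['loss'] for x in outputs]).mean().item()

Alternatively, calling self.log('train_loss', loss, on_epoch=True) inside training_step lets Lightning aggregate the per-epoch mean itself, independent of the hook order.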