Closed
Labels
3rd party (Related to a 3rd-party), bug (Something isn't working), help wanted (Open to be worked on), priority: 1 (Medium priority task), question (Further information is requested)
Description
🐛 Bug
When logging in validation_step with on_step=True and on_epoch=False, the following happens:
- wandb warnings are generated alerting about a step numbering problem (wandb probably confuses the validation step number, which is cyclical, with the overall global step, which is always increasing)
- the wandb training chart (by step) is shrunk on the x axis (as if the number of steps for the whole training were smaller). We tested two training runs: the first (blue in the image below) with on_step=False and on_epoch=True on validation_step, the second (red in the image below) with on_step=True and on_epoch=False. As you can see, the training chart is affected by this:
- an error is issued at the end of the second training run:
- two new (unrequested) panels appear at the top of the wandb project (this is the weirdest of the lot :-))
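The step warnings above can be illustrated with a minimal sketch. This is an assumed model of the behavior, not the actual wandb or Lightning internals: wandb's history drops or warns about rows logged with a step lower than the last step it has seen, and a cyclical validation batch index collides with the ever-increasing training step. The FakeWandbRun class below is hypothetical, written only for this illustration:

```python
# Hypothetical stand-in for a wandb run: it refuses out-of-order steps,
# which is (roughly) why the warnings appear when validation logs by step.
class FakeWandbRun:
    def __init__(self):
        self.last_step = -1
        self.dropped = 0      # rows rejected because their step went backwards
        self.rows = []

    def log(self, metrics, step):
        if step <= self.last_step:
            self.dropped += 1  # real wandb warns about non-monotonic steps here
            return
        self.last_step = step
        self.rows.append((step, metrics))


run = FakeWandbRun()
global_step = 0
for epoch in range(2):
    for batch_idx in range(5):
        # training: the global step always increases, so these are accepted
        run.log({"train_loss": 0.1}, step=global_step)
        global_step += 1
    for batch_idx in range(3):
        # validation with on_step=True: the batch index restarts every epoch,
        # so every row lands behind the training steps and is rejected
        run.log({"val_loss": 0.2}, step=batch_idx)

# All 10 training rows survive; all 6 per-step validation rows are dropped.
```

Under this model, logging validation metrics per step can never catch up with the training step counter, which would also explain the shrunken x axis on the training chart.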
Please reproduce using the colab link at the top of this issue.
To Reproduce
Just change the validation_step logging like this:
def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    loss = F.nll_loss(logits, y)
    # validation metrics
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    self.log('val_loss', loss, on_step=True, on_epoch=False, prog_bar=True)
    self.log('val_acc', acc, on_step=True, on_epoch=False, prog_bar=True)
    return loss