wandb logger problem with on_step log on validation #4980

@andreaRapuzzi

🐛 Bug

When logging in validation_step with on_step=True and on_epoch=False, the following happens:

  • wandb emits warnings about a step-numbering problem (probably because the validation step number, which appears to cycle, is confused with the overall step, which always increases)

image

  • the wandb training charts (by step) are compressed along the x axis, as if the whole training run had fewer steps. We tested two training runs: the first (blue in the image below) with on_step=False and on_epoch=True in validation_step, the second (red in the image below) with on_step=True and on_epoch=False. As you can see, the training chart is affected:

image

  • an error is issued at the end of the second training run:

image

  • two new (unrequested) panels appear at the top of the wandb project (this is the weirdest of the lot :-))

image
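
To make the suspected cause in the first bullet concrete, here is a minimal pure-Python sketch. It is only a toy model: `MonotonicStepLogger` and `simulate` are hypothetical names, the rule "drop any row whose step is lower than the last accepted step" is an assumption for illustration rather than wandb's actual code, and keying validation rows by `batch_idx` is the hypothesis from this report, not confirmed Lightning behaviour.

```python
class MonotonicStepLogger:
    """Toy stand-in for a logger that only accepts non-decreasing steps.

    Assumption for illustration only; this is not wandb's implementation.
    """

    def __init__(self):
        self.current_step = 0
        self.accepted = []  # (step, metrics) rows that were kept
        self.dropped = []   # rows rejected because their step went backwards

    def log(self, metrics, step):
        if step < self.current_step:
            self.dropped.append((step, metrics))
            return False
        self.current_step = step
        self.accepted.append((step, metrics))
        return True


def simulate(epochs=2, train_batches=5, val_batches=3):
    """Interleave training and validation logging over a few epochs."""
    logger = MonotonicStepLogger()
    global_step = 0
    for _ in range(epochs):
        # Training rows use the always-increasing global step.
        for _ in range(train_batches):
            logger.log({"train_loss": 0.0}, step=global_step)
            global_step += 1
        # Hypothesis: validation rows are keyed by the per-epoch batch
        # index, which restarts at 0 every epoch.
        for batch_idx in range(val_batches):
            logger.log({"val_loss": 0.0}, step=batch_idx)
    return logger


if __name__ == "__main__":
    result = simulate()
    print("accepted:", len(result.accepted), "dropped:", len(result.dropped))
```

Under these assumptions every validation row arrives with a step lower than the last training step and is rejected, which would be consistent with the step-numbering warnings described above. It does not explain the extra panels or the error at the end of the run.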

Please reproduce using the colab link at the top of this article

To Reproduce

Just change the validation_step logging like this:

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    loss = F.nll_loss(logits, y)

    # validation metrics
    preds = torch.argmax(logits, dim=1)
    acc = accuracy(preds, y)
    self.log('val_loss', loss, on_step=True, on_epoch=False, prog_bar=True)
    self.log('val_acc', acc, on_step=True, on_epoch=False, prog_bar=True)
    return loss

Labels: 3rd party, bug, help wanted, priority: 1, question