Description
🚀 Feature
Let WandbLogger log using:
- the trainer step by default (current behavior)
- an auto-incremented step, independent from the trainer
Motivation
Using the trainer step is a good default and seems to work fine for most people: training and validation metrics are automatically associated with the right step.
However, some users have a few issues with custom logging:
- wandb logger problem with on_step log on validation #4980 wants to see a chart of validation metrics over a single batch
- Logging validation metrics with val_check_interval #5070 tries to log intermediate validation metrics, which uses a different step
- "Step must only increase in log calls" when adding W&B logger after some training wandb/wandb#1626
- WandB-Logger drops all the logged values in training step for PyTorch Lightning wandb/wandb#1507
#5050 solves some, but not all, of these issues by suggesting wandb.log(my_dict, commit=False) when logging outside of the "regular" workflow, which automatically associates the metrics with the last used step.
In some cases it may be hard to ensure the associated step is always increasing.
Pitch
We will be able to have a new parameter in WandbLogger (TBD) such as sync_step=True:
- by default (True), we will have the current behavior, which syncs wandb.step with the trainer step.
- when set to False, the step will auto-increment at each logging call (by relying on the default behaviour of wandb.log()) and the trainer step will just be added to the logged metrics when available
When it is set to False, we will still be able to manually select the trainer step as the x-axis in the W&B dashboard (for all graphs or for any specific graph), so this feature will add more flexibility in logging.
It will not be the default behavior, mainly because people who are not familiar with W&B expect to see validation and training metrics aligned by default (see #4113).
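The two modes of the proposed parameter could be sketched roughly as follows. This is only an illustration under assumptions, not the actual WandbLogger code: the class name `StepAwareLogger`, the `step_key` argument, and the `"trainer_step"` metric name are hypothetical, and the logging function is injected so the sketch runs without W&B installed. In real use, `log_fn` would presumably be `wandb.log`, which accepts a metrics dict and an optional `step` keyword.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class StepAwareLogger:
    """Hypothetical sketch of the proposed `sync_step` switch."""

    def __init__(
        self,
        log_fn: Callable[..., None],
        sync_step: bool = True,
        step_key: str = "trainer_step",  # assumed name, not from the proposal
    ) -> None:
        self._log_fn = log_fn
        self._sync_step = sync_step
        self._step_key = step_key

    def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
        if self._sync_step:
            # Current behavior: the backend step follows the trainer step.
            self._log_fn(dict(metrics), step=step)
        else:
            # Proposed behavior: let the backend auto-increment its own step
            # and record the trainer step as an ordinary metric when known.
            payload = dict(metrics)
            if step is not None:
                payload[self._step_key] = step
            self._log_fn(payload)


# Stand-in for wandb.log that just records what it receives.
calls: List[Tuple[Dict[str, Any], Dict[str, Any]]] = []


def record(data: Dict[str, Any], **kwargs: Any) -> None:
    calls.append((data, kwargs))


StepAwareLogger(record, sync_step=True).log_metrics({"loss": 0.5}, step=10)
StepAwareLogger(record, sync_step=False).log_metrics({"val_loss": 0.4}, step=10)
```

With `sync_step=False`, the trainer step travels inside the logged dict, so it can still be picked as a custom x-axis in the dashboard while the backend step never has to be kept monotonic by the caller.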