Open
Labels
bug, logger, logger: mlflow, ver: 2.5.x
Description
Bug description
The MLflow logger associates two consecutive values of a metric with one epoch in the metric plots in the MLflow UI. This does not happen when the x-axis is `step`.
I'm logging my metrics with `on_step=False` and `on_epoch=True`, using the following code:
from typing import Optional

import torch

# Both methods belong to the LightningModule subclass.
def training_step(self, batch: torch.Tensor, batch_idx: int):
    outdict = self.model_step(batch.flatten(start_dim=1).to(dtype=torch.float32))
    # Decide what to log: epoch-level aggregates only, no per-step values.
    self.log_dict(
        self._log_dict(outdict, "train", dataloader_idx=0),
        prog_bar=False,
        on_step=False,
        on_epoch=True,
        logger=True,
        sync_dist=True,
        add_dataloader_idx=False,
    )
    return outdict

def validation_step(
    self, batch: torch.Tensor, batch_idx: int, dataloader_idx: Optional[int] = 0
):
    outdict = self.model_step(batch.flatten(start_dim=1).to(dtype=torch.float32))
    # Same logging settings as in training_step, with the "val" prefix.
    self.log_dict(
        self._log_dict(outdict, "val", dataloader_idx=dataloader_idx),
        prog_bar=False,
        on_step=False,
        on_epoch=True,
        logger=True,
        sync_dist=True,
        add_dataloader_idx=False,
    )
    return outdict
I'm not calling `log_metric` or `log_dict` anywhere else.
The plots look like this with `step` on the x-axis:

The same plot with `epoch` on the x-axis:

The `epoch` vs. `step` plot looks fine, though:

Ignore the last point, that is due to running on the test data. Even if I don't run on this data, I still get the 'double points' for each epoch.
Am I doing something wrong? This seems like a pretty basic use case.
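One way to check whether the double points are in the logged data or only in how the UI joins metrics is to group a metric's history by the `epoch` value logged at the same step. The following is a minimal sketch, not a confirmed diagnosis. It assumes each history is available as a list of `(step, value)` pairs; the helper names and the sample numbers are hypothetical.

```python
from collections import defaultdict


def values_per_epoch(metric_history, epoch_history):
    """Group metric points by the epoch value logged at the same step.

    Both arguments are lists of (step, value) pairs. With MLflow they could
    be built from MlflowClient().get_metric_history(run_id, key), whose
    entries expose .step and .value (assumption: the run id and the metric
    keys are known to the caller).
    """
    # Map each step to the epoch value that was logged at that step.
    epoch_at_step = {step: int(epoch) for step, epoch in epoch_history}

    grouped = defaultdict(list)
    for step, value in metric_history:
        epoch = epoch_at_step.get(step)
        if epoch is not None:  # skip points with no epoch logged at that step
            grouped[epoch].append((step, value))
    return dict(grouped)


def epochs_with_multiple_values(metric_history, epoch_history):
    """Return the sorted epochs that have more than one metric point."""
    grouped = values_per_epoch(metric_history, epoch_history)
    return sorted(epoch for epoch, points in grouped.items() if len(points) > 1)


# Hypothetical histories: the metric has two points that map to epoch 0.
loss_history = [(9, 0.50), (10, 0.45), (19, 0.40)]
epoch_history = [(9, 0), (10, 0), (19, 1)]
duplicated = epochs_with_multiple_values(loss_history, epoch_history)
```

If the list of duplicated epochs is empty for the real run, the stored data has one value per epoch and the double points would come from how the plot pairs metric values with `epoch`. If it is not empty, the extra values are being logged.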
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
cschell