Metrics get mapped twice to the same epoch in MLflow logger #20902

@bb511

Bug description

The MLflow logger associates two consecutive values of a metric with a single epoch in the metric plots in the MLflow UI.
This does not happen when the x-axis is steps.

I'm logging my metrics with on_step=False and on_epoch=True, using the following code:

    def training_step(self, batch: torch.Tensor, batch_idx: int):
        outdict = self.model_step(batch.flatten(start_dim=1).to(dtype=torch.float32))

        # Decide what to log:
        self.log_dict(
            self._log_dict(outdict, "train", dataloader_idx=0),
            prog_bar=False,
            on_step=False,
            on_epoch=True,
            logger=True,
            sync_dist=True,
            add_dataloader_idx=False,
        )
        return outdict

    def validation_step(
        self, batch: torch.Tensor, batch_idx: int, dataloader_idx: Optional[int] = 0
    ):
        outdict = self.model_step(batch.flatten(start_dim=1).to(dtype=torch.float32))
        self.log_dict(
            self._log_dict(outdict, "val", dataloader_idx=dataloader_idx),
            prog_bar=False,
            on_step=False,
            on_epoch=True,
            logger=True,
            sync_dist=True,
            add_dataloader_idx=False,
        )

        return outdict

I'm not calling log_metric or log_dict anywhere else.

With step on the x-axis, the plots look like this:

Image

The same plot with epoch on the x-axis:

Image

The epoch vs step plot looks fine, though:

Image

Ignore the last point; it comes from running on the test data. Even if I don't run on the test data, I still get the 'double points' for each epoch.

Am I doing something wrong? This seems like a pretty basic use case.
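
One way to narrow this down might be to compare the steps at which a metric was logged with the steps at which the epoch value was logged, and count how many metric points fall into each epoch. The sketch below is only illustrative: `group_points_by_epoch` and `epochs_with_multiple_points` are hypothetical helpers, not part of Lightning or MLflow, the data is made up, and the rule used to pair a point with an epoch (the most recent epoch value logged at or before the point's step) is an assumption that may differ from how the MLflow UI actually joins the two series.

```python
from bisect import bisect_right
from collections import defaultdict


def group_points_by_epoch(metric_points, epoch_points):
    """Group (step, value) metric points by epoch.

    Hypothetical helper. ASSUMPTION: a metric point belongs to the most
    recent epoch value logged at a step <= the metric point's step.
    """
    epoch_points = sorted(epoch_points)
    epoch_steps = [step for step, _ in epoch_points]
    grouped = defaultdict(list)
    for step, value in sorted(metric_points):
        idx = bisect_right(epoch_steps, step) - 1
        if idx < 0:
            continue  # no epoch value has been logged at or before this step
        grouped[epoch_points[idx][1]].append((step, value))
    return dict(grouped)


def epochs_with_multiple_points(grouped):
    """Keep only the epochs that ended up with more than one metric point."""
    return {epoch: pts for epoch, pts in grouped.items() if len(pts) > 1}


# Made-up (step, value) pairs for illustration only, not taken from my run.
epoch_points = [(9, 0), (19, 1)]
metric_points = [(9, 0.9), (10, 0.8), (19, 0.7)]

grouped = group_points_by_epoch(metric_points, epoch_points)
for epoch, pts in epochs_with_multiple_points(grouped).items():
    print(f"epoch {epoch}: {len(pts)} points at steps {[s for s, _ in pts]}")
```

For a real run, the two input lists could be built from `MlflowClient().get_metric_history(run_id, key)`, whose entries expose `step` and `value`. The metric key to pass depends on the names produced by `_log_dict`, which is not shown here.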

What version are you seeing the problem on?

v2.5

Reproduced in studio

No response

How to reproduce the bug

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response

cc @lantiga @Borda
