ModelCheckpoint filename unable to use metrics that contain a slash #4012

@its-dron

🐛 Bug

ModelCheckpoint cannot build a checkpoint filename from a metric whose name contains a slash. I use grouped metrics for TensorBoard and would like my checkpoint filenames to include my loss, which is logged as val/loss. However, ModelCheckpoint calls os.path.split on the path template, which splits it at the slash inside the metric name: https://github.com/PyTorchLightning/pytorch-lightning/blob/6ac0958166c66ed599c96737b587232b7a33d89e/pytorch_lightning/callbacks/model_checkpoint.py#L258

If I try to use

ModelCheckpoint("root/dir/{epoch}_{val/loss:.5f}")

The above evaluates to

self.dirpath = "root/dir/{epoch}_{val" 
self.filename = "loss:.5f}"

This inevitably causes failure when attempting to format the output path.
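
The split can be reproduced with the standard library alone. This is a minimal sketch of the os.path.split behavior described above, not the library's own code path; the variable names are mine.

```python
import os

# The path template from the report; the metric name "val/loss" contains a slash.
filepath = "root/dir/{epoch}_{val/loss:.5f}"

# os.path.split cuts at the last separator, which here sits inside the braces.
dirpath, filename = os.path.split(filepath)
print(dirpath)   # root/dir/{epoch}_{val
print(filename)  # loss:.5f}

# The filename fragment has an unmatched "}", so it is no longer a valid
# format string on its own.
try:
    filename.format(epoch=1)
    format_failed = False
except ValueError:
    format_failed = True
print(format_failed)  # True
```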

To Reproduce

As above, log a metric whose name contains a slash, then reference it in the ModelCheckpoint filename template.

Code sample

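import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint


class Module(pl.LightningModule):
    ...

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.forward(x)
        loss = self.loss_fn(logits, y)
        # The slash groups this metric under "val" in TensorBoard.
        self.log('val/loss', loss, on_epoch=True)
        return loss

...
def main():
    # The template references the slash-named metric, which triggers the bad split.
    trainer = pl.Trainer(checkpoint_callback=ModelCheckpoint("{epoch}_{val/loss:.5f}"))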

Expected behavior

Split only along file path boundaries, ignoring slashes inside format fields that have yet to be formatted.
For the root/dir example above, we'd expect:

self.dirpath = "root/dir" 
self.filename = "{epoch}_{val/loss:.5f}"
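
One way to get this behavior is to ignore separators that fall inside braces. The helper below is a hypothetical sketch, not the library's fix: the name split_outside_braces is my own, it only treats "/" as a separator, and it does not handle escaped braces such as "{{".

```python
from typing import Tuple


def split_outside_braces(filepath: str) -> Tuple[str, str]:
    """Split at the last "/" that is not inside a {...} format field."""
    depth = 0       # current brace nesting level
    last_sep = -1   # index of the last "/" seen at depth 0
    for i, ch in enumerate(filepath):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif ch == "/" and depth == 0:
            last_sep = i
    if last_sep == -1:
        # No directory component at all.
        return "", filepath
    # Keep a lone leading "/" as the directory, like os.path.split does.
    head = filepath[:last_sep] or filepath[:last_sep + 1]
    return head, filepath[last_sep + 1:]


dirpath, filename = split_outside_braces("root/dir/{epoch}_{val/loss:.5f}")
print(dirpath)   # root/dir
print(filename)  # {epoch}_{val/loss:.5f}
```

Note that even with the template split this way, the formatted filename would still contain a slash from the metric name if it is included in the output, so the formatting step would need to account for that separately.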

Environment

  • CUDA:
    • GPU:
      • Tesla V100-SXM2-16GB
      • Tesla V100-SXM2-16GB
      • Tesla V100-SXM2-16GB
      • Tesla V100-SXM2-16GB
    • available: True
    • version: 10.2
  • Packages:
    • numpy: 1.19.1
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0
    • pytorch-lightning: 0.10.0
    • tqdm: 4.50.0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.8.5
    • version: #1 SMP Fri Sep 4 14:19:36 UTC 2020

Labels: bug, feature, help wanted, priority: 1 (medium priority)