Logging on slurm stopped working #2317

@kumuji

🐛 Bug

Logging and checkpoint saving stopped working when I run experiments through Slurm.
I return metrics under the log key from training_epoch_end/validation_epoch_end.
The same code works on version 0.7.6.
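
The report does not include the actual code, so the following is only a minimal sketch of what "log keys in return functions" could look like. The helper name `epoch_end_result`, the metric name, and the use of plain floats (instead of tensors) are assumptions made for illustration.

```python
from statistics import mean


def epoch_end_result(outputs, prefix):
    """Build the dict an epoch-end hook would return.

    `outputs` is a list of per-step dicts; each is assumed (hypothetically)
    to hold a float under "loss". Metrics meant for the logger are placed
    under the "log" key.
    """
    avg_loss = mean(step["loss"] for step in outputs)
    return {"log": {f"{prefix}_loss": avg_loss}}


# In a LightningModule with 0.8.x-style hooks this would be used roughly as:
#   def training_epoch_end(self, outputs):
#       return epoch_end_result(outputs, "train")
#   def validation_epoch_end(self, outputs):
#       return epoch_end_result(outputs, "val")
```

Keeping the aggregation in a plain function makes it easy to confirm, outside of Slurm, that the returned dict really contains the log key.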

To Reproduce

Steps to reproduce the behaviour:

  1. Define a TensorBoard logger.
  2. Run training through Slurm with sbatch ...
  3. No logs are written.
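
When comparing a Slurm run against a local run, it can help to record whether the process actually sees a Slurm environment. The sketch below is a suggestion, not part of the original report; the helper name `slurm_job_info` is hypothetical, and it only reads environment variables that Slurm itself sets for jobs.

```python
import os


def slurm_job_info(environ=None):
    """Return the Slurm-related variables visible to this process.

    An empty dict means the process does not appear to run under Slurm.
    `environ` can be passed explicitly to make the function testable.
    """
    env = os.environ if environ is None else environ
    keys = ("SLURM_JOB_ID", "SLURM_NTASKS", "SLURM_PROCID")
    return {key: env[key] for key in keys if key in env}


if __name__ == "__main__":
    # Print once at startup so the job's stdout shows the detected settings.
    print("Slurm environment:", slurm_job_info())
```

Including this output in the report makes it clearer which process (rank) was expected to write logs and checkpoints.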

Code sample

Expected behaviour

Environment

  • PyTorch 1.4.0
  • PyTorch Lightning 0.8.1
  • Linux
  • Python 3.7.6
  • CUDA 10.1, cuDNN 7.6.5

Metadata

Labels: bug, help wanted
