Closed
Labels
bug, help wanted
Description
🐛 Bug
Logging and checkpoint saving stopped working when I run experiments through Slurm.
I return log keys from training_epoch_end and validation_epoch_end.
Version 0.7.6 works.
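For readers unfamiliar with the pattern, here is a minimal sketch of what returning log keys from an epoch-end hook looks like in the 0.7/0.8-era API. It is not the reporter's code: the metric name val_loss, the plain-float values, and the standalone function (rather than a LightningModule method operating on tensors) are assumptions made to keep the example self-contained.

```python
from statistics import mean


def validation_epoch_end(outputs):
    """Aggregate per-step outputs and expose metrics under the 'log' key."""
    # Assumption: each validation step returned {'val_loss': <number>}.
    avg_loss = mean(o["val_loss"] for o in outputs)
    # In this API generation, the dict under 'log' is what the trainer
    # hands to the attached logger (e.g. TensorBoard).
    return {"val_loss": avg_loss, "log": {"val_loss": avg_loss}}


# Hypothetical step outputs, standing in for what validation_step would return.
result = validation_epoch_end([{"val_loss": 0.5}, {"val_loss": 0.25}])
print(result["log"])
```

If nothing appears in the logger even though the hook returns a dict of this shape, the problem is more likely in how the trainer dispatches logging under Slurm than in the hook itself.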
To Reproduce
Steps to reproduce the behaviour:
- Define a TensorBoard logger
- Run training through Slurm: sbatch ...
- No logs are written.
Code sample
Expected behaviour
Environment
- PyTorch 1.4.0
- PyTorch Lightning 0.8.1
- Linux
- Python 3.7.6
- CUDA/cuDNN 10.1 / 7.6.5