Checkpointing with SLURM

#### What is your question?
I have PyTorch Lightning code with checkpointing that runs well on my desktop. But when I run it on our cluster with SLURM, no checkpoints are saved.

#### Code
```python
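import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Predictor and args are defined elsewhere in main.py
model = Predictor(args)
# Keep only the best checkpoint, ranked by validation accuracy
check = ModelCheckpoint(save_top_k=1, verbose=True, monitor='val_acc', mode='max',
                        filepath='checks/{epoch}')
trainer = pl.Trainer(checkpoint_callback=check, max_epochs=100, gpus=1)
trainer.fit(model)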
```
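The `filepath` above is relative, so it resolves against whatever working directory the process starts in. That directory may differ between a desktop run and a job launched through `srun`. As a diagnostic sketch only (not a confirmed cause of the problem), the helpers below turn the checkpoint directory into an absolute path and report the run context. The function names `resolve_checkpoint_dir` and `describe_run_context` are hypothetical and are not part of PyTorch Lightning. `SLURM_JOB_ID` and `SLURM_SUBMIT_DIR` are environment variables that SLURM sets inside a job, and they will be `None` on a desktop.

```python
import os


def resolve_checkpoint_dir(relative_dir="checks", base_dir=None):
    """Return an absolute checkpoint directory, creating it if needed.

    Hypothetical helper: base_dir defaults to the current working directory,
    which is where a relative path such as 'checks/{epoch}' would end up.
    """
    base = base_dir if base_dir is not None else os.getcwd()
    path = os.path.abspath(os.path.join(base, relative_dir))
    os.makedirs(path, exist_ok=True)
    return path


def describe_run_context():
    """Collect the working directory and SLURM job info for logging."""
    return {
        "cwd": os.getcwd(),
        # Set by SLURM inside a job; None when running outside SLURM.
        "slurm_job_id": os.environ.get("SLURM_JOB_ID"),
        "slurm_submit_dir": os.environ.get("SLURM_SUBMIT_DIR"),
    }
```

Printing `describe_run_context()` at the top of `main.py` and passing `os.path.join(resolve_checkpoint_dir(), '{epoch}')` as `filepath` would show where checkpoints are expected to land on the cluster node.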
#### What have you tried?
I run it on the cluster with the following command:
```bash
salloc -G 1 srun python main.py
```
#### What's your environment?

 - OS: Linux
 - Packaging: conda


Checkpointing with SLURM #2278