model_to_device() missing 1 required positional argument 'process_idx' #5465

@adityabalu

🐛 Bug

When running the code with ddp_cpu on a SLURM-based cluster, I get this error:

Traceback (most recent call last):
  File "image_classifier.py", line 99, in <module>
    cli_main()
  File "image_classifier.py", line 87, in cli_main
    trainer.fit(model, datamodule=dm)
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 472, in fit
    results = self.accelerator_backend.train()
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 64, in train
    self.ddp_train(process_idx=self.task_idx, model=model)
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 172, in ddp_train
    self.model_to_device(model)
TypeError: model_to_device() missing 1 required positional argument: 'process_idx'

Looking at the source, the model_to_device function requires process_idx as an argument, but the call in ddp_train (ddp_hpc_accelerator.py, line 172) does not pass it.
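To illustrate the mismatch outside of Lightning, here is a minimal sketch. `SketchAccelerator`, `ddp_train_forwarding`, and `reproduce` are hypothetical names made up for this example; the class is not Lightning's actual accelerator. It only mirrors the two signatures visible in the traceback and shows that forwarding `process_idx` would avoid the TypeError.

```python
class SketchAccelerator:
    """Hypothetical stand-in for the accelerator; not Lightning's real class."""

    def model_to_device(self, model, process_idx):
        # The signature requires process_idx, as the error message implies.
        return model, process_idx

    def ddp_train(self, process_idx, model):
        # Mirrors the call in the traceback: process_idx is not forwarded.
        return self.model_to_device(model)

    def ddp_train_forwarding(self, process_idx, model):
        # One possible fix (an assumption, not the project's patch):
        # pass process_idx through to model_to_device.
        return self.model_to_device(model, process_idx)


def reproduce():
    """Return the TypeError message raised by the non-forwarding call."""
    acc = SketchAccelerator()
    try:
        acc.ddp_train(process_idx=0, model=object())
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce())
```

Whether the right fix is to forward `process_idx` at the call site or to drop the parameter from `model_to_device` depends on how the other accelerators use it, which this report does not show.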

Please reproduce using the BoringModel

I used this code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/basic_examples/simple_image_classifier.py

Along with this slurm job script:

> #!/bin/bash
> #SBATCH --job-name='pl_dist'
> #SBATCH --nodes=2
> #SBATCH -p RM
> #SBATCH --ntasks-per-node=1
> #SBATCH -t 1:00:00
> 
> module load anaconda3
> source activate /pylon5/softwares/pytorch
> 
> export NCCL_DEBUG=INFO
> export PYTHONFAULTHANDLER=1
> 
> srun -n 2 --ntasks-per-node 1 python image_classifier.py --accelerator 'ddp_cpu' --num_nodes 2 --num_processes 1 --max_epochs 50
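For anyone debugging a similar setup, it can help to print what each `srun` task sees. The sketch below reads the standard SLURM variables `SLURM_PROCID`, `SLURM_LOCALID`, `SLURM_NODEID`, and `SLURM_NTASKS`; the helper name `slurm_task_info` is hypothetical, and which of these variables Lightning itself reads is not shown in this report.

```python
import os


def slurm_task_info(env=None):
    """Collect per-task SLURM identifiers from an environment mapping.

    Returns None for any variable that is not set, e.g. when run outside srun.
    """
    env = os.environ if env is None else env
    keys = {
        "global_rank": "SLURM_PROCID",   # task index across the whole job step
        "local_rank": "SLURM_LOCALID",   # task index within its node
        "node_rank": "SLURM_NODEID",     # index of the node within the job
        "world_size": "SLURM_NTASKS",    # total number of tasks
    }
    return {name: int(env[var]) if var in env else None
            for name, var in keys.items()}


if __name__ == "__main__":
    print(slurm_task_info())
```

Calling this at the top of image_classifier.py, before the trainer is created, shows whether both tasks started and which index each one received.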

Environment

  • CUDA:
    - GPU:
    - available: False
    - version: 10.2
  • Packages:
    - numpy: 1.19.2
    - pyTorch_debug: False
    - pyTorch_version: 1.7.1
    - pytorch-lightning: 1.1.3
    - tqdm: 4.56.0
  • System:
    - OS: Linux
    - architecture:
      - 64bit
      - ELF
    - processor: x86_64
    - python: 3.8.5
    - version: #1 SMP Mon Jul 29 17:46:05 UTC 2019
