Closed
Labels: bug (Something isn't working)
Description
Describe the bug
When running with DDP, Lightning throws this warning:
UserWarning:
You requested 2 GPUs but launched 1 slurm tasks.
We will launch 2 processes for you.
We recommend you let slurm manage the processes by setting: --ntasks-per-node=2
If you're not using SLURM, ignore this message!
I made the suggested change, but I still get the warning. Digging into the code a bit, it looks like the warning only goes away when $SLURM_NTASKS equals trainer.nb_requested_gpus. If I'm understanding the code correctly, the check should use $SLURM_NTASKS_PER_NODE instead: trainer.nb_requested_gpus is the number of GPUs per node, whereas $SLURM_NTASKS is the total number of tasks across all nodes.
I'm happy to make the change if you agree that this is the correct fix.
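To make the proposed fix concrete, here is a minimal sketch of the two checks side by side. It is not Lightning's actual code: the helper name `slurm_tasks_match_gpus` and its `per_node` flag are hypothetical, and the example environment assumes the values SLURM would set for 2 nodes with --ntasks-per-node=2.

```python
import os
from typing import Mapping, Optional


def slurm_tasks_match_gpus(
    nb_requested_gpus: int,
    env: Optional[Mapping[str, str]] = None,
    per_node: bool = True,
) -> bool:
    """Hypothetical helper: does the SLURM task count match the GPUs per node?

    per_node=False mimics the check described in this issue ($SLURM_NTASKS);
    per_node=True is the proposed check ($SLURM_NTASKS_PER_NODE).
    """
    env = os.environ if env is None else env
    var = "SLURM_NTASKS_PER_NODE" if per_node else "SLURM_NTASKS"
    value = env.get(var)
    if value is None:
        # Variable not set (e.g. not running under SLURM): no match.
        return False
    try:
        return int(value) == nb_requested_gpus
    except ValueError:
        # Unexpected non-integer value: treat as a mismatch.
        return False


# Assumed environment for 2 nodes with --ntasks-per-node=2 (4 tasks in total).
example_env = {"SLURM_NTASKS": "4", "SLURM_NTASKS_PER_NODE": "2"}

print(slurm_tasks_match_gpus(2, example_env, per_node=False))  # total-task check
print(slurm_tasks_match_gpus(2, example_env, per_node=True))   # per-node check
```

With 2 GPUs per node on 2 nodes, the total-task comparison fails (4 != 2) and the warning would fire, while the per-node comparison succeeds.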
To Reproduce
Submit a job with test_tube.SlurmCluster:
cluster = SlurmCluster(
    hyperparam_optimizer=args,
    log_path="./logs"
)
cluster.per_experiment_nb_gpus = 2
cluster.per_experiment_nb_nodes = 2
cluster.per_experiment_nb_cpus = 16
cluster.add_slurm_cmd(cmd="ntasks-per-node", value=str(cluster.per_experiment_nb_gpus), comment="1 task per gpu, for ddp")
cluster.job_time = "1:00:00"
cluster.gpu_type = "p100"
cluster.memory_mb_per_node = 300000
cluster.optimize_parallel_cluster_gpu(train, nb_trials=1, job_name="tml")
Expected behavior
The warning should go away, and Lightning should use the SLURM-created tasks.