
Error in returning Dict from training_step with multiple GPUs #6193

@kchuang625


🐛 Bug

When using multiple GPUs with 'dp', the error RuntimeError: grad can be implicitly created only for scalar outputs is raised if I use a training_step function like this:

def training_step(self, batch, batch_idx):
    ...
    return {'loss': loss}
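
This matches plain PyTorch behavior: backward() can only be called implicitly on a scalar. Under 'dp' the per-GPU results are gathered into a non-scalar tensor, so if it is not reduced before backward() this exact error appears. A minimal sketch of the underlying PyTorch error (illustrative only, not taken from the issue):

import torch

w = torch.randn(2, requires_grad=True)
per_gpu_losses = w * 3  # shape (2,), e.g. one loss value per GPU replica under 'dp'

try:
    per_gpu_losses.backward()  # non-scalar output: backward() cannot infer the grad
except RuntimeError as err:
    print(err)  # "grad can be implicitly created only for scalar outputs"

per_gpu_losses.mean().backward()  # reducing to a scalar works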

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1hmHqYHPOqDlZUAF7-9zcCvobrvSPt7W5?usp=sharing
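
For reference, a self-contained sketch of the reproduction (my own approximation of the linked BoringModel Colab, assuming 2 GPUs are available; the dataset sizes and class names are placeholders):

import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.data = torch.randn(length, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        return {'loss': loss}  # dict return triggers the error under 'dp'

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    trainer = pl.Trainer(gpus=2, accelerator='dp', max_epochs=1)
    trainer.fit(BoringModel(), DataLoader(RandomDataset(32, 64), batch_size=8))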

Expected behavior

Returning a dict with a loss key from training_step is supposed to work.

A quick solution

Return the loss tensor directly from the training_step function:

def training_step(self, batch, batch_idx):
    ...
    return loss
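
Alternatively (a sketch of my own, not from the issue), the dict return can be kept by reducing the gathered per-GPU losses in training_step_end, which Lightning calls with the combined outputs when running under 'dp':

def training_step(self, batch, batch_idx):
    ...
    return {'loss': loss}

def training_step_end(self, outputs):
    # Under 'dp', outputs['loss'] holds one loss per GPU;
    # averaging yields the scalar that backward() needs.
    return {'loss': outputs['loss'].mean()}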

Environment

  • PyTorch Lightning Version: 1.2.0
  • PyTorch Version: 1.7.0
  • OS: Linux
  • Python version: 3.8
  • CUDA/cuDNN version: 10.2

cc @carmocca


Labels

bug · distributed · good first issue · help wanted · priority: 0
