
Metrics fail on DP and multiple GPU #4353

@LittlePea13

Description

🐛 Bug

When using a metric such as Accuracy from pytorch_lightning.metrics on a machine with 4 GPUs in 'dp' mode, there is an error caused by accumulating the metric state on different devices. In the case of Accuracy, at this line:
https://github.com/PyTorchLightning/pytorch-lightning/blob/c8ccec7a02c53ed38af6ef7193232426384eee4a/pytorch_lightning/metrics/classification/accuracy.py#L108

The arguments to torch.sum are on the same device the metric is being called from, but self.correct is on a different one. The traceback is as follows:

    self.accuracy_val(y_hat, y)
  File "/home/***/.conda/envs/***/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/***/.conda/envs/***/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py", line 153, in forward
    self.update(*args, **kwargs)
  File "/home/***/.conda/envs/***/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py", line 199, in wrapped_func
    return update(*args, **kwargs)
  File "/home/***/.conda/envs/***/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 109, in update
    self.correct += torch.sum(preds == target)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
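For context, here is a minimal sketch of the device mismatch behind this traceback, outside of Lightning (it assumes a machine with at least 2 GPUs): the accumulator stays on the root device while the comparison happens on another device, so the in-place addition mixes devices.

    import torch

    # Minimal sketch of the mismatch (assumes >= 2 GPUs): the accumulator lives
    # on cuda:0, like a metric state kept on the root device, while the batch
    # is processed on cuda:1, like a DP replica would.
    correct = torch.tensor(0, device="cuda:0")
    preds = torch.tensor([1, 0, 1], device="cuda:1")
    target = torch.tensor([1, 1, 1], device="cuda:1")

    correct += torch.sum(preds == target)
    # RuntimeError: Expected all tensors to be on the same device, ...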

Please reproduce using the BoringModel and post here

https://colab.research.google.com/drive/1zcU1ADuHZj82clrBysv-EGfgqG7SxUhN#scrollTo=V7ELesz1kVQo

To Reproduce

The shared Colab will not be able to replicate the bug, since it needs 'dp' on multiple GPUs, but it should give an idea of when the error occurs. So setting

        gpus=4,
        accelerator="dp",

in the Trainer and then using a metric should bring up the issue. I have tested it with Accuracy, but other users in the Slack channel have reported it for other metrics such as Precision or Recall.
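For reference, a condensed sketch of the kind of setup that triggers it (the model, data shapes, and names below are illustrative placeholders, not the exact BoringModel code from the Colab):

    import torch
    import pytorch_lightning as pl
    from pytorch_lightning.metrics import Accuracy

    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)
            self.accuracy_val = Accuracy()

        def forward(self, x):
            return self.layer(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.cross_entropy(self(x), y)

        def validation_step(self, batch, batch_idx):
            x, y = batch
            y_hat = torch.argmax(self(x), dim=1)
            # Raises the RuntimeError above under dp: the metric state sits on
            # cuda:0 while y_hat and y live on the replica's device.
            self.accuracy_val(y_hat, y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)

    trainer = pl.Trainer(gpus=4, accelerator="dp", max_epochs=1)
    # trainer.fit(LitModel(), train_dataloader, val_dataloader)  # dataloaders omitted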

Expected behavior

The devices should be the same when the values are added together. I am not sure which would be the correct approach; I have crudely worked around it with:

        self.correct += torch.sum(preds.cuda(self.correct.device.index) == target.cuda(self.correct.device.index))
        self.total += target.cuda(self.correct.device.index).numel()

in the case of Accuracy, but that is quite an ugly way of dealing with it.
Update: Although this no longer produces the error, the accuracy is not computed properly, as the values get reset to 0 for some reason between steps.
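For what it's worth, here is a slightly cleaner version of the same workaround, using .to() against the metric state's device instead of hard-coding .cuda(index). This is a sketch only, not a proposed fix; "self" stands for the Accuracy metric instance.

    import torch

    # Hypothetical variant of Accuracy.update: move the batch tensors onto the
    # device of the metric state before accumulating, instead of calling
    # .cuda(index) on each tensor.
    def update(self, preds: torch.Tensor, target: torch.Tensor):
        preds = preds.to(self.correct.device)
        target = target.to(self.correct.device)
        self.correct += torch.sum(preds == target)
        self.total += target.numel()

This avoids the device error but, like the version above, still gives wrong totals: under 'dp' each forward pass updates a replica copy of the metric whose state is discarded afterwards, which would explain the counts resetting to 0 between steps.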

Environment

  • CUDA:
    - GPU:
      - GeForce GTX 1080 Ti
      - GeForce GTX 1080 Ti
      - GeForce GTX 1080 Ti
      - GeForce GTX 1080 Ti
    - available: True
    - version: 10.2
  • Packages:
    - numpy: 1.19.2
    - pyTorch_debug: False
    - pyTorch_version: 1.6.0
    - pytorch-lightning: 1.0.3
    - tqdm: 4.50.2
  • System:
    - OS: Linux
    - architecture:
      - 64bit
      - ELF
    - processor:
    - python: 3.8.5
    - version: #1 SMP Debian 4.19.152-1 (2020-10-18)
