Description
🐛 Bug
I'm running against PyTorch 1.7.1, and I was getting an "invalid device ordinal" error when running multi-node DDP training. After some digging, it looks like PyTorch was using the global rank as the device ordinal when moving a tensor to a GPU. That relationship only holds in single-node training, hence the invalid device ordinal in the multi-node case.
After some more digging, I realized Lightning handles this error by overriding the PyTorch function, but only for versions older than 1.7. Unfortunately, the problem persists in all of 1.7.x.
If I hack things to always use the override (i.e. check `_TORCH_GREATER_EQUAL_1_8` instead of `_TORCH_GREATER_EQUAL_1_7`), I don't get the error anymore.
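To make the failure mode concrete, here is a minimal sketch (illustrative only, not Lightning's or PyTorch's actual code) of why a global rank is not a valid CUDA device ordinal once you have more than one node; the node/GPU counts below are hypothetical:

```python
# Illustrative sketch: global rank vs. local device ordinal in multi-node DDP.
NUM_NODES = 2
GPUS_PER_NODE = 4  # each node only exposes CUDA device ordinals 0..3

def device_ordinal_naive(global_rank):
    # Buggy mapping: uses the global rank directly as the device ordinal.
    # Global ranks run 0..7 here, but only ordinals 0..3 exist on any node.
    return global_rank

def device_ordinal_correct(global_rank):
    # Correct mapping: the rank local to the node is a valid ordinal.
    return global_rank % GPUS_PER_NODE

for rank in range(NUM_NODES * GPUS_PER_NODE):
    local = device_ordinal_correct(rank)
    assert 0 <= local < GPUS_PER_NODE
    if rank >= GPUS_PER_NODE:
        # Ranks on the second node would trigger "invalid device ordinal"
        # with the naive mapping, e.g. cuda:5 on a 4-GPU node.
        assert device_ordinal_naive(rank) >= GPUS_PER_NODE
```

On a single node the two mappings coincide, which is why the bug only surfaces in multi-node runs.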
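A hedged sketch of the version-guard logic described above (not Lightning's actual implementation; the function names here are made up, and versions are compared with a simple stdlib parser for illustration):

```python
def version_tuple(v):
    # Parse "1.7.1" -> (1, 7, 1), ignoring a local suffix like "+cu102".
    return tuple(int(p) for p in v.split("+")[0].split("."))

def needs_override_buggy(torch_version):
    # Current behavior: apply the override only for PyTorch < 1.7,
    # so 1.7.x silently falls through to the broken upstream code.
    return version_tuple(torch_version) < (1, 7)

def needs_override_fixed(torch_version):
    # Proposed behavior: keep the override for everything < 1.8,
    # since the upstream fix is not present in any 1.7.x release.
    return version_tuple(torch_version) < (1, 8)

assert not needs_override_buggy("1.7.1")  # override skipped -> bug appears
assert needs_override_fixed("1.7.1")      # override kept -> bug avoided
```

In other words, switching the guard from `_TORCH_GREATER_EQUAL_1_7` to `_TORCH_GREATER_EQUAL_1_8` is equivalent to moving from the first predicate to the second.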
Please reproduce using the BoringModel
This doesn't have anything to do with a model, but if you want to see some code on how I'm calling PTL, here you go.
To Reproduce
Run this code with PyTorch 1.7.x
Expected behavior
No "invalid device ordinal" error.
Environment
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Red Hat Enterprise Linux Server, 7.6 (Maipo)
- How you installed PyTorch (`conda`, `pip`, source): cloned from IBM's open-ce Conda environment
- Build command you used (if compiling from source): N/A
- Python version: 3.8
- CUDA/cuDNN version: cudatoolkit-10.2.89, cudnn-7.6.5_10.2
- GPU models and configuration: V100
- Any other relevant information: running on PPC