
RPC timeout when model parallel with large model (> 4B) #5318


Description

@haven-jeon

🐛 Bug

Although the exact behavior depends on hardware conditions such as GPU specifications, RPC timeout errors can occur while building piped sequential models. In my case, the error occurred while turning a 4B-parameter GPT-2 model into a sequential (pipeline) parallel model on 8 x P40 GPUs.

Exception has occurred: RuntimeError
RPCErr:1:RPC ran for more than 60000 milliseconds and timed out.

It would be nice to have an rpc_timeout_sec parameter that controls the RPC timeout:

DDPSequentialPlugin(balance=[5, 5, 5, 5, 5, 5, 5, 4], microbatches=8, rpc_timeout_sec=60 * 5)
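
For context, this is how the proposed parameter would look in a full Trainer setup. This is a sketch only: rpc_timeout_sec is the parameter being requested and does not exist yet, and the import path is assumed from the Lightning pipe example.

    # Sketch of the proposed API (rpc_timeout_sec is not yet implemented);
    # the import path is assumed from the Lightning pipe example.
    import pytorch_lightning as pl
    from pytorch_lightning.plugins.ddp_sequential_plugin import DDPSequentialPlugin

    trainer = pl.Trainer(
        gpus=8,
        accelerator="ddp",
        plugins=[
            DDPSequentialPlugin(
                balance=[5, 5, 5, 5, 5, 5, 5, 4],  # layers per GPU pipeline stage
                microbatches=8,
                rpc_timeout_sec=60 * 5,  # proposed: raise the 60 s RPC default
            )
        ],
    )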

After applying the change in the following commit, training proceeded normally:

haven-jeon@6e2205c
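
The linked commit is the authoritative change. As a rough sketch of the general approach, the timeout can be raised when the RPC framework is initialized, assuming the TensorPipe backend options available since PyTorch 1.6:

    import os
    import torch.distributed.rpc as rpc

    # Placeholders; in practice the launcher supplies these.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))

    # rpc_timeout is in seconds; the 60 s default is what produces the
    # "RPC ran for more than 60000 milliseconds" error above.
    options = rpc.TensorPipeRpcBackendOptions(rpc_timeout=60 * 5)

    rpc.init_rpc(
        name=f"worker_{rank}",
        rank=rank,
        world_size=world_size,
        rpc_backend_options=options,
    )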

Please reproduce using the BoringModel

python train.py --gpus 8 --accelerator ddp  ....  --use_ddp_sequential
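
A minimal pipe-compatible model for the repro might look like the sketch below. The class name is hypothetical, the layer sizes are arbitrary, and the sequential_module attribute name is assumed from the Lightning pipe example, which exposes the layers to the plugin as a single nn.Sequential.

    import torch
    from torch import nn
    import pytorch_lightning as pl

    class PipeBoringModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # 39 layers in total, matching balance=[5, 5, 5, 5, 5, 5, 5, 4]
            self.sequential_module = nn.Sequential(
                *[nn.Linear(32, 32) for _ in range(39)]
            )

        def forward(self, x):
            return self.sequential_module(x)

        def training_step(self, batch, batch_idx):
            # Dummy scalar loss, BoringModel-style
            return self.forward(batch).sum()

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)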


Environment

  • PyTorch version: 1.6.0
  • OS: CentOS 7
  • How you installed PyTorch: from source
  • Build command: pip install -U .
  • Python version: 3.7
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: 8 x P40


Metadata

Labels

bug (Something isn't working), help wanted (Open to be worked on), priority: 1 (Medium priority task)
