Description
This came up in discussions around #11818, #11797, #11798, #11799
Currently, device-availability checks happen inside the accelerator connector during Trainer initialization:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a2d8c4f6a6080234e47ccc5ad593912303d29bf9/pytorch_lightning/trainer/connectors/accelerator_connector.py#L197-L229
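For reference, the connector-side check is roughly of the following shape (a simplified sketch, not the exact linked code, which handles more flags and device counts):

```python
import torch

from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _check_device_availability(accelerator_flag: str) -> None:
    # Simplified: hardware-specific branching lives in the connector,
    # not in the accelerator classes themselves.
    if accelerator_flag == "gpu" and not torch.cuda.is_available():
        raise MisconfigurationException(
            "GPUs were requested, but CUDA is not available on this machine."
        )
```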
Discussion 1:
Should the per-device check move out of the Trainer to each Accelerator?
Pros for Trainer:
???
Pros for Accelerator:
- Logic is better encapsulated
- More extensible: new hardware can be supported without changes to the Trainer (see the sketch below)
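For illustration, moving the check into each Accelerator could look roughly like this (hypothetical hook names, a sketch rather than the current Lightning API):

```python
from abc import ABC, abstractmethod

import torch


class Accelerator(ABC):
    @staticmethod
    @abstractmethod
    def is_available() -> bool:
        """Return True if this accelerator's hardware is usable on the current host."""


class GPUAccelerator(Accelerator):
    @staticmethod
    def is_available() -> bool:
        return torch.cuda.is_available()
```

The Trainer would then only need a generic `accelerator.is_available()` call, and supporting new hardware would not require touching the Trainer.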
Discussion 2:
Assuming device checks happen inside the Accelerator class, should runtime checks happen by default, or should each accelerator determine when to assert availability?
Pros for default checks:
- Additional safety independent of strategy logic
Pros for each accelerator determining this on their own:
- More flexibility over when (or whether) the check runs (see the sketch after this list)
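A sketch of the default-check option, extending the hypothetical base class above: the base class asserts availability in a shared code path, so every subclass gets the safety net unless it deliberately opts out.

```python
from abc import ABC, abstractmethod

from pytorch_lightning.utilities.exceptions import MisconfigurationException


class Accelerator(ABC):  # extending the sketch above
    @staticmethod
    @abstractmethod
    def is_available() -> bool:
        ...

    def assert_device_available(self) -> None:
        # Called by the Trainer in a shared code path, so every accelerator
        # is checked by default. The alternative design would leave it to
        # each subclass to decide if and when to call (or override) this.
        if not self.is_available():
            raise MisconfigurationException(
                f"{type(self).__name__} was requested but is not available on this host."
            )
```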
Discussion 3:
Assuming device checks happen automatically inside the Accelerator class, should they run at initialization, or during `setup_environment` as the first thing the Trainer does at runtime?
Pros for `setup_environment`:
- Mimics the `torch.device` experience (demonstrated below). One can create `torch.device("cuda")` on a host without GPUs; moving a tensor to that device, however, fails because CUDA is unavailable. The corollary would be the ability to create a `GPUAccelerator` on a host without GPUs, while calling `GPUAccelerator.setup_environment` would fail.
- Easier for testing other parts of these classes (doesn't require mocking device availability)
- Instantiating the Accelerator class doesn't imply that the model is actually on the device; it simply expresses an intent of what hardware the model should be trained with
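The `torch.device` behavior referenced above, on a CPU-only host:

```python
import torch

device = torch.device("cuda")  # succeeds: merely records intent
x = torch.zeros(1)
x = x.to(device)               # raises here: CUDA is unavailable on this host
```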
Pros for constructor:
- Fails faster: a misconfiguration surfaces as soon as the accelerator object is created (see the sketch below)
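For contrast, the constructor-time variant (same hypothetical names as the sketches above) would surface the error at `Trainer(...)` construction rather than at fit time:

```python
import torch

from pytorch_lightning.utilities.exceptions import MisconfigurationException


class GPUAccelerator:  # illustrative, not the current Lightning class
    def __init__(self) -> None:
        # Checking in __init__ fails fast, but also makes the class
        # impossible to even instantiate (e.g. in tests) on a host
        # without GPUs.
        if not torch.cuda.is_available():
            raise MisconfigurationException(
                "GPUAccelerator requires CUDA, which is unavailable on this host."
            )
```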
Originally asked by @rohitgr7 in #11797 (comment)
cc @tchaton @justusschock @awaelchli @Borda @akihironitta @rohitgr7 @four4fish