🚀 Feature
Add an abstract static is_available() method to the Accelerator interface so that each accelerator implementation can report whether its hardware can be used on the current machine.
Motivation
Such functionality on the Accelerator abstraction would:
- Enable automatic hardware selection without duplicating code across the Trainer & individual accelerator implementations.
- Simplify the accelerator connector logic and the rewrite effort in Rewrite accelerator_connector #11448.
- Enable automatic runtime checking of hardware availability during execution.
- Provide consistency with how the Trainer auto-detects the cluster environments natively supported by the framework. The corollary here is ClusterEnvironment.detect (sketched below).
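For reference, a rough sketch of that existing detection pattern (a simplified illustration assuming the detect() static methods on the environment plugins referenced above; not the actual connector code):
```python
# Simplified illustration of the ClusterEnvironment.detect pattern mentioned above;
# not the actual AcceleratorConnector implementation.
from pytorch_lightning.plugins.environments import (
    LightningEnvironment,
    SLURMEnvironment,
    TorchElasticEnvironment,
)


def _select_cluster_environment():
    for env_cls in (SLURMEnvironment, TorchElasticEnvironment):
        if env_cls.detect():
            return env_cls()
    return LightningEnvironment()  # fallback when no cluster is detected
```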
Pitch
```python
from abc import ABC, abstractmethod

import torch


class Accelerator(ABC):
    @staticmethod
    @abstractmethod
    def is_available() -> bool:
        """Detect if the hardware is available."""

    def setup_environment(self, root_device: torch.device) -> None:
        """Set up any processes or distributed connections.

        This is called before the LightningModule/DataModule setup hook, which allows the user to access the
        accelerator environment before setup is complete.

        Raises:
            RuntimeError:
                If the corresponding hardware is not found.
        """
        if not self.is_available():
            raise RuntimeError(f"{self.__class__.__qualname__} is not configured to run on this hardware.")


class CPUAccelerator(Accelerator):
    @staticmethod
    def is_available() -> bool:
        """CPU is always available for execution."""
        return True


class GPUAccelerator(Accelerator):
    @staticmethod
    def is_available() -> bool:
        return torch.cuda.is_available() and torch.cuda.device_count() > 0
```
...and so on for the other accelerator implementations.
See #11797 for a more detailed implementation of what this looks like in practice.
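As a quick illustration of how these hooks would behave, here is a minimal usage sketch (hypothetical, assuming the classes above; output shown for a machine without CUDA devices):
```python
# Minimal usage sketch, assuming the Accelerator classes defined above.
accelerator = GPUAccelerator()

print(CPUAccelerator.is_available())  # always True
print(GPUAccelerator.is_available())  # False on a machine without CUDA devices

try:
    # setup_environment() enforces the availability check at runtime
    accelerator.setup_environment(root_device=torch.device("cuda", 0))
except RuntimeError as err:
    print(err)  # GPUAccelerator is not configured to run on this hardware.
```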
To support Trainer(accelerator="auto"), this is what the logic simplifies to:
```python
for acc_cls in (GPUAccelerator, TPUAccelerator, IPUAccelerator, CPUAccelerator):
    if acc_cls.is_available():
        return acc_cls()
return CPUAccelerator()  # fallback to CPU
```
This could be even further simplified if we offered an AcceleratorRegistry, such that the Trainer/AcceleratorConnector didn't need to hardcode the list of accelerators to detect:
```python
for acc_cls in AcceleratorRegistry.impls:
    if acc_cls.is_available():
        return acc_cls()
return CPUAccelerator()  # fallback to CPU
```
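A minimal sketch of what such a registry could look like (AcceleratorRegistry, register, and auto_select_accelerator are hypothetical names used for illustration, not an existing API):
```python
from typing import List, Type


class AcceleratorRegistry:
    """Hypothetical registry sketch: implementations register themselves so the
    AcceleratorConnector can iterate over them without a hardcoded list."""

    # Kept in priority order: more specific hardware first, CPU as the fallback.
    impls: List[Type[Accelerator]] = []

    @classmethod
    def register(cls, acc_cls: Type[Accelerator]) -> Type[Accelerator]:
        cls.impls.append(acc_cls)
        return acc_cls


# Each implementation would register itself, e.g. via a decorator or at import time:
AcceleratorRegistry.register(GPUAccelerator)
AcceleratorRegistry.register(CPUAccelerator)


def auto_select_accelerator() -> Accelerator:
    """Return an instance of the first registered accelerator whose hardware is available."""
    for acc_cls in AcceleratorRegistry.impls:
        if acc_cls.is_available():
            return acc_cls()
    return CPUAccelerator()  # fallback to CPU
```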
Alternatives
Some other alternatives exist here:
#11799
#11798
Issues with these approaches:
- They are also breaking changes: simply instantiating the accelerator could raise a runtime error if the device isn't available.
- The bigger issue to me is that they do not ease support for Trainer(accelerator="auto"): the accelerator connector still needs to hardcode & re-implement each of the device checks just to determine which Accelerator to instantiate (roughly sketched below).
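For contrast, this is roughly the kind of hardcoded checking the connector has to keep re-implementing under those alternatives (a simplified illustration, not the actual AcceleratorConnector code):
```python
import torch


def _choose_accelerator_without_is_available() -> Accelerator:
    # Simplified illustration of the duplication described above; not the actual
    # AcceleratorConnector code. A real connector would repeat similar checks
    # for TPU, IPU, and any future accelerator.
    if torch.cuda.is_available() and torch.cuda.device_count() > 0:
        return GPUAccelerator()
    return CPUAccelerator()
```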
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @Borda @tchaton @justusschock @awaelchli @akihironitta @rohitgr7