
Rewrite Accelerator_connector and follow up tasks #11449

@four4fish

Proposed refactor

We have been discussing this for a while, and there are several existing issues related to this topic.

Motivation

  • Move towards a stable Strategy version
  • The current logic is unclear and hard to maintain
  • The rewrite enables many simplifications

Pitch

The new logic can be divided into three parts (details in the PR):

Part 1: Check for misconfigurations set by the user (conflicting flags, duplicated flags) and set the final flags.

Part 2: Choose the Strategy, Accelerator, Precision and cluster_environment, and set up the parallel devices.

Part 3: Initialize the Strategy and set up the Strategy's Accelerator, Precision, Checkpoint_IO, cluster environment and parallel devices (all require lazy initialization).
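
A rough sketch of how the three parts could be sequenced in the connector (method names are illustrative, not the final API):

    # Hypothetical skeleton of the three-part flow described above.
    class AcceleratorConnector:
        def __init__(self, accelerator=None, strategy=None, precision=32, plugins=None):
            # Part 1: validate user flags (conflicts, duplicates) and settle the final values.
            self._check_config_and_set_final_flags(accelerator, strategy, precision, plugins)
            # Part 2: choose strategy, accelerator, precision and cluster environment,
            # and set up the parallel devices.
            self._choose_strategy_accelerator_precision_and_cluster_env()
            # Part 3: lazily initialize the strategy and attach its components.
            self._lazy_init_strategy()

        def _check_config_and_set_final_flags(self, accelerator, strategy, precision, plugins):
            ...  # raise MisconfigurationException on conflicting or duplicated flags

        def _choose_strategy_accelerator_precision_and_cluster_env(self):
            ...  # resolve the final Strategy/Accelerator/Precision/ClusterEnvironment

        def _lazy_init_strategy(self):
            ...  # instantiate the Strategy and attach accelerator, precision, checkpoint IO, etc.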

Follow-up items from #11448

  1. Move error messages into the PrecisionPlugin, Strategy and Accelerator init methods where possible.
    e.g. move this check to the IPUPrecisionPlugin (from @carmocca):

    if self._precision_flag not in (16, 32):
        raise MisconfigurationException(
            f"`Trainer(accelerator='ipu', precision={self._precision_flag!r})` is not supported."
        )
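
A sketch of what that check could look like once it lives in the plugin's constructor (the exact constructor signature is an assumption, not the final API):

    # Sketch: validate precision when the plugin is constructed, instead of in the connector.
    from pytorch_lightning.plugins import PrecisionPlugin
    from pytorch_lightning.utilities.exceptions import MisconfigurationException

    class IPUPrecisionPlugin(PrecisionPlugin):
        def __init__(self, precision: int) -> None:  # signature is an assumption
            if precision not in (16, 32):
                raise MisconfigurationException(
                    f"`Trainer(accelerator='ipu', precision={precision!r})` is not supported."
                )
            super().__init__()
            self.precision = precision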

and move this check to the Strategy (from @ananthsub):

    if self._precision_flag in (16, "bf16") and self._amp_type_flag == AMPType.APEX:
        if isinstance(self.strategy, (DDPShardedStrategy, DDPSpawnShardedStrategy, DDPFullyShardedStrategy)):
            raise MisconfigurationException(
                "Sharded plugins are not supported with apex, please switch to `amp_backend='native'`."
            )
  2. Add typing to accelerator_connector. Can we do this as a separate PR after the unused-properties deprecation? (from @kaushikb11 @awaelchli @ananthsub)

  3. Reduce duplicated strategy registry code: classmethod inheritance doesn't work with the current strategy registry logic, because `cls` is the base class rather than the subclass. To remove the duplicated register_strategies methods, we need to redo the strategy registry logic. (@kaushikb11 @awaelchli @tchaton)
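
One possible direction: drive registration from subclass creation, where `cls` is always the concrete class. A minimal sketch with a plain-dict registry (the real StrategyRegistry carries more metadata):

    # Sketch: __init_subclass__ sees the concrete subclass, so every strategy
    # registers itself without re-implementing a register_strategies classmethod.
    StrategyRegistry: dict = {}

    class Strategy:
        strategy_name: str = ""

        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            if cls.strategy_name:
                StrategyRegistry[cls.strategy_name] = cls

    class DDPStrategy(Strategy):
        strategy_name = "ddp"

    assert StrategyRegistry["ddp"] is DDPStrategy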

  4. Revisit the flag-conflict and fallback logic:
    - different flags set to the same value: should be an error (from @tchaton)
    - dp/ddp2 on CPU falling back to ddp: should be an error instead of a silent fallback (from @ananthsub)
    - [RFC] handle cluster_env and checkpoint_io being set both in the strategy and in plugins, e.g. strategy=DDPPlugin(cluster_env=LightningEnv()), plugins=[TorchelasticEnv()]
    - check that the plugins flag contains at most one instance of each type (from @tchaton); see the sketch after this list
    - DDP is now the default for 1 GPU multi-node; why not fall back to ddp_spawn for all? (from @tchaton)
    - add/revisit warnings for the fallback logic
    - Is Apex supported with sharded methods? Should we remove `self._precision_flag in (16, "bf16")` from the "Sharded plugins are not supported with apex, please switch to `amp_backend='native'`." check? (from @tchaton)
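
A minimal sketch of the at-most-one-per-type validation, grouping by the plugin base classes so that e.g. two different ClusterEnvironment subclasses still count as duplicates:

    # Sketch: validate that the `plugins` flag carries at most one instance per category.
    from pytorch_lightning.plugins import CheckpointIO, PrecisionPlugin
    from pytorch_lightning.plugins.environments import ClusterEnvironment
    from pytorch_lightning.utilities.exceptions import MisconfigurationException

    def _validate_unique_plugin_types(plugins):
        for base in (PrecisionPlugin, ClusterEnvironment, CheckpointIO):
            found = [p for p in plugins if isinstance(p, base)]
            if len(found) > 1:
                raise MisconfigurationException(
                    f"Received multiple instances of `{base.__name__}` in `plugins`: {found}."
                    " Expected at most one."
                )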

  5. Move the _IS_INTERACTIVE check into the Strategy.
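
For example, each strategy could advertise its own interactive compatibility instead of the connector hard-coding a list (the property name here is hypothetical):

    # Sketch: strategies declare whether they can run in an interactive environment
    # such as a notebook; `is_interactive_compatible` is a hypothetical property name.
    import sys

    _IS_INTERACTIVE = hasattr(sys, "ps1")  # rough interactive-session detection

    class Strategy:
        is_interactive_compatible: bool = False  # spawn-free strategies would override this

    def _check_interactive_compatibility(strategy: Strategy) -> None:
        if _IS_INTERACTIVE and not strategy.is_interactive_compatible:
            raise RuntimeError(
                f"`{type(strategy).__name__}` is not compatible with an interactive environment."
            )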

  6. Loosen the check for "The TPUAccelerator can only be used with a SingleTPUStrategy or TPUSpawnStrategy" (from @ananthsub; not required, nice to have).

  7. Improve error messages:

    • Replace "You can only specify one strategy to the Trainer. You have passed Trainer(strategy={strategy}) but you have also passed {accelerator} in Trainer(accelerator={accelerator})" with something like "accelerator set through both the strategy class and the accelerator flag, choose one" (from @ananthsub)
    • "You passed Trainer(accelerator='cpu', precision=16, amp_type='apex') but apex AMP is not supported on CPU." It is worth mentioning that this works with bfloat16 and the native backend. (from @tchaton)
  8. Enable the accelerator.is_available() check.
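
A sketch of the enabled check, assuming each Accelerator exposes an is_available() method:

    # Sketch: fail fast when the requested accelerator is unusable on this machine.
    from pytorch_lightning.utilities.exceptions import MisconfigurationException

    def _check_accelerator_availability(accelerator) -> None:
        if not accelerator.is_available():
            raise MisconfigurationException(
                f"`{type(accelerator).__name__}` is not available on this machine."
            )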

  9. Address all the TODOs in accelerator_connector:

    • deprecate unused properties
  10. (HIGH PRIORITY) Re-introduce the _init_deterministic method on the AcceleratorConnector and set the value for deterministic.
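
A minimal sketch of the re-introduced method, assuming the flag maps onto torch's global deterministic-algorithms switch as before:

    import os
    import torch

    def _init_deterministic(self, deterministic: bool) -> None:
        # Record the flag and toggle deterministic algorithms globally.
        self.deterministic = deterministic
        torch.use_deterministic_algorithms(deterministic)  # torch >= 1.8
        if deterministic:
            # Needed so cuBLAS behaves deterministically on CUDA >= 10.2.
            os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"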

Additional context

Improvements and potential improvements:

  • Enums could be deprecated: _StrategyType, _AcceleratorType, _distrib_type, _device_type and distributed_backend are not needed in the new version
  • Revisit the strategy registry logic: currently half of the string names are registered and the other half live in _StrategyType; we could consolidate them
  • Further lazy initialization of the parallel Strategy classes: parallel devices need to be lazily initialized
  • Revisit flag priorities (part 1), the choosing logic (part 2) and the associated tests
  • Consolidate and revisit the device-parsing logic in utilities/devices, the Trainer and the individual Accelerator classes
  • Improve tests, increase coverage and remove unnecessary tests
  • Deprecate unused functions from accelerator_connector (kept for now for backward compatibility)

If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @justusschock @awaelchli @akihironitta @rohitgr7 @kaushikb11 @ninginthecloud @carmocca @ananthsub @tchaton
