Description
Proposed refactoring or deprecation
Motivation
The Accelerator is not a stable API yet. We can improve the Accelerator-related logic and move towards a stable Accelerator API for 1.6.
Pitch
Steps
1. Collective refactor: Consolidate collective functions #7534
   - Deprecate the Accelerator collective functions and call them directly on the Training Type Plugin (TTP), 1/n: Call training_type_plugin collective functions directly instead of going through the Accelerator #9677
   - Collective refactor, 2/n: Consolidate collective functions - collective base and subclasses #9414
2. Move the Precision Plugin into the TTP: Precision Plugins should be part of Training Type Plugins #7324
3. Move the Accelerator into the Strategy: [Accelerator refactor] Move Accelerator into Strategy #10648
4. Simplify the spawning logic: Simplify multiprocessing logic in DDPSpawn plugins #10059
5. [RFC] Simplify the Accelerator Connector logic and flags (can be done in parallel with the steps above): Rewrite Accelerator_connector and follow up tasks #11449
6. [RFC] Revisit the inheritance of the TTP: Flatten the Strategy inheritance #11863
More details in: Accelerator Refactor Proposal
[updating]
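To make the target ownership in steps 1 to 3 concrete, here is a minimal sketch, not the actual Lightning implementation. All class and method names (ToyAccelerator, ToyPrecisionPlugin, ToyStrategy, ToyTrainer, sync) are hypothetical stand-ins. It only assumes what the steps describe: the strategy owns the accelerator and the precision plugin, and the trainer calls collective functions on the strategy directly instead of going through the accelerator.

```python
from dataclasses import dataclass, field
from typing import Any, List


class ToyAccelerator:
    """Hypothetical stand-in: only knows about the hardware."""

    def setup_device(self) -> str:
        return "cpu"


class ToyPrecisionPlugin:
    """Hypothetical stand-in: only knows about numeric precision."""

    precision = 32


@dataclass
class ToyStrategy:
    """Hypothetical stand-in for a training type plugin / strategy.

    It owns the accelerator and the precision plugin (steps 2 and 3)
    and exposes the collective functions itself (step 1).
    """

    accelerator: ToyAccelerator
    precision_plugin: ToyPrecisionPlugin
    calls: List[str] = field(default_factory=list)

    def barrier(self) -> None:
        # Single-process toy: nothing to wait for, just record the call.
        self.calls.append("barrier")

    def broadcast(self, obj: Any, src: int = 0) -> Any:
        # Single-process toy: broadcasting returns the object unchanged.
        self.calls.append("broadcast")
        return obj

    def reduce(self, value: float) -> float:
        # Single-process toy: reducing over one process is the identity.
        self.calls.append("reduce")
        return value


class ToyTrainer:
    """Hypothetical trainer that talks to the strategy directly."""

    def __init__(self, strategy: ToyStrategy) -> None:
        self.strategy = strategy

    def sync(self, value: float) -> float:
        # No detour through the accelerator for collectives.
        self.strategy.barrier()
        return self.strategy.reduce(value)


strategy = ToyStrategy(ToyAccelerator(), ToyPrecisionPlugin())
trainer = ToyTrainer(strategy)
synced = trainer.sync(1.5)
```

In this layout the accelerator has no collective methods at all, so a custom plugin that previously forwarded collectives through the accelerator would need to be updated.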
FAQ
- Will this introduce a lot of breaking changes?
  Steps 1 to 4 should cause few user-facing API changes (unless we uncover other existing bugs during the refactor). The only breaking changes will be for custom plugins.
  Steps 5 and 6 are still at the RFC stage and may include breaking changes that affect user-facing APIs.
- How does this impact LightningLite?
  It should help LightningLite too; some LightningLite functions may be refactored or simplified along the way. (@awaelchli any suggestions about this part?)
Follow up TODOs:
- Check for trainer.FITTING before setting up optimizers: 2/n Move Precision Plugin into strategy - move optimizer related logics #10596 (comment)
- Send a PR to https://github.com/ray-project/ray_lightning to update their plugins
cc @justusschock @awaelchli @akihironitta @tchaton @Borda @kaushikb11 @ananthsub