Commit f7459f5

Sean Naren and carmocca authored
DeepSpeed Infinity Update (#7234)
* Update configs to match latest API
* Ensure we move the entire model to device before configure optimizer is called
* Add missing param
* Expose parameters
* Update references, drop local rank as it's now inferred from the environment variable
* Fix ref
* Force install deepspeed 0.3.16
* Add guard for init
* Update pytorch_lightning/plugins/training_type/deepspeed.py
  Co-authored-by: Carlos Mocholí <[email protected]>
* Revert type checking
* Install master for CI for testing purposes
* Update CI
* Fix tests
* Add check
* Update versions
* Set precision
* Fix
* See if i can force upgrade
* Attempt to fix
* Drop
* Add changelog

Co-authored-by: Carlos Mocholí <[email protected]>
1 parent 03e7bdf commit f7459f5

6 files changed: 225 additions and 52 deletions

.azure-pipelines/gpu-tests.yml

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ jobs:
 - bash: |
 python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
 pip install fairscale>=0.3.4
+pip install deepspeed>=0.4.0 -U
 pip install . --requirement requirements/devel.txt
 pip list
 displayName: 'Install dependencies'
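
The `python -c` one-liner in the hunk above removes every line mentioning `horovod` from `requirements/extra.txt` before the dependencies are installed. A more readable sketch of the same filtering step is shown here; the function name `drop_requirement` and its parameters are hypothetical and not part of the commit.

```python
from pathlib import Path


def drop_requirement(path, package):
    """Rewrite the file at `path`, dropping every line that contains `package`.

    Hypothetical helper: it mirrors the substring test used by the CI
    one-liner (`'horovod' not in line`), so it matches the name anywhere
    in a line, not only at the start of a requirement.
    """
    req_file = Path(path)
    lines = req_file.read_text().splitlines(keepends=True)
    kept = [line for line in lines if package not in line]
    req_file.write_text("".join(kept))
    return kept


# Example call matching the CI step (assumes the file exists):
# drop_requirement("requirements/extra.txt", "horovod")
```

Because the check is a plain substring match, a comment line that happens to mention the package would be dropped as well; this is the same behaviour as the original one-liner.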

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -80,6 +80,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added LightningCLI support for argument links applied on instantiation ([#7895](https://github.com/PyTorchLightning/pytorch-lightning/pull/7895))
 
 
+- Added DeepSpeed Infinity Support, and updated to DeepSpeed 0.4.0 ([#7234](https://github.com/PyTorchLightning/pytorch-lightning/pull/7234))
+
+
 - Added support for `torch.nn.UninitializedParameter` in `ModelSummary` ([#7642](https://github.com/PyTorchLightning/pytorch-lightning/pull/7642))
 
 

dockers/base-cuda/Dockerfile

Lines changed: 1 addition & 2 deletions
@@ -118,8 +118,7 @@ RUN \
 
 RUN \
 # install DeepSpeed
-# TODO(@SeanNaren): CI failing with `>=0.3.15` - skipping to unblock
-pip install deepspeed==0.3.14
+pip install deepspeed==0.4.0
 
 RUN \
 # Show what we have
