Skip to content

Conversation

@awaelchli
Copy link
Contributor

@awaelchli awaelchli commented Aug 1, 2022

What does this PR do?

Recently DeepSpeed resolved deepspeedai/DeepSpeed#2139 which will make their latest release again compatible with LightningLite. Hence we are removing the previously added user errors.

This PR can be merged once they make a new release. EDIT: This has now been released as deepspeed 0.7.0

Does your PR introduce any breaking changes? If yes, please list them.

No.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

I made sure I had fun coding 🙃

cc @Borda @carmocca @justusschock @awaelchli @akihironitta

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Aug 1, 2022
@awaelchli awaelchli added this to the pl:1.7.x milestone Aug 1, 2022
@awaelchli awaelchli added the fabric lightning.fabric.Fabric label Aug 1, 2022
@awaelchli awaelchli self-assigned this Aug 8, 2022
@awaelchli awaelchli changed the base branch from master to test/refactor-subprocess-test August 8, 2022 21:47
Base automatically changed from test/refactor-subprocess-test to master August 10, 2022 21:15
@awaelchli awaelchli force-pushed the feature/lite-deepspeed branch from adcb423 to d92b9d7 Compare August 10, 2022 21:25
@awaelchli awaelchli marked this pull request as ready for review August 10, 2022 23:25
Copy link
Contributor

@carmocca carmocca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Contributor

@rohitgr7 rohitgr7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changelog?

@mergify mergify bot added the ready PRs ready to be merged label Aug 11, 2022
@awaelchli awaelchli enabled auto-merge (squash) August 11, 2022 15:19
@awaelchli awaelchli merged commit 5653336 into master Aug 11, 2022
@awaelchli awaelchli deleted the feature/lite-deepspeed branch August 11, 2022 16:17
jessecambon pushed a commit to jessecambon/lightning that referenced this pull request Aug 16, 2022
lexierule pushed a commit that referenced this pull request Aug 17, 2022
* update version and changelog for 1.7.2 release

* Reset all results on epoch end (#14061)

Co-authored-by: Carlos Mocholí <[email protected]>

* Skip ddp fork tests on windows (#14121)

* Fix device placement when `.cuda()` called without specifying index (#14128)

* Convert subprocess test to standalone test (#14101)

* Fix entry point test for Python 3.10 (#14154)

* Fix flaky test caused by weak reference (#14157)

* Fix saving hyperparameters in a composition where parent is not a LM or LDM (#14151)



Co-authored-by: Rohit Gupta <[email protected]>

* Remove DeepSpeed version restriction from Lite (#13967)

* Configure the check-group app (#14165)

Co-authored-by: Jirka <[email protected]>

* Update onnxruntime requirement from <=1.12.0 to <1.13.0 in /requirements (#14083)

Updates the requirements on [onnxruntime](https://github.com/microsoft/onnxruntime) to permit the latest version.
- [Release notes](https://github.com/microsoft/onnxruntime/releases)
- [Changelog](https://github.com/microsoft/onnxruntime/blob/master/docs/ReleaseManagement.md)
- [Commits](microsoft/onnxruntime@v0.1.4...v1.12.1)

---
updated-dependencies:
- dependency-name: onnxruntime
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update gcsfs requirement from <2022.6.0,>=2021.5.0 to >=2021.5.0,<2022.8.0 in /requirements (#14079)

Update gcsfs requirement in /requirements

Updates the requirements on [gcsfs](https://github.com/fsspec/gcsfs) to permit the latest version.
- [Release notes](https://github.com/fsspec/gcsfs/releases)
- [Commits](fsspec/gcsfs@2021.05.0...2022.7.1)

---
updated-dependencies:
- dependency-name: gcsfs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix a bug that caused spurious `AttributeError` when multiple `DataLoader` classes are imported (#14117)


fix

* CI: Replace `_` of in GHA workflow filenames with `-` (#13917)

* Rename workflow files

* Update docs

* Fix azure badges

* Update the main readme

* bad rebase

* Update doc

* CI: Update Windows version from 2019 to 2022 (#14129)

Update windows

* CI/CD: Add CUDA version to docker image tags (#13831)

* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <[email protected]>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <[email protected]>

* Avoid entry_points deprecation warning (#14052)

Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>

* Configure the check-group app (#14165)

Co-authored-by: Jirka <[email protected]>

* Profile batch transfer and gradient clipping hooks (#14069)

Co-authored-by: Rohit Gupta <[email protected]>

* Avoid false positive warning about using `sync_dist` when using torchmetrics (#14143)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* Avoid raising the sampler warning if num_replicas=1 (#14097)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

Co-authored-by: otaj <[email protected]>

* Remove skipping logic in favor of path filtering (#14170)

* Support checkpoint save and load with Stochastic Weight Averaging (#9938)

Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholi <[email protected]>
Co-authored-by: Kushashwa Ravi Shrimali <[email protected]>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* Use fsdp module to initialize precision scalar for fsdp native (#14092)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* add more issues types (#14174)

* add more issues types

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Mansy <[email protected]>

* typo

Co-authored-by: Adrian Wälchli <[email protected]>

Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Mansy <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>

* CI: clean building docs (#14216)

* CI: clean building docs

* group

* .

* CI: docker focus on PL only (#14246)

* CI: docker focus on PL only

* group

* Allowed setting attributes on `DataLoader` and `BatchSampler` when instantiated inside `*_dataloader` hooks (#14212)


Co-authored-by: otaj <[email protected]>

* Revert "Remove skipping logic in favor of path filtering (#14170)" (#14244)

* Update defaults for WandbLogger's run name and project name (#14145)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Luca Medeiros <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: otaj <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Mansy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fabric lightning.fabric.Fabric pl Generic label for PyTorch Lightning package ready PRs ready to be merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

deepspeed.zero.Init causes infinite recursion error

3 participants