Skip to content

Conversation

@carmocca
Copy link
Contributor

@carmocca carmocca commented Aug 6, 2022

What does this PR do?

Fixes #13554

Does your PR introduce any breaking changes? If yes, please list them.

Epoch-end results will not be logged again in subsequent epochs.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @Borda @carmocca @edward-io @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy @akihironitta

@carmocca carmocca added bug Something isn't working logging Related to the `LoggerConnector` and `log()` labels Aug 6, 2022
@carmocca carmocca added this to the pl:1.7.x milestone Aug 6, 2022
@carmocca carmocca self-assigned this Aug 6, 2022
@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Aug 6, 2022
@carmocca carmocca force-pushed the bugfix/logging-reset-results branch from b51de20 to 8e44431 Compare August 6, 2022 15:32
@carmocca carmocca force-pushed the bugfix/logging-reset-results branch from 8e44431 to b46bc87 Compare August 6, 2022 15:34
@carmocca carmocca marked this pull request as ready for review August 6, 2022 16:17
@codecov
Copy link

codecov bot commented Aug 6, 2022

Codecov Report

Merging #14061 (b46bc87) into master (b25275c) will increase coverage by 15%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master   #14061     +/-   ##
=========================================
+ Coverage      61%      76%    +15%     
=========================================
  Files         324      324             
  Lines       26384    26420     +36     
=========================================
+ Hits        16169    20147   +3978     
+ Misses      10215     6273   -3942     

@awaelchli
Copy link
Contributor

@carmocca I'm quite confused as to how the on_step=False, on_epoch=True|False affects whether the special on_train_start value is being logged or not. Is this expected?

class BugModel(BoringModel):
    def on_train_start(self):
        print("ON_TRAIN_START")
        self.log("train_loss", 5000, on_step=False, on_epoch=True)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss, on_step=False, on_epoch=True)
        return loss

@carmocca
Copy link
Contributor Author

carmocca commented Aug 9, 2022

@carmocca I'm quite confused as to how the on_step=False, on_epoch=True|False affects whether the special on_train_start value is being logged or not. Is this expected?

This question is totally unrelated to the PR, but to answer it. It's because we differentiate results by the {fx_name}.{logged_key}, meaning that the values self.logged in each hook don't get reduced together.

However, since they just get added to a simple dictionary (logged_metrics), and we only use logged_key as the key, the latter overwrites the former. It's a legacy design limitation

Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@mergify mergify bot added the ready PRs ready to be merged label Aug 9, 2022
@rohitgr7 rohitgr7 merged commit d850854 into master Aug 9, 2022
@rohitgr7 rohitgr7 deleted the bugfix/logging-reset-results branch August 9, 2022 17:31
awaelchli added a commit that referenced this pull request Aug 9, 2022
jessecambon pushed a commit to jessecambon/lightning that referenced this pull request Aug 16, 2022
lexierule pushed a commit that referenced this pull request Aug 17, 2022
* update version and changelog for 1.7.2 release

* Reset all results on epoch end (#14061)

Co-authored-by: Carlos Mocholí <[email protected]>

* Skip ddp fork tests on windows (#14121)

* Fix device placement when `.cuda()` called without specifying index (#14128)

* Convert subprocess test to standalone test (#14101)

* Fix entry point test for Python 3.10 (#14154)

* Fix flaky test caused by weak reference (#14157)

* Fix saving hyperparameters in a composition where parent is not a LM or LDM (#14151)



Co-authored-by: Rohit Gupta <[email protected]>

* Remove DeepSpeed version restriction from Lite (#13967)

* Configure the check-group app (#14165)

Co-authored-by: Jirka <[email protected]>

* Update onnxruntime requirement from <=1.12.0 to <1.13.0 in /requirements (#14083)

Updates the requirements on [onnxruntime](https://github.com/microsoft/onnxruntime) to permit the latest version.
- [Release notes](https://github.com/microsoft/onnxruntime/releases)
- [Changelog](https://github.com/microsoft/onnxruntime/blob/master/docs/ReleaseManagement.md)
- [Commits](microsoft/onnxruntime@v0.1.4...v1.12.1)

---
updated-dependencies:
- dependency-name: onnxruntime
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update gcsfs requirement from <2022.6.0,>=2021.5.0 to >=2021.5.0,<2022.8.0 in /requirements (#14079)

Update gcsfs requirement in /requirements

Updates the requirements on [gcsfs](https://github.com/fsspec/gcsfs) to permit the latest version.
- [Release notes](https://github.com/fsspec/gcsfs/releases)
- [Commits](fsspec/gcsfs@2021.05.0...2022.7.1)

---
updated-dependencies:
- dependency-name: gcsfs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix a bug that caused spurious `AttributeError` when multiple `DataLoader` classes are imported (#14117)


fix

* CI: Replace `_` of in GHA workflow filenames with `-` (#13917)

* Rename workflow files

* Update docs

* Fix azure badges

* Update the main readme

* bad rebase

* Update doc

* CI: Update Windows version from 2019 to 2022 (#14129)

Update windows

* CI/CD: Add CUDA version to docker image tags (#13831)

* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <[email protected]>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <[email protected]>

* Avoid entry_points deprecation warning (#14052)

Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>

* Configure the check-group app (#14165)

Co-authored-by: Jirka <[email protected]>

* Profile batch transfer and gradient clipping hooks (#14069)

Co-authored-by: Rohit Gupta <[email protected]>

* Avoid false positive warning about using `sync_dist` when using torchmetrics (#14143)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* Avoid raising the sampler warning if num_replicas=1 (#14097)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

Co-authored-by: otaj <[email protected]>

* Remove skipping logic in favor of path filtering (#14170)

* Support checkpoint save and load with Stochastic Weight Averaging (#9938)

Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholi <[email protected]>
Co-authored-by: Kushashwa Ravi Shrimali <[email protected]>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* Use fsdp module to initialize precision scalar for fsdp native (#14092)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* add more issues types (#14174)

* add more issues types

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Mansy <[email protected]>

* typo

Co-authored-by: Adrian Wälchli <[email protected]>

Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Mansy <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>

* CI: clean building docs (#14216)

* CI: clean building docs

* group

* .

* CI: docker focus on PL only (#14246)

* CI: docker focus on PL only

* group

* Allowed setting attributes on `DataLoader` and `BatchSampler` when instantiated inside `*_dataloader` hooks (#14212)


Co-authored-by: otaj <[email protected]>

* Revert "Remove skipping logic in favor of path filtering (#14170)" (#14244)

* Update defaults for WandbLogger's run name and project name (#14145)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Luca Medeiros <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: otaj <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Mansy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working logging Related to the `LoggerConnector` and `log()` pl Generic label for PyTorch Lightning package ready PRs ready to be merged

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

logger called during on_train_start called multiple times during training

4 participants