
WandBlogger doesn't log when working with TPU cores #9876

@rusty-electron

Description


🐛 Bug

I am trying to run a resnet200d model for document classification on Kaggle TPUs using pytorch-lightning. I have set up the WandB logger and connected it to my wandb account. Training starts fine, but after about 50 steps the logger complains that it can't log because it was called from a different pid. See the error screenshot below:

[Screenshot: wandb logging error about being called from a different pid]

I have tried options like sync_dist=True or rank_zero_only=True (in the self.log calls), but the error persists.
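For context, the error is consistent with how wandb guards logging: it records the pid that initialized the run and rejects log calls from any other process. TPU training spawns one process per core, so logging calls can land in a child whose pid differs from the initializing one. The stdlib-only sketch below reproduces that pid mismatch; the function names are illustrative, not wandb internals.

```python
import os
import multiprocessing as mp

def worker(init_pid, queue):
    # In a spawned TPU worker, os.getpid() differs from the pid
    # that originally set up the logger (the "wandb.init()" pid).
    queue.put(os.getpid() != init_pid)

def main():
    init_pid = os.getpid()  # pid at logger-initialization time
    queue = mp.Queue()
    p = mp.Process(target=worker, args=(init_pid, queue))
    p.start()
    p.join()
    return queue.get()

if __name__ == "__main__":
    print(main())  # True: the worker runs under a different pid
```

This is why per-call flags like rank_zero_only=True may not be enough on their own: the guard fires based on which process the call runs in, not which rank issues it.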

To Reproduce

You can try running the mentioned kaggle notebook:
https://www.kaggle.com/rustyelectron/documentclassification-pytorch-tpu-wandb/

Wait for about 50 steps to see the error.

Expected behavior

The WandbLogger should work normally and log metrics.

Environment

  • CUDA:
    • GPU:
    • available: False
    • version: None
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0a0+7e71a98
    • pytorch-lightning: 1.2.10
    • tqdm: 4.62.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10

Additional context

None

cc @kaushikb11 @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire
