Closed
Labels
accelerator: tpu (Tensor Processing Unit), bug (Something isn't working), help wanted (Open to be worked on), logger: wandb (Weights & Biases)
Description
🐛 Bug
I am trying to run a resnet200d model for document classification on Kaggle TPUs using pytorch-lightning. I have set up the WandB logger and connected it to my wandb account. Training starts well, but at about 50 steps the logger complains that it can't log because the process was called from a different pid. See error image below:
I have tried options like sync_dist=True and rank_zero_only=True (in the self.log calls), but it still doesn't work.
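The symptom above is consistent with a logger that is bound to the process that created it, while TPU training runs in several worker processes. As a rough illustration only, here is a minimal stdlib sketch of such a guard. `PidBoundLogger` and its `owner_pid` attribute are hypothetical names made up for this example; this is not the wandb or pytorch-lightning implementation, and the different pid is simulated rather than produced by a real worker process.

```python
import os


class PidBoundLogger:
    """Hypothetical stand-in for a logger tied to the process that created it."""

    def __init__(self):
        # Remember which process created the logger.
        self.owner_pid = os.getpid()
        self.records = []

    def log(self, metrics, step):
        # Refuse calls coming from any process other than the creator.
        current_pid = os.getpid()
        if current_pid != self.owner_pid:
            raise RuntimeError(
                f"log() called from pid {current_pid}, "
                f"but the logger was created in pid {self.owner_pid}"
            )
        self.records.append((step, dict(metrics)))


# Same process that created the logger: the call is accepted.
logger = PidBoundLogger()
logger.log({"train_loss": 0.5}, step=0)

# Simulate a logger object inherited by a different process
# by pretending it was created under another pid (assumption for the demo).
inherited = PidBoundLogger()
inherited.owner_pid = os.getpid() + 1
try:
    inherited.log({"train_loss": 0.4}, step=1)
    rejected = False
except RuntimeError:
    rejected = True
```

If the real logger behaves along these lines, arguments that only change how a metric is reduced across processes would not help, because the check fails before any value is recorded.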
To Reproduce
Run the following Kaggle notebook:
https://www.kaggle.com/rustyelectron/documentclassification-pytorch-tpu-wandb/
The error appears after about 50 steps.
Expected behavior
The WandbLogger should work normally and log data.
Environment
- CUDA:
    - GPU:
    - available: False
    - version: None
- Packages:
    - numpy: 1.19.5
    - pyTorch_debug: True
    - pyTorch_version: 1.7.0a0+7e71a98
    - pytorch-lightning: 1.2.10
    - tqdm: 4.62.1
- System:
    - OS: Linux
    - architecture:
        - 64bit
    - processor: x86_64
    - python: 3.7.10
Additional context
None
cc @kaushikb11 @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire
