
ValueError('signal only works in main thread') messages back for wandb sweeps in v1.5.6 #11118

@garrett361


🐛 Bug

An issue raised in #10336 is back in a lesser form in pl v1.5.6.

After upgrading to v1.5.6, wandb hyperparameter sweeps performed in Colab notebooks raise a ValueError:

ValueError('signal only works in main thread')

Unlike in #10336, the error does not terminate the sweep, which still seems to sync up with wandb without issue, as far as I can tell.
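For context, the message itself comes from Python's standard `signal` module, which refuses to install handlers from any thread other than the main one. The following is a minimal sketch of that underlying behavior only. It is not Lightning's or wandb's code, and it rests on the assumption that the sweep runs the training function in a worker thread. The helper name `try_register_handler` is made up for illustration.

```python
import signal
import threading


def try_register_handler(results):
    """Try to install a SIGINT handler and record what happens."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as err:
        # Raised by the standard library when called off the main thread.
        results.append(repr(err))


results = []
# Run the registration in a worker thread (assumed to mimic how a sweep
# agent might invoke the training function).
worker = threading.Thread(target=try_register_handler, args=(results,))
worker.start()
worker.join()
print(results[0])
```

A library can avoid the error by checking `threading.current_thread() is threading.main_thread()` before calling `signal.signal`, or by catching the `ValueError` as above.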

To Reproduce

Reproduced at the end of this minimally modified BoringModel Colab notebook: https://colab.research.google.com/drive/1QheDfu4G5QEUSnHWpvK7UZzdHysRQq4h?usp=sharing

Expected behavior

wandb sweeps should complete without the cited ValueError. This was previously fixed in pl v1.5.3 and v1.5.4.

Environment

  • CUDA:
    • GPU:
      • A100-SXM4-40GB
    • available: True
    • version: 11.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.10.0+cu111
    • pytorch-lightning: 1.5.6
    • tqdm: 4.62.3
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.12
    • version: #1 SMP Sat Jun 5 09:50:34 PDT 2021

Additional context

Labels: bug
