Closed
Labels
bug (Something isn't working), loops (Related to the Loop API)
Description
🐛 Bug
When training a model that uses the DETR feature extractor, training stops at a random epoch (always below 20) with no further info, warning, or error.
I have run the code on three different GPU machines (GTX 1060, RTX 2080 Ti, Tesla T4), each running Ubuntu 20.04 with at least 32 GB of RAM, and also on Colab. The result is the same everywhere.
To Reproduce
Please have a look at the following notebook. It uses a small dataset and is based on the following Tutorial.
My code can be found here:
https://colab.research.google.com/drive/1yyql7CPrly75TUBIMD-l16-oR8ykDMsR?usp=sharing
Expected behavior
Training should continue through all configured epochs instead of stopping silently.
Environment
- PyTorch Lightning Version (e.g., 1.5.0): 1.6.5
- PyTorch Version (e.g., 1.10): 1.12 and also 1.8 LTS
- Python version (e.g., 3.9): 3.7, 3.8, 3.9
- OS (e.g., Linux): Ubuntu 20.04
- CUDA/cuDNN version: 11.2, 11.5, 11.0 (tried these three)
- GPU models and configuration: GTX 1060, RTX 2080 Ti, Tesla T4
- How you installed PyTorch (conda, pip, source): pip
- If compiling from source, the output of torch.__config__.show():
- Any other relevant information:
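For anyone trying to reproduce this on another machine, the version details listed above could be collected with a small script. This is only a minimal sketch using the Python standard library (3.8+); the helper name collect_env is hypothetical and not part of PyTorch or PyTorch Lightning. It looks up the pip distributions pytorch-lightning and torch and reports "not installed" when a package is missing.

```python
import platform
from importlib import metadata


def collect_env(packages=("pytorch-lightning", "torch")):
    """Return a dict with the Python version, OS string, and package versions.

    `collect_env` is a hypothetical helper for this report, not a library API.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            # Version of the installed distribution, as pip sees it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

The CUDA/cuDNN version and GPU model are not covered by this sketch and would still need to be filled in by hand.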
Additional context
cc @carmocca @justusschock @ananthsub @ninginthecloud @rohitgr7