Closed
Labels
bug (Something isn't working), help wanted (Open to be worked on)
Description
🐛 Bug
Using transformers' AdamW optimizer together with the batch size finder leaves ~2-3 GB of GPU memory allocated after
trainer.tune (with xlm-roberta-base). This causes OOM errors on a subsequent call to trainer.fit.
I suspect the retained state of the AdamW optimizer is what holds on to this memory.
Please reproduce using the BoringModel
https://colab.research.google.com/drive/1cugaUmLzNvk-38OyV8zyT9M9xQY4LkfH#scrollTo=j4w0wizx5XxJ
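For reference, here is a condensed sketch of the kind of setup that triggers the leak (the dataset and module names are illustrative stand-ins, not the exact colab code):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset
from transformers import AdamW, AutoModelForSequenceClassification


class RandomTokenDataset(Dataset):
    """Random token ids standing in for real text data."""

    def __len__(self):
        return 256

    def __getitem__(self, idx):
        return {
            "input_ids": torch.randint(0, 250002, (128,)),  # xlm-roberta vocab size
            "attention_mask": torch.ones(128, dtype=torch.long),
            "labels": torch.tensor(0),
        }


class XlmRobertaModule(pl.LightningModule):
    def __init__(self, batch_size: int = 8):
        super().__init__()
        # the batch size finder tunes this attribute
        self.batch_size = batch_size
        self.model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base")

    def training_step(self, batch, batch_idx):
        return self.model(**batch).loss

    def configure_optimizers(self):
        # transformers' AdamW; its state is what I suspect keeps GPU memory alive
        return AdamW(self.model.parameters(), lr=2e-5)

    def train_dataloader(self):
        return DataLoader(RandomTokenDataset(), batch_size=self.batch_size)


model = XlmRobertaModule()
trainer = pl.Trainer(gpus=1, max_epochs=1, auto_scale_batch_size="binsearch")
trainer.tune(model)  # batch size finder runs here; ~2-3 GB stays allocated afterwards
trainer.fit(model)   # the subsequent fit can then run out of GPU memory
```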
Expected behavior
GPU memory should be freed after the batch size finder finishes (except for the model itself, which may legitimately stay on the GPU).
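One rough way to observe this (and a workaround attempt, not a guaranteed fix) is to check the allocated/reserved memory around trainer.tune and force a garbage collection before fitting; continuing from the sketch above:

```python
import gc
import torch


def report(tag: str) -> None:
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")


trainer.tune(model)
report("after tune")

# gc.collect() drops tensors that are no longer referenced from Python
# (e.g. a stale optimizer state); empty_cache() returns cached blocks to CUDA.
gc.collect()
torch.cuda.empty_cache()
report("after cleanup")

trainer.fit(model)
```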
Environment
- CUDA:
  - GPU:
    - Tesla T4
  - available: True
  - version: 10.1
- Packages:
  - numpy: 1.19.5
  - pyTorch_debug: False
  - pyTorch_version: 1.8.0+cu101
  - pytorch-lightning: 1.2.4
  - tqdm: 4.41.1
- System:
  - OS: Linux
  - architecture:
    - 64bit
  - processor: x86_64
  - python: 3.7.10
  - version: #1 SMP Thu Jul 23 08:00:38 PDT 2020