
CUDA memory leak after batch size finder #6570

@maxjeblick

Description


🐛 Bug

Using a transformers model (xlm-roberta-base) together with the AdamW optimizer and the batch size finder leaves ~2-3 GB of GPU memory allocated after
trainer.tune. This causes OOM errors on a subsequent call to trainer.fit.
I suspect the state of the AdamW optimizer is what keeps this memory alive.
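As a rough plausibility check (the parameter count below is an approximation, not taken from the issue): AdamW keeps two fp32 moment buffers per parameter, so its state alone is about twice the fp32 model size, which for xlm-roberta-base lands in the reported 2-3 GB range.

```python
# Back-of-the-envelope estimate of AdamW's optimizer state for xlm-roberta-base.
n_params = 278e6          # roughly the size of xlm-roberta-base (approximate)
bytes_per_param = 4       # fp32
# AdamW stores exp_avg and exp_avg_sq per parameter -> two extra fp32 copies.
state_bytes = 2 * n_params * bytes_per_param
print(f"~{state_bytes / 2**30:.1f} GiB of optimizer state")  # ~2.1 GiB
```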

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1cugaUmLzNvk-38OyV8zyT9M9xQY4LkfH#scrollTo=j4w0wizx5XxJ
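
For reference, a minimal sketch of the setup (not the exact Colab code; the dummy dataset, sequence length, and Trainer flags are illustrative assumptions):

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl
from transformers import AutoModel

class RandomTokens(Dataset):
    # Stand-in for real text data: random token ids of fixed length.
    def __len__(self):
        return 512

    def __getitem__(self, idx):
        return torch.randint(0, 1000, (128,))

class TransformerModule(pl.LightningModule):
    def __init__(self, batch_size=8):
        super().__init__()
        self.batch_size = batch_size  # scaled in-place by the batch size finder
        self.backbone = AutoModel.from_pretrained("xlm-roberta-base")

    def training_step(self, batch, batch_idx):
        # Dummy loss; enough to trigger backward and populate AdamW state.
        return self.backbone(input_ids=batch).last_hidden_state.mean()

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-5)

    def train_dataloader(self):
        return DataLoader(RandomTokens(), batch_size=self.batch_size)

model = TransformerModule()
trainer = pl.Trainer(gpus=1, auto_scale_batch_size=True, max_epochs=1)
trainer.tune(model)  # runs the batch size finder
print(torch.cuda.memory_allocated() / 2**30, "GiB still allocated")
trainer.fit(model)   # can OOM because ~2-3 GB was never released
```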

Expected behavior

GPU memory should be freed after the batch size finder (apart from the model itself, which may stay on the GPU).
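
A hedged way to check whether tuning actually released memory (assuming the trainer and model from the snippet above are still in scope); this is a diagnostic sketch, not a fix from the issue:

```python
import gc
import torch

trainer.tune(model)

# Drop unreachable Python objects, then return cached blocks to the driver.
gc.collect()
torch.cuda.empty_cache()

# If memory_allocated() is still several GB here, live tensors (e.g. AdamW's
# exp_avg / exp_avg_sq buffers) are being kept alive by lingering references,
# not merely held by the caching allocator.
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```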

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.8.0+cu101
    • pytorch-lightning: 1.2.4
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020


Labels

bug (Something isn't working), help wanted (Open to be worked on)
