v1.6 is slower than v1.5

## 🐛 Bug

### 1.6.0 is 5 times slower than 1.5.0

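I ran the same code with the same parameters on 1.5.0 and 1.6.0.
Training is several times slower on 1.6.0: in the profiler reports below, `run_training_batch` averages 1.79 s per call on 1.6.0 versus 0.45 s on 1.5.0, and the progress bar shows the epoch taking 25 s instead of 9 s.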

### To Reproduce
#### 1.6.0
```
> pip install pytorch-lightning==1.6.0
...
> python run.py
{'optimizers': {'lr': 0.001, 'weight_decay': 0}, 'train': {'milestones': [20, 40, 60, 80, 100, 120, 160, 180, 200, 220], 'gamma': 0.5, 'batch_size': 16}, 'general': {'save_dir': 'logs'}, 'trainer': {'gpus': [0], 'accelerator': 'gpu', 'max_epochs': 1, 'val_check_interval': 1, 'limit_train_batches': 0.01, 'limit_val_batches': 0.01, 'profiler': 'simple'}, 'data': {'cls_type': 'cat'}, 'exp_name': 'test'}
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1)` was configured so validation will run after every batch.
Missing logger folder: logs/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type               | Params
-------------------------------------------------
0 | model     | PvnetModelResnet18 | 13.0 M
1 | vote_crit | SmoothL1Loss       | 0     
2 | seg_crit  | CrossEntropyLoss   | 0     
-------------------------------------------------
13.0 M    Trainable params
0         Non-trainable params
13.0 M    Total params
51.831    Total estimated model params size (MB)
loading annotations into memory...
Done (t=1.30s)
creating index...
index created!
Epoch 0: 100%|██████████████████████████████████████████████████████| 12/12 [00:25<00:00,  2.11s/it, loss=0.861, v_num=0, train_ver_loss=0.205, train_seg_loss=0.656]
FIT Profiler Report

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  Action                                                                                                                                                                                                            |  Mean duration (s)    |  Num calls            |  Total time (s)       |  Percentage %         |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  Total                                                                                                                                                                                                             |  -                    |  574                  |  29.691               |  100 %                |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  run_training_epoch                                                                                                                                                                                                |  25.136               |  1                    |  25.136               |  84.66                |
|  run_training_batch                                                                                                                                                                                                |  1.7876               |  12                   |  21.451               |  72.247               |
|  [LightningModule]LitPvnet.optimizer_step                                                                                                                                                                          |  1.787                |  12                   |  21.444               |  72.225               |
|  [Strategy]SingleDeviceStrategy.backward                                                                                                                                                                           |  1.1394               |  12                   |  13.673               |  46.05                |
|  [Strategy]SingleDeviceStrategy.training_step                                                                                                                                                                      |  0.6367               |  12                   |  7.6404               |  25.733               |
|  [Callback]ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None, 'save_on_train_epoch_end': True}.on_train_epoch_end                         |  0.23877              |  1                    |  0.23877              |  0.80419              |
|  on_train_batch_end                                                                                                                                                                                                |  0.0010752            |  12                   |  0.012903             |  0.043458             |
```
#### 1.5.0
```
> pip install pytorch-lightning==1.5.0
...
> python run.py
{'optimizers': {'lr': 0.001, 'weight_decay': 0}, 'train': {'milestones': [20, 40, 60, 80, 100, 120, 160, 180, 200, 220], 'gamma': 0.5, 'batch_size': 16}, 'general': {'save_dir': 'logs'}, 'trainer': {'gpus': [0], 'accelerator': 'gpu', 'max_epochs': 1, 'val_check_interval': 1, 'limit_train_batches': 0.01, 'limit_val_batches': 0.01, 'profiler': 'simple'}, 'data': {'cls_type': 'cat'}, 'exp_name': 'test'}
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type               | Params
-------------------------------------------------
0 | model     | PvnetModelResnet18 | 13.0 M
1 | vote_crit | SmoothL1Loss       | 0     
2 | seg_crit  | CrossEntropyLoss   | 0     
-------------------------------------------------
13.0 M    Trainable params
0         Non-trainable params
13.0 M    Total params
51.831    Total estimated model params size (MB)
loading annotations into memory...
Done (t=1.37s)
creating index...
index created!
Epoch 0: 100%|██████████████████████████████████████████████████████| 12/12 [00:09<00:00,  1.23it/s, loss=0.861, v_num=0, train_ver_loss=0.205, train_seg_loss=0.656]
FIT Profiler Report

Action                                  |  Mean duration (s)    |Num calls              |  Total time (s)       |  Percentage %         |
--------------------------------------------------------------------------------------------------------------------------------------
Total                                   |  -                    |_                      |  14.094               |  100 %                |
--------------------------------------------------------------------------------------------------------------------------------------
run_training_epoch                      |  9.7956               |1                      |  9.7956               |  69.5                 |
run_training_batch                      |  0.44689              |12                     |  5.3626               |  38.048               |
get_train_batch                         |  0.23755              |13                     |  3.0882               |  21.911               |
fetch_next_train_batch                  |  0.23753              |13                     |  3.0879               |  21.909               |
optimizer_step_with_closure_0           |  0.2286               |12                     |  2.7432               |  19.463               |
training_step_and_backward              |  0.2224               |12                     |  2.6688               |  18.935               |
model_forward                           |  0.18607              |12                     |  2.2329               |  15.842               |
training_step                           |  0.18589              |12                     |  2.2306               |  15.827               |
backward                                |  0.035633             |12                     |  0.4276               |  3.0338               |
on_train_epoch_end                      |  0.4268               |1                      |  0.4268               |  3.0282               |
on_train_batch_end                      |  0.0063143            |12                     |  0.075772             |  0.53761              |
```
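
To put numbers on the slowdown, the rows that appear in both reports can be compared directly. The following is a minimal sketch, not part of Lightning: the total times are copied by hand from the two reports above, and the names `V150`, `V160` and `slowdown` are illustrative. Row names changed between versions, so pairing 1.6.0's `[Strategy]SingleDeviceStrategy.backward` and `[Strategy]SingleDeviceStrategy.training_step` with 1.5.0's `backward` and `training_step` is an assumption.

```python
# Total time (s) per action, copied from the two FIT Profiler Reports above.
# Row names differ between versions; the pairing below is an assumption.
V150 = {
    "total": 14.094,
    "run_training_epoch": 9.7956,
    "run_training_batch": 5.3626,
    "training_step": 2.2306,
    "backward": 0.4276,
}
V160 = {
    "total": 29.691,
    "run_training_epoch": 25.136,
    "run_training_batch": 21.451,
    "training_step": 7.6404,  # [Strategy]SingleDeviceStrategy.training_step
    "backward": 13.673,       # [Strategy]SingleDeviceStrategy.backward
}


def slowdown(old, new):
    """Return {action: new_time / old_time} for actions present in both reports."""
    return {action: new[action] / old[action] for action in old if action in new}


if __name__ == "__main__":
    for action, ratio in slowdown(V150, V160).items():
        print(
            f"{action:<20} {V150[action]:>8.3f}s -> {V160[action]:>8.3f}s  ({ratio:.1f}x)"
        )
```

With these figures, `run_training_batch` comes out at roughly 4x and `backward` at roughly 32x, so most of the extra time is reported under the backward pass.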

### Environment

- PyTorch Version: 1.8.1+cu111
- Python version: 3.8.13
- OS: Ubuntu
- CUDA/cuDNN version: 11.4
- How you installed PyTorch (`conda`, `pip`, source): conda


cc @borda @akihironitta
