🐛 Bug
1.6.0 is 5 times slower than 1.5.0
I ran the same code with the same parameters under pytorch-lightning 1.5.0 and 1.6.0. Training under 1.6.0 is about 5 times slower than under 1.5.0.
To Reproduce
1.6.0
> pip install pytorch-lightning==1.6.0
...
> python run.py
{'optimizers': {'lr': 0.001, 'weight_decay': 0}, 'train': {'milestones': [20, 40, 60, 80, 100, 120, 160, 180, 200, 220], 'gamma': 0.5, 'batch_size': 16}, 'general': {'save_dir': 'logs'}, 'trainer': {'gpus': [0], 'accelerator': 'gpu', 'max_epochs': 1, 'val_check_interval': 1, 'limit_train_batches': 0.01, 'limit_val_batches': 0.01, 'profiler': 'simple'}, 'data': {'cls_type': 'cat'}, 'exp_name': 'test'}
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1)` was configured so validation will run after every batch.
Missing logger folder: logs/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------
0 | model | PvnetModelResnet18 | 13.0 M
1 | vote_crit | SmoothL1Loss | 0
2 | seg_crit | CrossEntropyLoss | 0
-------------------------------------------------
13.0 M Trainable params
0 Non-trainable params
13.0 M Total params
51.831 Total estimated model params size (MB)
loading annotations into memory...
Done (t=1.30s)
creating index...
index created!
Epoch 0: 100%|██████████████████████████████████████████████████████| 12/12 [00:25<00:00, 2.11s/it, loss=0.861, v_num=0, train_ver_loss=0.205, train_seg_loss=0.656]
FIT Profiler Report
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Total | - | 574 | 29.691 | 100 % |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| run_training_epoch | 25.136 | 1 | 25.136 | 84.66 |
| run_training_batch | 1.7876 | 12 | 21.451 | 72.247 |
| [LightningModule]LitPvnet.optimizer_step | 1.787 | 12 | 21.444 | 72.225 |
| [Strategy]SingleDeviceStrategy.backward | 1.1394 | 12 | 13.673 | 46.05 |
| [Strategy]SingleDeviceStrategy.training_step | 0.6367 | 12 | 7.6404 | 25.733 |
| [Callback]ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None, 'save_on_train_epoch_end': True}.on_train_epoch_end | 0.23877 | 1 | 0.23877 | 0.80419 |
| on_train_batch_end | 0.0010752 | 12 | 0.012903 | 0.043458 |
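Comparing the two profiler reports row by row is easier once the text tables are parsed into numbers. The following is a minimal sketch that assumes only the `|`-separated row layout printed above. It also accepts the 1.5.0 layout further down, which lacks the leading `|` and prints `_` for the number of calls on the `Total` row. The function name `parse_profiler_row` is hypothetical and not part of pytorch-lightning.

```python
from typing import Dict, Optional


def parse_profiler_row(line: str) -> Optional[Dict[str, object]]:
    """Parse one data row of a SimpleProfiler text report (hypothetical helper).

    Returns None for separator lines and header rows. Handles both the
    1.6.0 layout ('| name | ... |') and the 1.5.0 layout ('name | ... |').
    """
    # Split on '|' and drop the empty fields produced by leading/trailing bars.
    fields = [f.strip() for f in line.split("|")]
    fields = [f for f in fields if f]
    if len(fields) != 5 or fields[0] == "Action":
        return None  # separator line ('-----') or the header row
    action, mean, calls, total, pct = fields

    def to_float(value: str) -> Optional[float]:
        # The 'Total' row prints '-' for the mean, so it is not a number.
        try:
            return float(value.rstrip(" %"))
        except ValueError:
            return None

    return {
        "action": action,
        "mean_s": to_float(mean),
        # 1.5.0 prints '_' instead of a call count on the 'Total' row.
        "num_calls": int(calls) if calls.isdigit() else None,
        "total_s": to_float(total),
        "percent": to_float(pct),
    }


if __name__ == "__main__":
    print(parse_profiler_row("| run_training_batch | 1.7876 | 12 | 21.451 | 72.247 |"))
    print(parse_profiler_row("Total | - |_ | 14.094 | 100 % |"))
```

Feeding every line of each report through this function and keeping the non-`None` results gives one dictionary per action, which can then be matched across versions.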
1.5.0
> pip install pytorch-lightning==1.5.0
...
> python run.py
{'optimizers': {'lr': 0.001, 'weight_decay': 0}, 'train': {'milestones': [20, 40, 60, 80, 100, 120, 160, 180, 200, 220], 'gamma': 0.5, 'batch_size': 16}, 'general': {'save_dir': 'logs'}, 'trainer': {'gpus': [0], 'accelerator': 'gpu', 'max_epochs': 1, 'val_check_interval': 1, 'limit_train_batches': 0.01, 'limit_val_batches': 0.01, 'profiler': 'simple'}, 'data': {'cls_type': 'cat'}, 'exp_name': 'test'}
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------
0 | model | PvnetModelResnet18 | 13.0 M
1 | vote_crit | SmoothL1Loss | 0
2 | seg_crit | CrossEntropyLoss | 0
-------------------------------------------------
13.0 M Trainable params
0 Non-trainable params
13.0 M Total params
51.831 Total estimated model params size (MB)
loading annotations into memory...
Done (t=1.37s)
creating index...
index created!
Epoch 0: 100%|██████████████████████████████████████████████████████| 12/12 [00:09<00:00, 1.23it/s, loss=0.861, v_num=0, train_ver_loss=0.205, train_seg_loss=0.656]
FIT Profiler Report
Action | Mean duration (s) |Num calls | Total time (s) | Percentage % |
--------------------------------------------------------------------------------------------------------------------------------------
Total | - |_ | 14.094 | 100 % |
--------------------------------------------------------------------------------------------------------------------------------------
run_training_epoch | 9.7956 |1 | 9.7956 | 69.5 |
run_training_batch | 0.44689 |12 | 5.3626 | 38.048 |
get_train_batch | 0.23755 |13 | 3.0882 | 21.911 |
fetch_next_train_batch | 0.23753 |13 | 3.0879 | 21.909 |
optimizer_step_with_closure_0 | 0.2286 |12 | 2.7432 | 19.463 |
training_step_and_backward | 0.2224 |12 | 2.6688 | 18.935 |
model_forward | 0.18607 |12 | 2.2329 | 15.842 |
training_step | 0.18589 |12 | 2.2306 | 15.827 |
backward | 0.035633 |12 | 0.4276 | 3.0338 |
on_train_epoch_end | 0.4268 |1 | 0.4268 | 3.0282 |
on_train_batch_end | 0.0063143 |12 | 0.075772 | 0.53761 |
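The size of the slowdown depends on which numbers are compared. The sketch below computes the ratios directly from the values in the two reports above. Some actions were renamed between versions, so the pairing is partly an assumption. Here `[LightningModule]LitPvnet.optimizer_step` is matched with `optimizer_step_with_closure_0`, and the `[Strategy]SingleDeviceStrategy.*` rows are matched with the plain `training_step` and `backward` rows. The dictionary name `PAIRS` and the function `slowdowns` are illustrative only.

```python
# Values copied from the 1.6.0 and 1.5.0 profiler reports and progress bars.
# Matching renamed actions across versions is an assumption (see lead-in).
PAIRS = {
    # label: (1.6.0 value, 1.5.0 value)
    "Total, total (s)": (29.691, 14.094),
    "run_training_epoch, total (s)": (25.136, 9.7956),
    "run_training_batch, mean (s)": (1.7876, 0.44689),
    "optimizer step, mean (s)": (1.787, 0.2286),
    "training_step, mean (s)": (0.6367, 0.18589),
    "backward, mean (s)": (1.1394, 0.035633),
}


def slowdowns(pairs):
    """Return how many times longer 1.6.0 took than 1.5.0 for each pair."""
    return {label: new / old for label, (new, old) in pairs.items()}


# Progress bars: 1.6.0 shows 2.11 s/it, 1.5.0 shows 1.23 it/s (i.e. 1/1.23 s/it).
progress_bar_ratio = 2.11 / (1 / 1.23)

if __name__ == "__main__":
    for label, ratio in slowdowns(PAIRS).items():
        print(f"{label:32s} {ratio:5.1f}x")
    print(f"{'progress bar (s/it)':32s} {progress_bar_ratio:5.1f}x")
```

By these figures, total wall-clock time is about 2.1x longer and the epoch about 2.6x. Mean time per training batch is about 4.0x, the optimizer step about 7.8x, and `training_step` about 3.4x. The backward pass alone is roughly 32x slower, which makes it the largest single contributor to the regression.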
Environment
- PyTorch Version: 1.8.1+cu111
- Python version: 3.8.13
- OS: Ubuntu
- CUDA/cuDNN version: 11.4
- How you installed PyTorch (conda, pip, source): conda
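A regression report between two releases is easier to act on when it records the exact installed package versions. The following is a minimal standard-library sketch (Python 3.8+, matching the Python 3.8.13 used here). The queried distribution names `torch` and `pytorch-lightning` are assumptions about how the packages were installed, and `collect_env` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def collect_env(packages=("torch", "pytorch-lightning")):
    """Return Python, OS and package versions for a bug report (hypothetical helper)."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution not found under this name in the current environment.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"- {key}: {value}")
```

The CUDA version PyTorch was built against is available separately as `torch.version.cuda`. It can differ from the system CUDA version, as with the `+cu111` build listed above.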
Robinysh