training_epoch_end is called before all steps of the epoch have completed, always at about 25% of the steps. #7775

@ganitps

🐛 Bug

GPU available: False, used: False
TPU available: None, using: 0 TPU cores
Validation sanity check:   0%|          | 0/2 [00:00<?, ?it/s]
  | Name                | Type                 | Params
-------------------------------------------------------------

-------------------------------------------------------------

Epoch 0:   0%|          | 0/13 [00:00<?, ?it/s] 
Epoch 0:  23%|██▎       | 3/13 [01:38<05:27, 32.75s/it, loss=4.73, v_num=7]
// training_epoch_end:  outputs = [{'loss': tensor(6.4593)}, {'loss': tensor(5.7653)}, {'loss': tensor(1.9642)}]

Validating: 0it [00:00, ?it/s]
Validating:   0%|          | 0/10 [00:00<?, ?it/s]
Epoch 0:  38%|███▊      | 5/13 [01:48<02:54, 21.78s/it, loss=4.73, v_num=7]
Epoch 0:  46%|████▌     | 6/13 [01:59<02:19, 19.91s/it, loss=4.73, v_num=7]
Epoch 0:  54%|█████▍    | 7/13 [02:10<01:51, 18.58s/it, loss=4.73, v_num=7]
Epoch 0:  62%|██████▏   | 8/13 [02:20<01:27, 17.60s/it, loss=4.73, v_num=7]
Epoch 0:  69%|██████▉   | 9/13 [02:31<01:07, 16.83s/it, loss=4.73, v_num=7]
Epoch 0:  77%|███████▋  | 10/13 [02:42<00:48, 16.21s/it, loss=4.73, v_num=7]
Epoch 0:  85%|████████▍ | 11/13 [02:52<00:31, 15.71s/it, loss=4.73, v_num=7]
Epoch 0:  92%|█████████▏| 12/13 [03:04<00:15, 15.34s/it, loss=4.73, v_num=7]
Epoch 0: 100%|██████████| 13/13 [03:15<00:00, 15.00s/it, loss=4.73, v_num=7]
Epoch 0: 100%|██████████| 13/13 [03:16<00:00, 15.15s/it, loss=4.73, v_num=7]
Epoch 1:  23%|██▎       | 3/13 [01:42<05:42, 34.24s/it, loss=3.39, v_num=7]
// training_epoch_end:  outputs = [{'loss': tensor(2.6766)}, {'loss': tensor(2.3010)}, {'loss': tensor(1.1722)}]
Epoch 1:  31%|███       | 4/13 [01:48<04:04, 27.22s/it, loss=3.39, v_num=7]
Validating: 0it [00:00, ?it/s]
Epoch 1:  38%|███▊      | 5/13 [02:02<03:15, 24.42s/it, loss=3.39, v_num=7]
Completed 6.8 MiB/327.9 MiB (48.7 KiB/s) with 2 file(s) remaining
Epoch 1: 100%|██████████| 13/13 [03:48<00:00, 17.54s/it, loss=3.39, v_num=7]
Epoch 2:  23%|██▎       | 3/13 [01:44<05:47, 34.72s/it, loss=2.72, v_num=7]
NUM EL TRAINING: 3   [{'loss': tensor(1.2504)}, {'loss': tensor(1.4905)}, {'loss': tensor(1.4158)}]
Epoch 2:  31%|███       | 4/13 [01:49<04:07, 27.48s/it, loss=2.72, v_num=7]
Validating: 0it [00:00, ?it/s]
Epoch 2: 100%|██████████| 13/13 [03:50<00:00, 17.75s/it, loss=2.72, v_num=7]
Epoch 3:  23%|██▎       | 3/13 [01:43<05:46, 34.62s/it, loss=2.27, v_num=7]
//training_epoch_end:   outputs = [{'loss': tensor(0.6632)}, {'loss': tensor(0.9215)}, {'loss': tensor(1.1396)}]
Epoch 3:  31%|███       | 4/13 [01:49<04:06, 27.41s/it, loss=2.27, v_num=7]
Validating: 0it [00:00, ?it/s]
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux): macOS Catalina (this happens on all environments, including Linux)
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.7
  • CUDA/cuDNN version:
  • GPU models and configuration: also happens with 0 GPUs.
  • Any other relevant information:
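
One possible reading of the log above, offered as a sketch and not a confirmed diagnosis: if the main progress bar counts training and validation batches together (an assumption about this Lightning version that the log does not state outright), then the 13 steps would be 3 training batches plus the 10 validation batches shown in `Validating: 0/10`, and `training_epoch_end` firing at 3/13 would be after the last training batch. The helper names below are hypothetical and only do the arithmetic.

```python
def progress_bar_total(num_train_batches: int, num_val_batches: int) -> int:
    """Total of a bar that counts training and validation batches together (assumed behavior)."""
    return num_train_batches + num_val_batches


def train_share(num_train_batches: int, num_val_batches: int) -> float:
    """Fraction of the bar reached when the last training batch finishes."""
    return num_train_batches / progress_bar_total(num_train_batches, num_val_batches)


# Numbers read off the log: 3 entries in `outputs`, and "Validating: 0/10".
total = progress_bar_total(3, 10)
share = train_share(3, 10)
print(total, round(share, 2))  # prints: 13 0.23
```

If this reading holds, the 0.25 ratio in the title would follow from the dataset split and not from an early call; checking `len()` of the training and validation dataloaders would confirm or rule it out.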
