Assertion error during kinetics400 validation #4839

@datumbox

🐛 Describe the bug

Running on main:

torchrun --nproc_per_node=8 train.py --data-path /datasets01/kinetics/070618/400/ --train-dir=val --val-dir=val --batch-size=16 --sync-bn --test-only --pretrained --cache-dataset

throws the following error:

Test:  [2200/3008]  eta: 0:11:24  loss: 2.6703 (2.1475)  acc1: 43.7500 (57.4938)  acc5: 68.7500 (77.8623)  time: 0.9043  data: 0.6405  max mem: 5888
Traceback (most recent call last):
  File "train.py", line 392, in <module>
    main(args)
  File "train.py", line 273, in main
    evaluate(model, criterion, data_loader_test, device=device)
  File "train.py", line 62, in evaluate
    for video, target in metric_logger.log_every(data_loader, 100, header):
  File "/private/home/vvryniotis/vision/references/video_classification/utils.py", line 128, in log_every
    for obj in iterable:
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1183, in _next_data
    return self._process_data(data)
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/_utils.py", line 438, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private/home/vvryniotis/vision/torchvision/datasets/kinetics.py", line 233, in __getitem__
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
  File "/private/home/vvryniotis/vision/torchvision/datasets/video_utils.py", line 362, in get_clip
    assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
AssertionError: torch.Size([17, 288, 352, 3]) x 16

If we apply the following patch:

$ git diff
diff --git a/torchvision/datasets/video_utils.py b/torchvision/datasets/video_utils.py
index f0f19e33..2254f8c5 100644
--- a/torchvision/datasets/video_utils.py
+++ b/torchvision/datasets/video_utils.py
@@ -359,8 +359,8 @@ class VideoClips:
                 resampling_idx = resampling_idx - resampling_idx[0]
             video = video[resampling_idx]
             info["video_fps"] = self.frame_rate
-        assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
-        return video, audio, info, video_idx
+        #assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
+        return video[:self.num_frames], audio[:self.num_frames], info, video_idx
 
     def __getstate__(self):
         video_pts_sizes = [len(v) for v in self.video_pts]

We get an accuracy roughly one percentage point below the expected one on both metrics:

Result:
 * Clip Acc@1 56.488 Clip Acc@5 77.773

Expected:
 * Clip Acc@1 57.50 Clip Acc@5 78.81
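
Note that the patch above removes the assertion entirely, so it would also hide clips that come back with too few frames. A slightly stricter variant of the same workaround is sketched here on a plain Python sequence. The `trim_clip` helper is hypothetical, not torchvision code. It truncates surplus frames but still fails when frames are missing.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def trim_clip(frames: Sequence[T], num_frames: int) -> Sequence[T]:
    """Keep the first num_frames frames; raise if the clip is too short."""
    if len(frames) < num_frames:
        raise ValueError(f"clip has {len(frames)} frames, expected at least {num_frames}")
    # Slicing behaves the same way on lists and on tensors indexed along dim 0.
    return frames[:num_frames]
```

This keeps the truncation from the patch while still surfacing short clips. It does not address why a 17-frame clip is produced in the first place.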

Questions:

cc @pmeier @fmassa @bjuncek

Versions

Latest main 0817f7f
