Closed
Description
🐛 Describe the bug
Running on main:
```
torchrun --nproc_per_node=8 train.py --data-path /datasets01/kinetics/070618/400/ --train-dir=val --val-dir=val --batch-size=16 --sync-bn --test-only --pretrained --cache-dataset
```
throws the following error:
```
Test: [2200/3008] eta: 0:11:24 loss: 2.6703 (2.1475) acc1: 43.7500 (57.4938) acc5: 68.7500 (77.8623) time: 0.9043 data: 0.6405 max mem: 5888
Traceback (most recent call last):
  File "train.py", line 392, in <module>
    main(args)
  File "train.py", line 273, in main
    evaluate(model, criterion, data_loader_test, device=device)
  File "train.py", line 62, in evaluate
    for video, target in metric_logger.log_every(data_loader, 100, header):
  File "/private/home/vvryniotis/vision/references/video_classification/utils.py", line 128, in log_every
    for obj in iterable:
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1183, in _next_data
    return self._process_data(data)
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/_utils.py", line 438, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private/home/vvryniotis/vision/torchvision/datasets/kinetics.py", line 233, in __getitem__
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
  File "/private/home/vvryniotis/vision/torchvision/datasets/video_utils.py", line 362, in get_clip
    assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
AssertionError: torch.Size([17, 288, 352, 3]) x 16
```
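The `17` in the assertion message can be reproduced on paper. The following is a minimal sketch, assuming values that are not stated in the report: a 30 fps source video resampled to 15 fps with 16-frame clips. When the resampling step is an integer, the clip is subsampled with a strided slice over whatever the decoder returned, so a single surplus decoded frame (33 instead of 32) yields 17 output frames instead of 16. The helper `resample_idx` is hypothetical and only loosely mimics an integer-vs-fractional resampling step; it is not the torchvision source.

```python
from fractions import Fraction
from typing import List, Union


def resample_idx(num_frames: int, original_fps: float, new_fps: float) -> Union[slice, List[int]]:
    """Hypothetical helper: choose which decoded frames to keep when changing fps."""
    step = Fraction(original_fps) / Fraction(new_fps)
    if step.denominator == 1:
        # Integer step: keep every `step`-th frame of whatever was decoded,
        # so the output length depends on how many frames the decoder returned.
        return slice(None, None, int(step))
    # Fractional step: a fixed list of num_frames indices, so the length is fixed.
    return [int(i * step) for i in range(num_frames)]


NUM_FRAMES = 16                    # clip length requested by the dataset (assumed)
ORIGINAL_FPS, TARGET_FPS = 30, 15  # assumed source and target frame rates

idx = resample_idx(NUM_FRAMES, ORIGINAL_FPS, TARGET_FPS)
well_synced = list(range(32))  # 32 decoded frames, as expected for 16 frames at half rate
one_extra = list(range(33))    # the decoder returns one extra frame

print(len(well_synced[idx]))  # 16
print(len(one_extra[idx]))    # 17, which trips `len(video) == num_frames`
```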
If we apply the following patch:
```diff
$ git diff
diff --git a/torchvision/datasets/video_utils.py b/torchvision/datasets/video_utils.py
index f0f19e33..2254f8c5 100644
--- a/torchvision/datasets/video_utils.py
+++ b/torchvision/datasets/video_utils.py
@@ -359,8 +359,8 @@ class VideoClips:
                 resampling_idx = resampling_idx - resampling_idx[0]
             video = video[resampling_idx]
             info["video_fps"] = self.frame_rate
-        assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
-        return video, audio, info, video_idx
+        #assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
+        return video[:self.num_frames], audio[:self.num_frames], info, video_idx
 
     def __getstate__(self):
         video_pts_sizes = [len(v) for v in self.video_pts]
```
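The patch above silences the assertion by truncating every clip. A slightly more defensive variant, sketched here as a hypothetical standalone helper `trim_clip` (not part of torchvision), drops surplus trailing frames but still fails loudly when a clip is too short. It only trims the video, since the leading dimension of the audio tensor does not count video frames, so slicing audio with `num_frames` does not cut it to the same time span.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def trim_clip(video: Sequence[T], num_frames: int) -> Sequence[T]:
    """Hypothetical helper: keep the first `num_frames` frames of a decoded clip.

    Works on anything that supports len() and slicing along the first
    dimension, e.g. a list or a [T, H, W, C] tensor.
    """
    if len(video) < num_frames:
        # Too few frames cannot be fixed by trimming; surface the problem
        # instead of silently returning a short clip.
        raise ValueError(f"clip has {len(video)} frames, expected {num_frames}")
    # Surplus frames (e.g. 17 instead of 16) are dropped from the end.
    return video[:num_frames]


clip = list(range(17))           # stand-in for a 17-frame decoded clip
print(len(trim_clip(clip, 16)))  # 16
```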
We get an accuracy that is noticeably lower than the expected one (roughly 1 point on both Acc@1 and Acc@5):
Result:
* Clip Acc@1 56.488 Clip Acc@5 77.773
Expected:
* Clip Acc@1 57.50 Clip Acc@5 78.81
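For reference, the gap between the measured and expected clip accuracies can be computed directly from the numbers above; it is roughly one percentage point on both metrics.

```python
# Clip-level accuracies copied from the report (percent).
result = {"Acc@1": 56.488, "Acc@5": 77.773}
expected = {"Acc@1": 57.50, "Acc@5": 78.81}

# Gap in percentage points between the expected and the measured accuracy.
gap = {name: round(expected[name] - result[name], 3) for name in result}
print(gap)  # {'Acc@1': 1.012, 'Acc@5': 1.037}
```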
Questions:
- Is this the right dataset for validating the `r2plus1d_18` model?
  - Possibly not; we might have used another version of the dataset. See Assertion error during kinetics400 validation #4839 (comment)
- As far as I can see, the assertion has always existed. How did the model get trained without triggering it?
  - This is due to a recently introduced bug from the audio-video sync. See Assertion error during kinetics400 validation #4839 (comment)
- Are the accuracy numbers reported in the docs correct?
  - Unclear; more investigation is needed. See Assertion error during kinetics400 validation #4839 (comment)
Versions
Latest main 0817f7f