Closed
Description
🐛 Bug
Calling read_video() on a video file returns an audio tensor whose shape[1] is 0 when start_pts and end_pts are passed with pts_unit='sec'. The same range passed with pts_unit='pts' returns a non-empty audio tensor.
To Reproduce
With pts as unit:
from torchvision.io import read_video, write_video

# video_path points to the input video file
visual, audio, info = read_video(video_path, start_pts=10010, end_pts=15015, pts_unit='pts')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)
write_video(
    'foo.mp4', video_array=visual, fps=info['video_fps'], audio_array=audio, audio_fps=info['audio_fps'],
    audio_codec='aac')
Output:
Visual: torch.Size([6, 256, 340, 3]) Audio: torch.Size([1, 5192]) {'video_fps': 29.97002997002997, 'audio_fps': 48000}
With sec as unit:
visual, audio, info = read_video(video_path, start_pts=0.3337, end_pts=0.5005, pts_unit='sec')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)
write_video(
'bar.mp4', video_array=visual, fps=info['video_fps'], audio_array=audio, audio_fps=info['audio_fps'],
audio_codec='aac')
Output:
Visual: torch.Size([6, 256, 340, 3]) Audio: torch.Size([1, 0]) {'video_fps': 29.97002997002997, 'audio_fps': 48000}
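For reference, the two calls above seem intended to cover the same video window. The sketch below shows the conversion under an assumed video time base of 1/30000, which is inferred from the numbers in this report (10010 pts ≈ 0.3337 s) and not read from the file. The helper names pts_to_sec and sec_to_pts are hypothetical and not part of torchvision.

```python
from fractions import Fraction

# Assumption: video stream time base of 1/30000 s per pts tick,
# inferred from the values in this report, not read from the file.
ASSUMED_TIME_BASE = Fraction(1, 30000)

def pts_to_sec(pts, time_base=ASSUMED_TIME_BASE):
    """Convert a presentation timestamp (in ticks) to seconds."""
    return float(pts * time_base)

def sec_to_pts(sec, time_base=ASSUMED_TIME_BASE):
    """Convert seconds to the nearest presentation timestamp (in ticks)."""
    return round(sec * time_base.denominator / time_base.numerator)

if __name__ == "__main__":
    print(pts_to_sec(10010), pts_to_sec(15015))
    print(sec_to_pts(0.5005))
```

Note that 0.3337 is a rounded value, so converting it back does not have to land exactly on 10010.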
Expected behavior
The audio output should be similar to what is returned with pts as the unit, i.e. a non-empty audio tensor.
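As a rough sanity check, the number of audio samples one would naively expect for a window given in seconds is the window length times the audio sample rate. This is only a back-of-the-envelope sketch: the helper name is hypothetical, and the count a decoder actually returns depends on audio frame boundaries, so it will not match exactly. The point is only that the result should be well above 0.

```python
def expected_audio_samples(start_sec, end_sec, audio_fps):
    """Naive estimate: window length in seconds times the sample rate."""
    return round((end_sec - start_sec) * audio_fps)

if __name__ == "__main__":
    # Values from the 'sec' call in this report; audio_fps is taken from info.
    print(expected_audio_samples(0.3337, 0.5005, 48000))
```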
Environment
PyTorch version: 1.9.0.dev20210429
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 11.3 (x86_64)
GCC version: Could not collect
Clang version: 12.0.5 (clang-1205.0.22.9)
CMake version: version 3.19.6
Python version: 3.9 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0.dev20210429
[pip3] torchvision==0.10.0a0+730c5e1
[conda] Could not collect
Additional context
cc @bjuncek