Conversation
stephenyan1231 (Contributor) commented Aug 30, 2019

This PR implements a C++ video decoder, referred to below as the TorchVision (TV) video reader.

Main features

  • Decode both video frames and audio waveforms in a single pass.
  • Seek to a user-specified timestamp in both the video and audio streams and decode frames starting from there; an end timestamp at which decoding should stop can also be given.
  • For video decoding, support rescaling the height/width and a user-specified AVPixelFormat (default: AV_PIX_FMT_RGB24).
  • For audio decoding, support resampling to a user-specified sampling rate and channel count; the user can also specify the AVSampleFormat (default: AV_SAMPLE_FMT_FLT).
  • Support decoding only the presentation timestamps (pts) while the actual video/audio frame data is skipped. This is useful at dataset-initialization time, when an index of the video dataset must be built and only pts information is needed. A PyAV-based sketch of these operations follows this list.
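To make the feature set concrete, here is a minimal sketch of the same operations expressed with PyAV (the baseline this PR benchmarks against). It only illustrates seeking, range decoding, and a pts-only pass; it is not the TV reader's API, and the pixel-format choice is an assumption.

```python
import av

def decode_clip(path, start_sec, end_sec):
    """Seek to start_sec, then decode RGB frames until end_sec (both in seconds)."""
    container = av.open(path)
    stream = container.streams.video[0]
    # seek() takes an offset in stream.time_base units when a stream is passed
    container.seek(int(start_sec / stream.time_base), stream=stream)
    frames = []
    for frame in container.decode(stream):
        if frame.time is not None and frame.time > end_sec:
            break
        frames.append(frame.to_ndarray(format="rgb24"))  # pixel-format conversion happens here
    container.close()
    return frames

def video_pts_only(path):
    """Index-building pass: collect video pts without decoding any frame data."""
    container = av.open(path)
    stream = container.streams.video[0]
    pts = sorted(p.pts for p in container.demux(stream) if p.pts is not None)
    container.close()
    return pts
```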

APIs

The main APIs are (a rough PyAV analogue is sketched after this list):

  • FfmpegDecoder::decodeFile(....): decode frames from a given video file. This is useful for both OSS and FB research projects, where videos reside in local folders.
  • FfmpegDecoder::decodeMemory(....): decode frames from a given compressed video byte array. This is useful for decoding Everstore videos.
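The C++ signatures are not spelled out in this summary, so as a rough analogue (again with PyAV, not the TV reader itself): av.open accepts both a path and a file-like object, which mirrors the file vs. in-memory split.

```python
import io
import av

def open_from_file(path):
    # decodeFile analogue: the video lives on the local filesystem
    return av.open(path)

def open_from_memory(data: bytes):
    # decodeMemory analogue: the compressed video is already a byte array in memory
    return av.open(io.BytesIO(data))
```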

Sanity check

  • No memory leaks were detected.

Benchmark

We use several videos from HMDB-51, UCF-101 and Kinetics-400 for benchmarking and unit tests. The test videos are listed below.

  • RATRACE_wave_f_nm_np1_fr_goo_37.avi

    • source: hmdb51
    • video: DivX MPEG-4
      • fps: 30
    • audio: N/A
  • SchoolRulesHowTheyHelpUs_wave_f_nm_np1_ba_med_0.avi

    • source: hmdb51
    • video: DivX MPEG-4
      • fps: 30
    • audio: N/A
  • TrumanShow_wave_f_nm_np1_fr_med_26.avi

    • source: hmdb51
    • video: DivX MPEG-4
      • fps: 30
    • audio: N/A
  • v_SoccerJuggling_g23_c01.avi

    • source: ucf101
    • video: Xvid MPEG-4
      • fps: 29.97
    • audio: N/A
  • v_SoccerJuggling_g24_c01.avi

    • source: ucf101
    • video: Xvid MPEG-4
      • fps: 29.97
    • audio: N/A
  • R6llTwEh07w.mp4

    • source: kinetics-400
    • video: H.264 - MPEG-4 AVC (part 10) (avc1)
      • fps: 30
    • audio: MPEG AAC audio (mp4a)
      • sample rate: 44.1 kHz
  • SOX5yA1l24A.mp4

    • source: kinetics-400
    • video: H.264 - MPEG-4 AVC (part 10) (avc1)
      • fps: 29.97
    • audio: MPEG AAC audio (mp4a)
      • sample rate: 48 kHz
  • WUzgd7C1pWA.mp4

    • source: kinetics-400
    • video: H.264 - MPEG-4 AVC (part 10) (avc1)
      • fps: 29.97
    • audio: MPEG AAC audio (mp4a)
      • sample rate: 48 kHz

Unit test

  • We compare the decoding speed of the TorchVision video reader and PyAV in the following cases (a hedged timing sketch follows this list):
    • decoding the full video from file / memory
    • decoding a fixed number of frames (e.g. 4, 8, 16, 32, 64, 128) starting at a randomly selected timestamp
  • We test rescaling of video frames and resampling of audio waveforms.
  • We run a stress test that iteratively decodes videos to ensure there are no memory leaks.
  • We compare results between pts-only decoding and full decoding (pts plus video/audio frames), ensuring the returned pts are identical, and compare decoding efficiency to validate that pts-only decoding is faster.
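For reference, a sketch of the "fixed number of frames at a random timestamp" case on the PyAV side of the comparison. The TV-reader side runs through its own bindings and is not shown here; the file name is simply one of the test videos listed above.

```python
import random
import time
import av

def time_fixed_frames_pyav(path, num_frames):
    """Seek to a random timestamp and decode num_frames frames, returning elapsed seconds."""
    container = av.open(path)
    stream = container.streams.video[0]
    # stream.duration is in time_base units (it may be None for some containers)
    duration_sec = float(stream.duration * stream.time_base)
    start_sec = random.uniform(0, max(duration_sec - 1.0, 0.0))
    t0 = time.perf_counter()
    container.seek(int(start_sec / stream.time_base), stream=stream)
    decoded = 0
    for frame in container.decode(stream):
        frame.to_ndarray(format="rgb24")
        decoded += 1
        if decoded == num_frames:
            break
    elapsed = time.perf_counter() - t0
    container.close()
    return elapsed

for n in [4, 8, 16, 32, 64, 128]:
    print(n, time_fixed_frames_pyav("v_SoccerJuggling_g23_c01.avi", n))
```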

Results of the unit tests are attached: torchvision.video.reader.unit.test.log

Comparison with PyAV

  • When decoding all video/audio frames in a video, the TorchVision video reader is 1.2x - 6x faster, depending on the codec and video length.
  • When decoding a fixed number of video frames (e.g. 4, 8, 16, 32, 64, 128), the TorchVision video reader is about equally fast for small counts (4, 8, 16) and up to 3x faster for large counts (32, 64, 128).

video_reader_src,
include_dirs=[
video_reader_src_dir,
'/home/zyan3/local/anaconda3/envs/pytorch_py3/include',
The author (stephenyan1231) commented:

@fmassa, I will remove this line.

For the FFmpeg header files, we need to ensure they are installed in a default header search path.
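For illustration only, a sketch of how the extension might be declared without the hard-coded per-user include path. The module name, source layout, and library list below are assumptions, not the PR's actual setup.py.

```python
import glob
import os
from setuptools import Extension

# Assumed source layout; the real paths are defined elsewhere in setup.py.
video_reader_src_dir = os.path.join("torchvision", "csrc", "video_reader")
video_reader_src = glob.glob(os.path.join(video_reader_src_dir, "*.cpp"))

video_reader_ext = Extension(
    "torchvision.video_reader",
    sources=video_reader_src,
    include_dirs=[video_reader_src_dir],  # FFmpeg headers come from the default search path
    libraries=["avcodec", "avformat", "avutil", "swresample", "swscale"],
)
```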

stephenyan1231 force-pushed the torchvision_video_reader branch from 4eec101 to b3f2a6e on August 31, 2019 04:46
zyan3 and others added 9 commits August 30, 2019 21:55
* fixed typo

* fixed some more typos and grammar
* make shufflenet scriptable

* make resnet18 scriptable

* set downsample to identity instead of __constants__ api

* use __constants__ for downsample instead of identity

* import tensor to fix flake

* use torch.Tensor type annotation instead of import
fmassa self-requested a review on September 2, 2019 15:49
fmassa (Member) commented Sep 2, 2019

Thanks a lot for the PR Zhicheng!

The first thing I need to figure out before we can merge this is how we will be adding ffmpeg as a dependency for torchvision, and if it will be a soft or hard dependency.

A few options:

Also, what is the version of FFmpeg that we will be relying upon?

Another thing I need to do is to get CI working for Windows and OSX in torchvision, so that we can make sure that this PR compiles and works nicely on the other OSes that torchvision supports.

I'll be looking into both the CI and ffmpeg dependency from an OSS perspective.

soumith (Member) commented Sep 3, 2019

I think it might be a good idea to start with (1), i.e. the ffmpeg from conda-source or the system package manager (brew install ffmpeg / apt install ffmpeg). Also, by ffmpeg I presume you mean libav?

For binaries, we will figure out how to ship ffmpeg the right way ourselves. Just building ffmpeg from source is not sufficient btw, because you need to build it with codec support, and there are tons of codecs we need to build it with.

fmassa (Member) commented Sep 3, 2019

@soumith

> I think it might be a good idea to start with (1), i.e. the ffmpeg from conda-source or the system package manager (brew install ffmpeg / apt install ffmpeg).

Sounds good, I'll be looking into option (1) first (once I get full CI running).

> Also, by ffmpeg I presume you mean libav?

We need the underlying libraries that ffmpeg is composed of (they could also be called libav, but that name now points to a fork of ffmpeg with different functionality).

> For binaries, we will figure out how to ship ffmpeg the right way ourselves. Just building ffmpeg from source is not sufficient btw, because you need to build it with codec support, and there are tons of codecs we need to build it with.

Sounds good.

}
}

<<<<<<< Updated upstream
A reviewer commented:
It seems that you forgot some merge conflicts in this file

The author (stephenyan1231) replied:

Hey Lowik, it is fixed in the replacement PR (#1303)

bjuncek (Contributor) left a comment:

There are a few things to clean up but this looks very promising! Exciting stuff!!!


class BasicBlock(nn.Module):
expansion = 1
__constants__ = ['downsample']
A reviewer (Contributor) commented:

Can you separate the model changes into a separate PR to track them more easily?

The author (stephenyan1231) replied:

Correct. I messed up this PR. I will abandon it and create a cleaner PR.

audio_timebase = Fraction(0, 1)
if "audio_timebase" in info:
    audio_timebase = info["audio_timebase"]
audio_start_pts = pts_convert(
A reviewer (Contributor) commented:

I wonder if it makes sense to keep a global pts as opposed to doing this conversion?
If we have more than two streams, we'd have to add more of these if clauses at every iteration.

A reviewer (Member) replied:

I agree, I think it might be better to use a global metric, like seconds for example
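For context, the conversion under discussion boils down to pts times the stream's time_base; seconds work as a stream-independent unit because every stream's time_base is known. A minimal sketch (the time_base values below are examples, not the test videos' actual values):

```python
from fractions import Fraction

def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    # A pts is an integer count of time_base ticks
    return float(pts * time_base)

def seconds_to_pts(seconds, time_base: Fraction) -> int:
    return int(round(seconds / time_base))

video_tb = Fraction(1, 30000)   # example video time_base
audio_tb = Fraction(1, 44100)   # example audio time_base

t = pts_to_seconds(45045, video_tb)       # 1.5015 s
audio_pts = seconds_to_pts(t, audio_tb)   # the same instant in audio ticks
```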

import collections
from common_utils import get_tmp_dir
from fractions import Fraction
import logging
A reviewer (Contributor) commented:

I haven't seen logging used in torchvision in general. What is the best practice for this @fmassa?

A reviewer (Member) replied:

We don't currently use logging in torchvision; we only emit deprecation warnings in a few places.

I'm not yet sure we want to add logging as of now; it might deserve a larger discussion.

The author (stephenyan1231) replied:

OK. Those logging calls are mostly there for my own development. I will remove them now.

stephenyan1231 (author) commented:

Abandoning this PR. Please move to the new PR:

#1303
