Conversation

@mthrok (Contributor) commented Sep 13, 2021

Add OpenMP support to the build process so that lfilter is parallelized. (#1557)

cc @yoyololicon

TODO:

  • Benchmark lfilter CPU
    • Plot
  • Add a step to verify that the OpenMP version is the same as the one PyTorch uses.
    >>> print(torch.__config__.parallel_info())
    ATen/Parallel:
        at::get_num_threads() : 8
        at::get_num_interop_threads() : 8
    OpenMP 201511 (a.k.a. OpenMP 4.5)
        omp_get_max_threads() : 8
    Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
        mkl_get_max_threads() : 8
    Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
    std::thread::hardware_concurrency() : 16
    Environment variables:
        OMP_NUM_THREADS : [not set]
        MKL_NUM_THREADS : [not set]
    ATen parallel backend: OpenMP
    
  • (after merge) Check that the nightly-build libtorchaudio points to the same OpenMP library as PyTorch.
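The verification step above boils down to checking the backend line in `torch.__config__.parallel_info()`. A minimal sketch of that check, using a trimmed copy of the output pasted above as sample data (the helper name `uses_openmp_backend` is made up for illustration):

```python
def uses_openmp_backend(parallel_info: str) -> bool:
    # True when ATen reports OpenMP as its parallel backend.
    return "ATen parallel backend: OpenMP" in parallel_info

# Trimmed sample of the torch.__config__.parallel_info() output shown above.
sample = (
    "ATen/Parallel:\n"
    "    at::get_num_threads() : 8\n"
    "OpenMP 201511 (a.k.a. OpenMP 4.5)\n"
    "    omp_get_max_threads() : 8\n"
    "ATen parallel backend: OpenMP\n"
)
print(uses_openmp_backend(sample))  # True
```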

Benchmark

Script is from #1441 (comment)

$ USE_OPENMP=0 python setup.py clean develop && python benchmark.py

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    275.2  |    696.4
      [32, 1024]   |    651.2  |   1332.2
      [32, 4096]   |   1949.0  |   4122.1
      [64, 256]    |    418.9  |    934.1
      [64, 1024]   |   1135.0  |   2210.8
      [64, 4096]   |   3979.2  |   7947.9
      [128, 256]   |    658.7  |   1335.7
      [128, 1024]  |   1952.1  |   3933.4
      [128, 4096]  |  25779.2  |  35013.4
2 threads: -------------------------------
      [32, 256]    |    255.0  |    700.7
      [32, 1024]   |    559.6  |   1316.8
      [32, 4096]   |   1524.7  |   3221.2
      [64, 256]    |    361.4  |    918.5
      [64, 1024]   |    914.5  |   1813.1
      [64, 4096]   |   2854.4  |   6060.5
      [128, 256]   |    563.4  |   1297.8
      [128, 1024]  |   1529.0  |   3047.7
      [128, 4096]  |  15970.5  |  22613.1
4 threads: -------------------------------
      [32, 256]    |    237.8  |    696.3
      [32, 1024]   |    500.5  |   1281.7
      [32, 4096]   |   1307.5  |   2879.6
      [64, 256]    |    329.2  |    888.1
      [64, 1024]   |    815.5  |   1784.2
      [64, 4096]   |   2286.0  |   5158.9
      [128, 256]   |    502.5  |   1262.4
      [128, 1024]  |   1304.5  |   2707.9
      [128, 4096]  |  10360.7  |  15790.5
$ USE_OPENMP=1 python setup.py clean develop && python benchmark.py

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    279.8  |    703.5
      [32, 1024]   |    652.5  |   1324.0
      [32, 4096]   |   1945.7  |   3922.1
      [64, 256]    |    417.6  |    938.1
      [64, 1024]   |   1133.3  |   2203.4
      [64, 4096]   |   3819.4  |   7863.2
      [128, 256]   |    681.2  |   1368.6
      [128, 1024]  |   1967.8  |   3962.5
      [128, 4096]  |  27461.1  |  35519.0
2 threads: -------------------------------
      [32, 256]    |    229.9  |    644.8
      [32, 1024]   |    444.4  |   1102.4
      [32, 4096]   |   1164.6  |   2434.2
      [64, 256]    |    305.3  |    814.8
      [64, 1024]   |    687.1  |   1509.4
      [64, 4096]   |   2117.4  |   4561.1
      [128, 256]   |    458.7  |   1094.7
      [128, 1024]  |   1160.6  |   2314.5
      [128, 4096]  |  14750.9  |  19707.2
4 threads: -------------------------------
      [32, 256]    |    196.9  |    608.9
      [32, 1024]   |    329.5  |    949.3
      [32, 4096]   |    674.4  |   1642.6
      [64, 256]    |    243.3  |    733.9
      [64, 1024]   |    454.2  |   1207.1
      [64, 4096]   |   1191.5  |   2841.3
      [128, 256]   |    333.4  |    937.6
      [128, 1024]  |    685.7  |   1570.5
      [128, 4096]  |   8411.4  |  11559.6
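For a rough read of the tables, the per-case gain from enabling OpenMP is just the ratio of the two timings; e.g. the largest forward case at 4 threads:

```python
# Forward times for the [128, 4096] case at 4 threads, read off the
# tables above (same units as printed by benchmark.py).
without_openmp = 10360.7  # USE_OPENMP=0
with_openmp = 8411.4      # USE_OPENMP=1

speedup = without_openmp / with_openmp
print(f"speedup: {speedup:.2f}x")  # speedup: 1.23x
```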

_BUILD_RNNT = _get_build("BUILD_RNNT", True)
_USE_ROCM = _get_build("USE_ROCM", torch.cuda.is_available() and torch.version.hip is not None)
_USE_CUDA = _get_build("USE_CUDA", torch.cuda.is_available() and torch.version.hip is None)
_USE_OPENMP = _get_build("USE_OPENMP", True) and 'ATen parallel backend: OpenMP' in torch.__config__.parallel_info()
@mthrok (Contributor, Author) commented:
TODO: double-check whether the latter condition is necessary. It might work without it on macOS (but I'm not sure if that's okay; consult someone from PyTorch core).

@mthrok mthrok requested a review from malfet September 17, 2021 19:57
@mthrok mthrok marked this pull request as ready for review September 17, 2021 20:40
@mthrok mthrok force-pushed the openmp branch 3 times, most recently from c044ceb to fd3263b Compare October 6, 2021 02:11
@mthrok mthrok merged commit e3734fe into pytorch:main Oct 6, 2021
@mthrok mthrok deleted the openmp branch October 6, 2021 13:54