Issue 764: Convert mp3 to wav #773

engineerchuan · 2020-07-11T22:50:37Z

Convert the mp3 test files to wav, or generate the test signals on the fly.

This reverts commit a9e89d1.

engineerchuan · 2020-07-12T14:11:44Z

I tried to avoid converting to an int16 tensor, which is saved as a 16-bit PCM WAV file. Instead, I kept it as a float32 tensor, which should be saved as 32-bit. However, I ran into the following error:

formats: raw can't encode Signed Integer PCM to 25-bit

The temp file that was written out had these properties:

$ soxi /tmp/test.wav

Input File     : '/tmp/test.wav'
Channels       : 1
Sample Rate    : 44100
Precision      : 25-bit
Duration       : 00:00:05.00 = 220500 samples = 375 CDDA sectors
File Size      : 882k
Bit Rate       : 1.41M
Sample Encoding: 32-bit Floating Point PCM
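The soxi figures above are internally consistent, which a few lines of arithmetic confirm. The precision function encodes an assumption not stated in this thread: that SoX reports floating-point precision as significand bits (including the implicit bit) plus a sign bit. If so, float32 comes out as 25-bit, which may explain where the "25-bit" in the error message comes from.

```python
import sys

# Values reported by soxi for /tmp/test.wav (mono).
SAMPLE_RATE = 44100   # Hz
NUM_SAMPLES = 220500  # mono, so samples == frames
BITS_PER_SAMPLE = 32  # "32-bit Floating Point PCM"

duration_s = NUM_SAMPLES / SAMPLE_RATE             # seconds
data_bytes = NUM_SAMPLES * BITS_PER_SAMPLE // 8    # sample data only, no WAV header
bit_rate = SAMPLE_RATE * BITS_PER_SAMPLE           # bits per second, one channel


def float_precision_bits(significand_bits: int) -> int:
    """ASSUMED SoX convention: significand bits (incl. implicit bit) plus a sign bit."""
    return significand_bits + 1


FLOAT32_SIGNIFICAND_BITS = 24  # IEEE 754 binary32: 23 stored bits + 1 implicit

print(duration_s)  # 5.0
print(data_bytes)  # 882000 (soxi shows 882k)
print(bit_rate)    # 1411200 (soxi shows 1.41M)
print(float_precision_bits(FLOAT32_SIGNIFICAND_BITS))  # 25
print(float_precision_bits(sys.float_info.mant_dig))   # 54 for float64
```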

engineerchuan · 2020-07-12T15:06:37Z

@mthrok I'm not sure what's wrong with the binary macOS conda build. It seems to be related to a specification mismatch that my code should not have affected.

mthrok

Looks good for the most part. Left some comments.

test/test_sox_compatibility.py

mthrok · 2020-07-13T01:06:37Z

test/test_sox_compatibility.py

        output_waveform = F.bass_biquad(waveform, sample_rate, gain, central_freq, q)

-        self.assertEqual(output_waveform, sox_output_waveform, atol=1.5e-4, rtol=1e-5)
+        self.assertEqual(output_waveform, sox_output_waveform, atol=1e-3, rtol=1e-4)


This is due to the edge effect. The trick to avoid relaxing the tolerance here is to scale down the amplitude of the generated white noise (say, x0.9).
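A minimal sketch of the suggested scaling, using a hypothetical get_whitenoise_sketch helper built on Python's random module rather than the test suite's actual get_whitenoise (whose signature is not shown here). It draws uniform noise and multiplies it by scale_factor, so with 0.9 every sample stays at least 10% below full scale.

```python
import random
from typing import List


def get_whitenoise_sketch(num_samples: int, scale_factor: float = 1.0, seed: int = 0) -> List[float]:
    """Hypothetical white-noise helper with an amplitude scale factor.

    Draws uniform samples in [-1, 1] and multiplies each by scale_factor,
    so scale_factor=0.9 caps the peak magnitude at 0.9.
    """
    if not 0.0 < scale_factor <= 1.0:
        raise ValueError("scale_factor must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    return [scale_factor * rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
```

Usage: get_whitenoise_sketch(8000, scale_factor=0.9) gives one second of noise at 8 kHz whose peak magnitude is at most 0.9.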

I tried this with bass and still got a few errors exceeding 1e-3:

E AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.0001 and atol=0.001, found 136 element(s) (out of 220500) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.0013254880905151367 (0.8861521482467651 vs. 0.88482666015625), which occurred at index (0, 47224).
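The failing element can be checked by hand with the usual allclose rule, |actual - expected| <= atol + rtol * |expected|, which we assume this assertEqual follows. With atol=1e-3 and rtol=1e-4 the allowed margin at this element is about 1.09e-3, so a difference of about 1.33e-3 fails even though it is only slightly above 1e-3.

```python
def within_tolerance(actual: float, expected: float, atol: float, rtol: float) -> bool:
    """allclose-style check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Worst element from the failure message above.
actual, expected = 0.8861521482467651, 0.88482666015625
margin = 1e-3 + 1e-4 * abs(expected)  # ~1.0885e-3 allowed at this element
diff = abs(actual - expected)         # ~1.3255e-3 observed

print(within_tolerance(actual, expected, atol=1e-3, rtol=1e-4))  # False
```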

test/test_sox_effects.py

mthrok · 2020-07-13T01:15:46Z

test/test_sox_compatibility.py

+        common_utils.TempDirMixin.setUp(self)
+        common_utils.TorchaudioTestCase.setUp(self)
+
+        NOISE_SAMPLE_RATE = 44100
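The diff above calls each base class's setUp explicitly; a later commit in this PR switched to super().setUp(). A minimal sketch of the cooperative pattern, using hypothetical stand-in classes rather than the actual common_utils.TempDirMixin and TorchaudioTestCase: each setUp calls super().setUp(), so one call in the test class walks the whole MRO and no base setUp runs twice.

```python
import tempfile
import unittest


class BaseTestCaseSketch(unittest.TestCase):
    """Hypothetical stand-in for a project-wide TestCase base class."""

    def setUp(self):
        super().setUp()
        # Count calls so the test can confirm this ran exactly once.
        self.base_setup_calls = getattr(self, "base_setup_calls", 0) + 1


class TempDirMixinSketch:
    """Hypothetical stand-in for a mixin that provides a temporary directory."""

    def setUp(self):
        super().setUp()  # defer to the next class in the MRO first
        self._tmp = tempfile.TemporaryDirectory()
        self.temp_dir = self._tmp.name

    def tearDown(self):
        self._tmp.cleanup()
        super().tearDown()


class NoiseTestSketch(TempDirMixinSketch, BaseTestCaseSketch):
    def setUp(self):
        # One call walks the MRO: TempDirMixinSketch.setUp, which calls
        # BaseTestCaseSketch.setUp, which calls unittest.TestCase.setUp.
        super().setUp()
        self.noise_sample_rate = 44100

    def test_setup_chain(self):
        self.assertEqual(self.base_setup_calls, 1)
        self.assertTrue(self.temp_dir)
```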


Can you use 8000 Hz? Although 44100 Hz is common in audio, many speech-related tasks use only 16 kHz or 8 kHz, and 8 kHz will reduce the test run time.

SoX's deemph effect seems to work only at 44.1 kHz or 48 kHz:

deemph: sample rate must be 44100 (audio-CD) or 48000 (DAT)
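One way to reconcile the two constraints, sketched with hypothetical names (pick_sample_rate, DEFAULT_TEST_RATE): use 8 kHz by default and fall back to 44.1 kHz only for deemph. For 5-second signals that is 40000 samples instead of 220500.

```python
# Rates accepted by SoX's deemph effect, taken from the error message above.
DEEMPH_RATES = (44100, 48000)
# Hypothetical default, following the reviewer's 8 kHz suggestion.
DEFAULT_TEST_RATE = 8000


def pick_sample_rate(effect: str) -> int:
    """Hypothetical helper: 8 kHz by default, 44.1 kHz for deemph."""
    if effect == "deemph":
        return DEEMPH_RATES[0]  # 44.1 kHz (audio-CD)
    return DEFAULT_TEST_RATE


def num_samples(sample_rate: int, duration_s: float) -> int:
    """Number of samples per channel for a signal of the given duration."""
    return int(sample_rate * duration_s)


print(num_samples(pick_sample_rate("deemph"), 5))  # 220500
print(num_samples(pick_sample_rate("bass"), 5))    # 40000
```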

mthrok · 2020-07-13T01:18:31Z

> I tried to avoid converting to an int16 tensor, which is saved as a 16-bit PCM WAV file. Instead, I kept it as a float32 tensor, which should be saved as 32-bit. However, I ran into the following error:
> formats: raw can't encode Signed Integer PCM to 25-bit
> The temp file that was written out had these properties:
> $ soxi /tmp/test.wav
>
> Input File     : '/tmp/test.wav'
> Channels       : 1
> Sample Rate    : 44100
> Precision      : 25-bit
> Duration       : 00:00:05.00 = 220500 samples = 375 CDDA sectors
> File Size      : 882k
> Bit Rate       : 1.41M
> Sample Encoding: 32-bit Floating Point PCM

I am not sure where you got the "formats: raw can't encode Signed Integer PCM to 25-bit" error, but the current sox effects implementation does not work correctly with floating-point PCM (example), so generating the white noise as 16-bit signed integer PCM is the correct (and probably the only) working approach. Once #760 has landed, this can be cleaned up.
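For reference, this is what 16-bit signed integer PCM encoding of float samples looks like with only Python's standard library (wave and struct), not torchaudio's save path. Clamping to [-1, 1], scaling by 32767, and Python's round-half-to-even are assumptions of this sketch; other tools scale by 32768 or truncate.

```python
import io
import struct
import wave
from typing import List


def float_to_int16(samples: List[float]) -> List[int]:
    """Clamp to [-1, 1] and scale to signed 16-bit (one common convention)."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))
        out.append(int(round(x * 32767)))
    return out


def write_int16_wav(samples: List[float], sample_rate: int) -> bytes:
    """Encode mono float samples as a 16-bit signed integer PCM WAV in memory."""
    pcm = float_to_int16(samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample -> 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))  # little-endian int16
    return buf.getvalue()
```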

mthrok · 2020-07-13T01:19:03Z

> @mthrok I'm not sure what's wrong with the binary macOS conda build. It seems to be related to a specification mismatch that my code should not have affected.

Yes, that happens when PyTorch's binary build fails. You can ignore it.

mthrok · 2020-07-13T01:19:07Z

Thanks!

engineerchuan · 2020-07-13T16:45:55Z

@mthrok Check out the latest. I think this approach captures what you want, but it introduces quite a few instabilities, and we would need to relax the atol thresholds for 4-5 tests. Why do you think that is? Could there be some difference in the float -> int quantization?
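One concrete way float -> int quantization can differ is the scaling and rounding convention. The sketch below compares two conventions (both hypothetical here; the thread does not say which ones the test helpers or SoX use). For inputs in [-1, 1], after mapping back to float they disagree by less than 2/32768 (about 6.1e-5) per sample, far below atol=1e-3 on its own; whether a filter amplifies such differences enough to matter would need to be checked per effect.

```python
def quantize_round_32767(x: float) -> int:
    """Convention A (assumed): clamp to [-1, 1], scale by 32767, round to nearest."""
    x = max(-1.0, min(1.0, x))
    return int(round(x * 32767))


def quantize_trunc_32768(x: float) -> int:
    """Convention B (assumed): scale by 32768, truncate toward zero, clamp to int16."""
    return max(-32768, min(32767, int(x * 32768)))


def max_dequantized_gap(samples):
    """Largest per-sample gap between the two conventions after mapping back to float."""
    return max(
        abs(quantize_round_32767(x) / 32767 - quantize_trunc_32768(x) / 32768)
        for x in samples
    )


samples = [i / 1000 - 1.0 for i in range(2001)]  # -1.0 .. 1.0 in steps of 0.001
gap = max_dequantized_gap(samples)
print(gap < 2 / 32768)  # True
```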


codecov · 2020-07-14T11:19:23Z

Codecov Report

Merging #773 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #773   +/-   ##
=======================================
  Coverage   89.53%   89.53%           
=======================================
  Files          32       32           
  Lines        2617     2617           
=======================================
  Hits         2343     2343           
  Misses        274      274

Powered by Codecov. Last update c375490...a842f2f.

mthrok · 2020-07-14T17:15:01Z

Thanks!


engineerchuan added 7 commits July 11, 2020 18:35

switch one test from mp3 to wav

190db96

try switching one file to on the fly generation

6795f01

Switch to on the fly generation for test_sox_compatibility

0babc87

trying again

332015f

Issue 764 - on the fly noise generation

a9e89d1

Revert "Issue 764 - on the fly noise generation"

dfb449f

This reverts commit a9e89d1.

don't do black formatting which obfuscates actual changes

7d67992

engineerchuan added 2 commits July 12, 2020 10:14

fixed flake8 issues

2587b67

relax bass atol and rtol by one order of magnitude

e437cd0

mthrok self-assigned this Jul 13, 2020

mthrok reviewed Jul 13, 2020


Responded to comments, tests will not pass

d3aae5c

engineerchuan added 4 commits July 14, 2020 06:41

use super().setUp(), rely on reloaded signal for comparison tests between functional and sox

62194ff

typo, riaa -> deemph

ecdc257

add scale factor into get_whitenoise, revert back test_bass atol

314fc4f

placate flake8

a842f2f

mthrok approved these changes Jul 14, 2020


mthrok merged commit d11ad6b into pytorch:master Jul 14, 2020

engineerchuan deleted the issue_764_convert_mp3_to_wav branch July 14, 2020 18:28

mpc001 pushed a commit to mpc001/audio that referenced this pull request Aug 4, 2023

correcting the pipeline rpc example (pytorch#773)

e9b2f8e

Co-authored-by: Aleksandr Panchul (CSI Interfusion Inc) <[email protected]>
