@engineerchuan
Contributor

Convert mp3 to wav or on the fly generation.

@engineerchuan
Contributor Author

engineerchuan commented Jul 12, 2020

I tried to avoid converting to an int16 tensor, which saves as a 16-bit PCM WAV file, and instead kept a float32 tensor, which should save as 32-bit. However, I ran into the following error:

formats: raw can't encode Signed Integer PCM to 25-bit

The temp file that was written out had these properties:

$ soxi /tmp/test.wav

Input File     : '/tmp/test.wav'
Channels       : 1
Sample Rate    : 44100
Precision      : 25-bit
Duration       : 00:00:05.00 = 220500 samples = 375 CDDA sectors
File Size      : 882k
Bit Rate       : 1.41M
Sample Encoding: 32-bit Floating Point PCM
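
As a sanity check, the figures reported by soxi above are mutually consistent with a mono file holding 32 bits per sample. The arithmetic below is a small sketch that uses only the numbers printed in that output; the variable names are illustrative and the 75 sectors per second figure is the standard CD audio rate.

```python
# Figures taken from the soxi output above.
sample_rate = 44100       # Hz
num_samples = 220500
channels = 1
bits_per_sample = 32      # "32-bit Floating Point PCM"

duration_s = num_samples / sample_rate
data_bytes = num_samples * channels * bits_per_sample // 8
bit_rate = sample_rate * channels * bits_per_sample
cdda_sectors = duration_s * 75   # CD audio uses 75 sectors per second

print(duration_s)     # 5.0
print(data_bytes)     # 882000  (reported as 882k)
print(bit_rate)       # 1411200 (reported as 1.41M)
print(cdda_sectors)   # 375.0
```

So the file on disk really does hold 32-bit samples, which matches the reported sample encoding.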

@engineerchuan
Contributor Author

@mthrok I'm not sure what's wrong with the binary macos conda. It seems to be related to some specification mismatch that my code should not have affected.

@mthrok mthrok self-assigned this Jul 13, 2020
Contributor

@mthrok mthrok left a comment

Looks good for the most part. Left some comments.

output_waveform = F.bass_biquad(waveform, sample_rate, gain, central_freq, q)

- self.assertEqual(output_waveform, sox_output_waveform, atol=1.5e-4, rtol=1e-5)
+ self.assertEqual(output_waveform, sox_output_waveform, atol=1e-3, rtol=1e-4)
Contributor

This is due to an edge effect, and the trick to avoid relaxing the tolerance here is to scale down the intensity of the generated white noise (say, by a factor of 0.9).

Contributor Author

Tried this with bass and still got a few errors exceeding 1e-3.

E   AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.0001 and atol=0.001, found 136 element(s) (out of 220500) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.0013254880905151367 (0.8861521482467651 vs. 0.88482666015625), which occurred at index (0, 47224).
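
To see why the reported element fails, the comparison can be reproduced by hand. The sketch below assumes the usual closeness rule `|actual - expected| <= atol + rtol * |expected|` (the rule documented for `torch.allclose`); which tensor the test helper treats as the reference is an assumption, although with these numbers the check fails either way. The function name `within_tolerance` is hypothetical.

```python
def within_tolerance(actual, expected, rtol, atol):
    """Return True if |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

# Values copied from the assertion message above.
a = 0.8861521482467651
b = 0.88482666015625

diff = abs(a - b)                 # about 1.33e-3
allowed = 1e-3 + 1e-4 * abs(b)    # about 1.09e-3

print(diff > allowed)                                  # True: outside the margin
print(within_tolerance(a, b, rtol=1e-4, atol=1e-3))    # False
```

The difference exceeds the allowed margin by roughly 2.4e-4, so even the relaxed atol of 1e-3 is not enough for this sample.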

common_utils.TempDirMixin.setUp(self)
common_utils.TorchaudioTestCase.setUp(self)

NOISE_SAMPLE_RATE = 44100
Contributor

Can you use 8000 Hz? Although 44100 Hz is common in audio, many speech-related tasks use only 16 kHz or 8 kHz, and 8 kHz will reduce test run time.

Contributor Author

Sox deemph seems to work only at 44.1 kHz or 48 kHz:

deemph: sample rate must be 44100 (audio-CD) or 48000 (DAT)
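
One way to reconcile the two constraints is to default to 8000 Hz and fall back to 44100 Hz only for effects that require it. The sketch below is hypothetical and not part of torchaudio's test suite; `pick_sample_rate` and the constants are illustrative names, and the 5-second duration is taken from the soxi output earlier in the thread.

```python
# Hypothetical helper, not part of torchaudio's test suite.
DEFAULT_SAMPLE_RATE = 8000             # suggested in the review to keep tests fast
DEEMPH_SAMPLE_RATES = (44100, 48000)   # the only rates the sox error message allows

def pick_sample_rate(effect, requested=DEFAULT_SAMPLE_RATE):
    """Return a sample rate that the named sox effect can accept."""
    if effect == "deemph" and requested not in DEEMPH_SAMPLE_RATES:
        return DEEMPH_SAMPLE_RATES[0]
    return requested

duration_s = 5
for effect in ("bass", "deemph"):
    rate = pick_sample_rate(effect)
    # Print the effect, its sample rate, and the number of samples to process.
    print(effect, rate, rate * duration_s)
```

At 8000 Hz a 5-second clip has 40000 samples instead of 220500, which is where the run-time saving comes from.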

@mthrok
Contributor

mthrok commented Jul 13, 2020

> I tried to avoid converting to an int16 tensor, which saves as a 16-bit PCM WAV file, and instead kept a float32 tensor, which should save as 32-bit. However, I ran into the following error:
>
> formats: raw can't encode Signed Integer PCM to 25-bit
>
> The temp file that was written out had these properties:
>
> $ soxi /tmp/test.wav
>
> Input File     : '/tmp/test.wav'
> Channels       : 1
> Sample Rate    : 44100
> Precision      : 25-bit
> Duration       : 00:00:05.00 = 220500 samples = 375 CDDA sectors
> File Size      : 882k
> Bit Rate       : 1.41M
> Sample Encoding: 32-bit Floating Point PCM

I am not sure where the error "formats: raw can't encode Signed Integer PCM to 25-bit" came from, but the current sox effects do not work correctly for floating-point PCM (example), so generating white noise as 16-bit signed integer PCM is the correct (and probably the only) working way. Once #760 has landed, this can be cleaned up.
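
For illustration, white noise can be written as 16-bit signed integer PCM with nothing but the Python standard library. This is a minimal sketch, not the helper used in the PR: the function names are hypothetical, the stdlib `wave` module stands in for torchaudio's own save path, and the 0.9 scale follows the suggestion made in the review.

```python
import io
import random
import struct
import wave

def white_noise_int16(num_samples, scale=0.9, seed=0):
    """Uniform white noise in [-scale, scale], quantized to 16-bit signed ints."""
    rng = random.Random(seed)
    return [int(round(rng.uniform(-scale, scale) * 32767)) for _ in range(num_samples)]

def write_wav_int16(fileobj, samples, sample_rate):
    """Write mono 16-bit signed integer PCM to a seekable file-like object."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

buf = io.BytesIO()
samples = white_noise_int16(num_samples=44100, scale=0.9)
write_wav_int16(buf, samples, sample_rate=44100)

# Read the header back to confirm the encoding that was written.
buf.seek(0)
with wave.open(buf, "rb") as r:
    print(r.getsampwidth() * 8, r.getframerate(), r.getnframes())  # 16 44100 44100
```

Writing to an in-memory buffer keeps the sketch free of temporary files; a path string can be passed to `wave.open` instead.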

@mthrok
Contributor

mthrok commented Jul 13, 2020

> @mthrok I'm not sure what's wrong with the binary macos conda. It seems to be related to some specification mismatch that my code should not have affected.

Yes, that happens when PyTorch's binary build fails. You can ignore it.

@mthrok
Contributor

mthrok commented Jul 13, 2020

Thanks!

@engineerchuan
Contributor Author

@mthrok Check out the latest. I think this approach captures what you want, but it introduces quite a few instabilities, and we would need to relax the atol thresholds for 4-5 tests. Why do you think that is? Do you think the float -> int quantization differs somewhere?
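
As a rough guide to how much error the float -> int16 step can introduce on its own, the round trip can be measured directly. The sketch below is an assumption-laden illustration: the scale factor of 32767 and round-to-nearest are one common convention, not necessarily what torchaudio or sox use, and the helper names are hypothetical.

```python
import random

def to_int16(x):
    """Quantize a float in [-1.0, 1.0] to a 16-bit signed integer (round to nearest)."""
    return max(-32768, min(32767, int(round(x * 32767))))

def to_float(i):
    """Map a 16-bit signed integer back to a float."""
    return i / 32767

rng = random.Random(0)
noise = [rng.uniform(-0.9, 0.9) for _ in range(10000)]
max_err = max(abs(x - to_float(to_int16(x))) for x in noise)

half_step = 0.5 / 32767               # largest possible rounding error, about 1.5e-5
print(max_err <= half_step + 1e-12)   # True
```

Under these assumptions the rounding error on the input is about 1.5e-5 at most, roughly a tenth of the original atol of 1.5e-4. Larger mismatches would then have to come from elsewhere, for example a different scaling convention or a filter that amplifies the input error, but that is a guess rather than a diagnosis.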

@codecov

codecov bot commented Jul 14, 2020

Codecov Report

Merging #773 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #773   +/-   ##
=======================================
  Coverage   89.53%   89.53%           
=======================================
  Files          32       32           
  Lines        2617     2617           
=======================================
  Hits         2343     2343           
  Misses        274      274           

Last update c375490...a842f2f.

@mthrok mthrok merged commit d11ad6b into pytorch:master Jul 14, 2020
@mthrok
Contributor

mthrok commented Jul 14, 2020

Thanks!

@engineerchuan engineerchuan deleted the issue_764_convert_mp3_to_wav branch July 14, 2020 18:28
mpc001 pushed a commit to mpc001/audio that referenced this pull request Aug 4, 2023
Co-authored-by: Aleksandr Panchul (CSI Interfusion Inc) <[email protected]>