The Kaldi-style resampling function that torchaudio currently uses has some inefficient for loops and padding steps. I've put together an efficient module and evaluated its performance in this notebook:
https://www.kaggle.com/smallyellowduck/fast-audio-resampling-layer-in-pytorch (the code is in the notebook)
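To make the conv1d idea concrete, here is a minimal conceptual sketch (not the notebook's code; the function name, the Hann window, and the explicit zero-stuffing are my illustrative assumptions): rational-rate resampling as zero-stuffing followed by a windowed-sinc lowpass and decimation, with the last two steps fused into a single strided `torch.nn.functional.conv1d` call. An efficient implementation would avoid materializing the zero-stuffed signal by folding it into a bank of per-phase kernels, but the sketch shows why no Python for loop over samples is needed.

```python
import math
import torch
import torch.nn.functional as F

def resample_conv1d(x, input_sr, output_sr, num_zeros=6, cutoff_ratio=0.95):
    """Resample a 1-D float tensor x from input_sr to output_sr (Hz).

    Concept only: zero-stuff by `up`, then fuse the windowed-sinc lowpass
    and the keep-every-`down`-th-sample step into one strided conv1d.
    """
    g = math.gcd(input_sr, output_sr)
    up, down = output_sr // g, input_sr // g

    # Windowed-sinc lowpass at the upsampled rate fs_up, cutoff just below
    # the smaller Nyquist frequency; gain `up` compensates for zero-stuffing.
    fs_up = input_sr * up
    cutoff = cutoff_ratio * 0.5 * min(input_sr, output_sr)
    half_width = int(math.ceil(num_zeros * fs_up / (2 * cutoff)))
    n = torch.arange(-half_width, half_width + 1, dtype=x.dtype)
    kernel = up * (2 * cutoff / fs_up) * torch.special.sinc(2 * cutoff * n / fs_up)
    kernel = kernel * torch.hann_window(kernel.numel(), periodic=False, dtype=x.dtype)

    # Zero-stuff, then filter and decimate in a single conv1d call.
    stuffed = torch.zeros(x.numel() * up, dtype=x.dtype)
    stuffed[::up] = x
    out = F.conv1d(stuffed.view(1, 1, -1), kernel.view(1, 1, -1),
                   stride=down, padding=half_width)
    return out.view(-1)
```

For example, `resample_conv1d(torch.randn(44100), 44100, 16000)` returns 16000 samples, i.e. one second at the target rate.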
Edit: here are two separate comparisons of the resampling time alone, excluding the file-load time (a rough timing sketch follows the numbers below):
Comparison 1: the 'kaiser_best' setting in librosa vs the 'kaiser_best' setting in the efficient pytorch resampler (should be the same setup)
librosa: 51 s
efficient pytorch resampler: 9 s
Comparison 2: the default settings in torchaudio vs window='hann', num_zeros=6 in the efficient pytorch resampler (should be the same setup)
torchaudio: 10 s
efficient pytorch resampler: 1 s
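For reference, a minimal sketch (with an illustrative signal length and rate pair, not the notebook's exact setup) of how such wall-clock numbers can be measured for the two existing baselines; the proposed module would be timed the same way. The keyword form of `librosa.resample` shown here assumes a recent librosa version.

```python
import time
import numpy as np
import torch
import torchaudio
import librosa

# Illustrative input: 60 s of noise, 44.1 kHz -> 16 kHz
# (reduced ratio 441:160, i.e. not a whole-number multiple).
orig_sr, target_sr = 44100, 16000
y = np.random.randn(60 * orig_sr).astype(np.float32)

t0 = time.perf_counter()
y_librosa = librosa.resample(y, orig_sr=orig_sr, target_sr=target_sr,
                             res_type='kaiser_best')
print(f"librosa kaiser_best: {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
resampler = torchaudio.transforms.Resample(orig_freq=orig_sr, new_freq=target_sr)
y_torchaudio = resampler(torch.from_numpy(y).unsqueeze(0))
print(f"torchaudio default:  {time.perf_counter() - t0:.2f} s")
```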
The performance improvement is most substantial when the input and output sample rates are not whole-number multiples of each other (e.g. 44.1 kHz → 16 kHz, where the reduced ratio is 441:160).
I think it would be good for torchaudio to switch to the more efficient resample module.
Before making a PR, perhaps other people have feedback about what the API for the module should look like? I have largely tried to follow the API of librosa's resample function. Any other comments? @vincentqb
def __init__(self,
             input_sr, output_sr, dtype,
             num_zeros=64, cutoff_ratio=0.95, filter='kaiser', beta=14.0):
    """
    This creates an object that can apply a symmetric FIR filter
    based on torch.nn.functional.conv1d.

    Args:
      input_sr: The input sampling rate, AS AN INTEGER.
      output_sr: The output sampling rate, AS AN INTEGER.
      dtype: The torch dtype to use for computations.
      num_zeros: The number of zeros per side in the (sinc*hanning-window)
        filter function. More is more accurate, but 64 is already quite a lot.
      cutoff_ratio: The filter rolloff point as a fraction of the Nyquist frequency.
      filter: One of ['kaiser', 'kaiser_best', 'kaiser_fast', 'hann'].
      beta: Parameter for the 'kaiser' filter.
    """
    super().__init__()  # init the base class
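To anchor the API question, here is how a call to this module might line up against the librosa call it is modeled on. The librosa part is runnable as-is; the module usage is left as a comment because only the constructor is shown above, so the class name (`Resampler`) and the assumption that calling the instance applies the filter to a (batch, time) tensor are purely hypothetical.

```python
import numpy as np
import torch
import librosa

y = np.random.randn(3 * 44100).astype(np.float32)   # 3 s of noise at 44.1 kHz

# librosa's functional API: everything is specified per call.
y_16k = librosa.resample(y, orig_sr=44100, target_sr=16000, res_type='kaiser_best')

# Proposed module API (hypothetical usage -- class name and forward signature
# are not fixed yet): rates and filter settings are bound at construction, so
# the FIR kernels can be precomputed once and reused across calls.
#
#   resampler = Resampler(input_sr=44100, output_sr=16000, dtype=torch.float32,
#                         num_zeros=64, cutoff_ratio=0.95, filter='kaiser', beta=14.0)
#   y_16k_t = resampler(torch.from_numpy(y).unsqueeze(0))
```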