-
Couldn't load subscription status.
- Fork 734
Implement lfilter core loop in C++ #1244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…y checks on input
|
x640 improvement sounds super impressive 🙂 |
…on only supports float32 at the moment
|
@mthrok wait until we vectorize it! or maybe even parallelize! but don't get your hopes up given how sequential this algorithm is in nature. |
torchaudio/functional/filtering.py
Outdated
| o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1) | ||
| padded_output_waveform[:, i_sample + n_order - 1] = o0 | ||
|
|
||
| torch.ops.torchaudio._lfilter_core_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: you'll also need to guard on device (this is CPU only) and fallback to the for-loop for non-CPU.
torchaudio/functional/filtering.py
Outdated
| o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1) | ||
| padded_output_waveform[:, i_sample + n_order - 1] = o0 | ||
|
|
||
| if input_signal_windows.device.type=='cpu' and\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also use the device object directly for comparisons instead of using the type field.
… script with CPU dispatch-key specific registration
…e on 'nt' and resolving linter issues.
torchaudio/functional/filtering.py
Outdated
| :, i_sample:i_sample + n_order | ||
| ] | ||
| o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1) | ||
| padded_output_waveform[:, i_sample + n_order - 1] = o0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the content of else clause same as the lfilter_core_loop function defined at line 15? If so, can we reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if we cannot reuse it, can we leave a comment on why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if we cannot reuse it, can we leave a comment on why?
Unfortunately, we cannot reuse it in else clause because depending on module availability the lfilter_core_loop may point to C++ operator in which case we cannot use it if the tensors are not on CPU. The reason we need to follow this complex logic is combination of following reasons:
- C++ operator is not available on Windows
- Try/except block cannot be inside function as TS would complain
- lfilter_core_loop cannot be initialized with None (in which case we could have simply added this non-nullability check on if clause) because TS cannot infer type and would complain during compilation.
- An alternative solution would be to initialize lfilter_core_loop with any 'type' allowed by TS and add check on if statement that lfilter_core_loop is not same as what it was initialized with. In this case we could have avoided code repetition. But initialization with arbitrary type might have created confusion so I avoided it :).
Is the content of
elseclause same as thelfilter_core_loopfunction defined at line 15? If so, can we reuse it?
Yes, the content of else clause is same as lfilter_core_loop function defined at line 15.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try something like this?
def lfilter_core_generic_loop(
input_signal_windows: Tensor,
a_coeffs_flipped: Tensor,
padded_output_waveform: Tensor,
):
n_order = a_coeffs_flipped.size(0)
for i_sample, o0 in enumerate(input_signal_windows.t()):
windowed_output_signal = padded_output_waveform[
:, i_sample:i_sample + n_order
]
o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
padded_output_waveform[:, i_sample + n_order - 1] = o0
try:
lfilter_core_cpu_loop = torch.ops.torchaudio._lfilter_core_loop
except RuntimeException:
lfilter_core_cpu_loop = lfilter_core_generic_loopthen inside of lfilter
def lfilter(...)
...
if input_signal_windows.device == torch.device('cpu') and\
a_coeffs_flipped.device == torch.device('cpu') and\
padded_output_waveform.device == torch.device('cpu'):
lfilter_core_cpu_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)
else:
lfilter_core_generic_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mthrok for the suggestion. This makes sense. Let me commit the new changes accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good. One question about the reusability of Python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Can you do a final style touch up?
torchaudio/functional/filtering.py
Outdated
| @@ -1,4 +1,5 @@ | |||
| import math | |||
| import os | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not used, and since we landed #1214, this will cause style check to fail if landed on master. Can you remove and merge/rebase the latest master?
torchaudio/functional/filtering.py
Outdated
| import torchaudio._internal.fft | ||
|
|
||
|
|
||
| def lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you prefix this with underscore _ and move the definition and the following lfilter_core_cpu_loop assignment closer to def lfilter?
Also please prefix lfilter_core_cpu_loop with underscore _ as well.
* Fix TF32 convergence issue with TF32 * save
#1238
Bare minimum Implementation of lfilter core loop in C++
Performance measures:
Input waveform size: [2,441000]
Python loop: 6.3959 sec
audio/torchaudio/functional/filtering.py
Line 880 in 2c8aad9
C++ loop: .0101 sec https://github.com/parmeet/audio/blob/cea430ad225b627f3e5e511010e9ad80fcfaa991/torchaudio/functional/filtering.py#L881
Notes:
Several tests in following test suit are failing when using CPU specific kernel registration:
test/torchaudio_unittest/functional/torchscript_consistency_cpu_test.py
Current finding: The scripted function is skipping the call to implemented C++ operator. Hence the output of scripted and non-scripted versions is always different.
As a current workaround:
We registered the kernel using "catch-all" mechanism.