Skip to content

Conversation

@parmeet
Copy link
Contributor

@parmeet parmeet commented Feb 6, 2021

#1238

Bare minimum Implementation of lfilter core loop in C++

Performance measures:

Input waveform size: [2,441000]
Python loop: 6.3959 sec

for i_sample, o0 in enumerate(input_signal_windows.t()):

C++ loop: .0101 sec https://github.com/parmeet/audio/blob/cea430ad225b627f3e5e511010e9ad80fcfaa991/torchaudio/functional/filtering.py#L881

Notes:

Several tests in following test suit are failing when using CPU specific kernel registration:
test/torchaudio_unittest/functional/torchscript_consistency_cpu_test.py
Current finding: The scripted function is skipping the call to implemented C++ operator. Hence the output of scripted and non-scripted versions is always different.

As a current workaround:
We registered the kernel using "catch-all" mechanism.

@mthrok
Copy link
Contributor

mthrok commented Feb 6, 2021

x640 improvement sounds super impressive 🙂

@cpuhrsch
Copy link
Contributor

cpuhrsch commented Feb 8, 2021

@mthrok wait until we vectorize it! or maybe even parallelize! but don't get your hopes up given how sequential this algorithm is in nature.

o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
padded_output_waveform[:, i_sample + n_order - 1] = o0

torch.ops.torchaudio._lfilter_core_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: you'll also need to guard on device (this is CPU only) and fallback to the for-loop for non-CPU.

o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
padded_output_waveform[:, i_sample + n_order - 1] = o0

if input_signal_windows.device.type=='cpu' and\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use the device object directly for comparisons instead of using the type field.

@cpuhrsch cpuhrsch requested a review from mthrok February 11, 2021 23:48
@cpuhrsch cpuhrsch changed the title [WIP] Implement lfilter core loop in C++ Implement lfilter core loop in C++ Feb 11, 2021
:, i_sample:i_sample + n_order
]
o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
padded_output_waveform[:, i_sample + n_order - 1] = o0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the content of else clause same as the lfilter_core_loop function defined at line 15? If so, can we reuse it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if we cannot reuse it, can we leave a comment on why?

Copy link
Contributor Author

@parmeet parmeet Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if we cannot reuse it, can we leave a comment on why?

Unfortunately, we cannot reuse it in else clause because depending on module availability the lfilter_core_loop may point to C++ operator in which case we cannot use it if the tensors are not on CPU. The reason we need to follow this complex logic is combination of following reasons:

  • C++ operator is not available on Windows
  • Try/except block cannot be inside function as TS would complain
  • lfilter_core_loop cannot be initialized with None (in which case we could have simply added this non-nullability check on if clause) because TS cannot infer type and would complain during compilation.
  • An alternative solution would be to initialize lfilter_core_loop with any 'type' allowed by TS and add check on if statement that lfilter_core_loop is not same as what it was initialized with. In this case we could have avoided code repetition. But initialization with arbitrary type might have created confusion so I avoided it :).

Is the content of else clause same as the lfilter_core_loop function defined at line 15? If so, can we reuse it?

Yes, the content of else clause is same as lfilter_core_loop function defined at line 15.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try something like this?

def lfilter_core_generic_loop(
        input_signal_windows: Tensor,
        a_coeffs_flipped: Tensor,
        padded_output_waveform: Tensor,
):
    n_order = a_coeffs_flipped.size(0)
    for i_sample, o0 in enumerate(input_signal_windows.t()):
        windowed_output_signal = padded_output_waveform[
            :, i_sample:i_sample + n_order
        ]
        o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
        padded_output_waveform[:, i_sample + n_order - 1] = o0

try:
    lfilter_core_cpu_loop = torch.ops.torchaudio._lfilter_core_loop
except RuntimeException:
    lfilter_core_cpu_loop = lfilter_core_generic_loop

then inside of lfilter

def lfilter(...)
    ...
    if input_signal_windows.device == torch.device('cpu') and\
       a_coeffs_flipped.device == torch.device('cpu') and\
       padded_output_waveform.device == torch.device('cpu'):
        lfilter_core_cpu_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)
    else:
        lfilter_core_generic_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mthrok for the suggestion. This makes sense. Let me commit the new changes accordingly.

Copy link
Contributor

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good. One question about the reusability of Python code.

@parmeet parmeet requested a review from mthrok February 12, 2021 15:24
Copy link
Contributor

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can you do a final style touch up?

@@ -1,4 +1,5 @@
import math
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not used, and since we landed #1214, this will cause style check to fail if landed on master. Can you remove and merge/rebase the latest master?

import torchaudio._internal.fft


def lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you prefix this with underscore _ and move the definition and the following lfilter_core_cpu_loop assignment closer to def lfilter?

Also please prefix lfilter_core_cpu_loop with underscore _ as well.

@mthrok mthrok merged commit 05bff83 into pytorch:master Feb 12, 2021
@parmeet parmeet deleted the lfilter branch February 12, 2021 23:47
mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
* Fix TF32 convergence issue with TF32

* save
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants