Implement lfilter core loop in C++ #1244

parmeet · 2021-02-06T01:35:25Z

Bare-minimum implementation of the lfilter core loop in C++.

Performance measures:

Input waveform size: [2,441000]
Python loop: 6.3959 sec

audio/torchaudio/functional/filtering.py

Line 880 in 2c8aad9

for i_sample, o0 in enumerate(input_signal_windows.t()):

C++ loop: 0.0101 sec https://github.com/parmeet/audio/blob/cea430ad225b627f3e5e511010e9ad80fcfaa991/torchaudio/functional/filtering.py#L881
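
For context, the per-sample recurrence computed by the Python loop can be sketched without tensors. This is a minimal single-channel illustration using plain lists, not the torchaudio code: the helper name `lfilter_core_loop_py` is hypothetical, and it assumes the feed-forward part has already been applied to the input and that the feedback coefficients are already flipped and normalized by the leading coefficient.

```python
from typing import List


def lfilter_core_loop_py(
    input_windows: List[float],
    a_coeffs_flipped: List[float],
) -> List[float]:
    """Single-channel, list-based sketch of the lfilter core loop (hypothetical helper)."""
    n_order = len(a_coeffs_flipped)
    # The first n_order - 1 entries stand in for the (zero) output history.
    padded_output = [0.0] * (len(input_windows) + n_order - 1)
    for i_sample, o0 in enumerate(input_windows):
        window = padded_output[i_sample:i_sample + n_order]
        # Mirrors o0.addmv_(window, a_coeffs_flipped, alpha=-1) for one channel.
        o0 -= sum(w * a for w, a in zip(window, a_coeffs_flipped))
        padded_output[i_sample + n_order - 1] = o0
    return padded_output[n_order - 1:]


# y[n] = x[n] + 0.5 * y[n - 1], i.e. a = [1.0, -0.5], flipped to [-0.5, 1.0].
impulse_response = lfilter_core_loop_py([1.0, 0.0, 0.0, 0.0], [-0.5, 1.0])
```

Each output sample depends on the samples written just before it, so the loop runs strictly in order; that per-iteration overhead is what the C++ version removes.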

Notes:

Several tests in the following test suite fail when the kernel is registered with a CPU-specific dispatch key:
test/torchaudio_unittest/functional/torchscript_consistency_cpu_test.py
Current finding: the scripted function skips the call to the C++ operator, so the outputs of the scripted and non-scripted versions always differ.

As a current workaround, we registered the kernel using the "catch-all" mechanism.

mthrok · 2021-02-06T01:54:22Z

x640 improvement sounds super impressive 🙂
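
For reference, the ratio of the two timings in the description can be checked directly. The variable names below are only illustrative.

```python
# Timings reported in the PR description, in seconds.
python_loop_sec = 6.3959
cpp_loop_sec = 0.0101

speedup = python_loop_sec / cpp_loop_sec
print(f"speedup: {speedup:.0f}x")
```

This comes out at about 633x, consistent with the rounded figure quoted above.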

cpuhrsch · 2021-02-08T23:04:57Z

@mthrok wait until we vectorize it! or maybe even parallelize! but don't get your hopes up given how sequential this algorithm is in nature.

cpuhrsch · 2021-02-09T17:35:43Z

torchaudio/functional/filtering.py

-        o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
-        padded_output_waveform[:, i_sample + n_order - 1] = o0
+
+    torch.ops.torchaudio._lfilter_core_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)


Note: you'll also need to guard on device (this is CPU only) and fall back to the for-loop for non-CPU devices.

cpuhrsch · 2021-02-09T22:47:51Z

torchaudio/functional/filtering.py

-        o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
-        padded_output_waveform[:, i_sample + n_order - 1] = o0
+
+    if input_signal_windows.device.type=='cpu' and\


You can also use the device object directly for comparisons instead of using the type field.

mthrok · 2021-02-12T03:49:03Z

torchaudio/functional/filtering.py

+                :, i_sample:i_sample + n_order
+            ]
+            o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
+            padded_output_waveform[:, i_sample + n_order - 1] = o0


Is the content of else clause same as the lfilter_core_loop function defined at line 15? If so, can we reuse it?

and if we cannot reuse it, can we leave a comment on why?

> and if we cannot reuse it, can we leave a comment on why?

Unfortunately, we cannot reuse it in the else clause: depending on module availability, lfilter_core_loop may point to the C++ operator, in which case we cannot call it when the tensors are not on the CPU. We need this more complex logic for a combination of reasons:

- The C++ operator is not available on Windows.
- The try/except block cannot be inside the function, because TorchScript (TS) would complain.
- lfilter_core_loop cannot be initialized with None (in which case we could simply have added a non-None check to the if clause), because TS cannot infer the type and would complain during compilation.

An alternative would be to initialize lfilter_core_loop with any type allowed by TS and check in the if statement that lfilter_core_loop is no longer what it was initialized with. That would have avoided the code repetition, but initializing with an arbitrary type might have caused confusion, so I avoided it :).

> Is the content of else clause same as the lfilter_core_loop function defined at line 15? If so, can we reuse it?

Yes, the content of the else clause is the same as the lfilter_core_loop function defined at line 15.

Did you try something like this?

    import torch
    from torch import Tensor


    def lfilter_core_generic_loop(
        input_signal_windows: Tensor,
        a_coeffs_flipped: Tensor,
        padded_output_waveform: Tensor,
    ):
        # Pure-Python fallback: solve the recurrence one sample at a time.
        n_order = a_coeffs_flipped.size(0)
        for i_sample, o0 in enumerate(input_signal_windows.t()):
            windowed_output_signal = padded_output_waveform[
                :, i_sample:i_sample + n_order
            ]
            o0.addmv_(windowed_output_signal, a_coeffs_flipped, alpha=-1)
            padded_output_waveform[:, i_sample + n_order - 1] = o0


    # Use the C++ operator when the extension is available.
    try:
        lfilter_core_cpu_loop = torch.ops.torchaudio._lfilter_core_loop
    except RuntimeError:
        lfilter_core_cpu_loop = lfilter_core_generic_loop

then inside of lfilter

    def lfilter(...):
        ...
        if input_signal_windows.device == torch.device('cpu') and\
                a_coeffs_flipped.device == torch.device('cpu') and\
                padded_output_waveform.device == torch.device('cpu'):
            lfilter_core_cpu_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)
        else:
            lfilter_core_generic_loop(input_signal_windows, a_coeffs_flipped, padded_output_waveform)

Thank you @mthrok for the suggestion. This makes sense. Let me commit the new changes accordingly.

mthrok

Looks mostly good. One question about the reusability of Python code.

mthrok

Looks good. Can you do a final style touch up?

mthrok · 2021-02-12T15:51:45Z

torchaudio/functional/filtering.py

@@ -1,4 +1,5 @@
 import math
+import os


I think this is not used, and since we landed #1214, this will cause style check to fail if landed on master. Can you remove and merge/rebase the latest master?

mthrok · 2021-02-12T15:53:00Z

torchaudio/functional/filtering.py

 import torchaudio._internal.fft


+def lfilter_core_generic_loop(input_signal_windows: Tensor, a_coeffs_flipped: Tensor, padded_output_waveform: Tensor):


Can you prefix this with underscore _ and move the definition and the following lfilter_core_cpu_loop assignment closer to def lfilter?

Also please prefix lfilter_core_cpu_loop with underscore _ as well.

Bare minimum Implementation of core lfilter loop without any necessar…

cea430a

…y checks on input

facebook-github-bot added the CLA Signed label Feb 6, 2021

removed unnecessary headers

13043fe

parmeet added 3 commits February 8, 2021 08:16

Implemented various checks on input

6d2fbf8

Calling existing implementation for float64 types as C++ implementati…

0ea4f03

…on only supports float32 at the moment

moving lfilter.cpp to LIBTORCHAUDIO_SOURCES

14f8675

templated C++ operator to support both float32 and float64 types

fc9617d

cpuhrsch reviewed Feb 9, 2021

guarding core loop operator call for cpu device only

7744398

cpuhrsch reviewed Feb 9, 2021

parmeet added 5 commits February 9, 2021 19:35

comparing host device directly via device object

2f7cc09

Registering operator using "catch-all" kernel as Torchscript fails to…

c6b44ed

… script with CPU dispatch-key specific registration

fixed Linter issues in filtering.py and added comment on binding code…

2d693a6

… on registration

gaurding operator call based on OS as C++ extensions are not availabl…

08aae5e

…e on 'nt' and resolving linter issues.

implementing fallback solution when operator is not available.

42bcb11

cpuhrsch approved these changes Feb 11, 2021

cpuhrsch requested a review from mthrok February 11, 2021 23:48

cpuhrsch changed the title ~~[WIP] Implement lfilter core loop in C++~~ Implement lfilter core loop in C++ Feb 11, 2021

mthrok reviewed Feb 12, 2021

avoiding code repitition of lfilter_core_loop

fab80f3

parmeet requested a review from mthrok February 12, 2021 15:24

mthrok approved these changes Feb 12, 2021

parmeet and others added 2 commits February 12, 2021 10:25

fixing style related issues

3ee854c

Merge branch 'master' into lfilter

18a43cf

mthrok merged commit 05bff83 into pytorch:master Feb 12, 2021

parmeet deleted the lfilter branch February 12, 2021 23:47

mthrok mentioned this pull request Feb 14, 2021

CPP extension for overdrive effect in functional #580

Closed

mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021

Fix TF32 convergence issue with TF32 (pytorch#1244)

9ecc44a

* Fix TF32 convergence issue with TF32 * save
