
Conversation

@yoyolicoris
Contributor

@yoyolicoris yoyolicoris commented Jun 8, 2021

Resolves #1476, resolves #1528.

This implementation doesn't rely on a multithreading library or a custom CUDA kernel; it uses only the Python/C++ API of PyTorch.

After this change, a_coeffs and b_coeffs can be a 1D or 2D Tensor, where the optional leading dimension is the number of filters and should be broadcastable to waveform.shape[:-1].

Will do some benchmarks later.

@mthrok
Contributor

mthrok commented Jun 9, 2021

Hi @yoyololicon

Thanks for the PR. This is another wonderful addition. Can you add a batch consistency test?
Questions about batch behavior

  • What happens to samples that are padded for the sake of batching? Will the padding value have an impact on the result?
  • Is it easy to compute the valid length of padded signal?

@yoyolicoris
Contributor Author

yoyolicoris commented Jun 9, 2021

Hi @yoyololicon

Thanks for the PR. This is another wonderful addition. Can you add a batch consistency test?

Sure.

Questions about batch behavior

  • What happens to samples that are padded for the sake of batching? Will the padding value have an impact on the result?
  • Is it easy to compute the valid length of padded signal?

I'm not sure whether the padding operation you are talking about is the behavior I call broadcasting. As far as I know, no extra values are padded to the signal.
I can give some examples of the filter behavior after this change:

  1. two filters, one signal
>>> x = torch.randn(44100)
>>> a_coeffs = torch.rand(2, 3)
>>> b_coeffs = torch.rand(2, 3)
>>> F.lfilter(x, a_coeffs, b_coeffs).shape
torch.Size([2, 44100])

The signal has been filtered by a set of filters, corresponding to the feature proposed in #1528.

  2. two filters, two signals
>>> x = torch.randn(2, 44100)
>>> a_coeffs = torch.rand(2, 3)
>>> b_coeffs = torch.rand(2, 3)
>>> F.lfilter(x, a_coeffs, b_coeffs).shape
torch.Size([2, 44100])

In this example, each signal is filtered by its own set of filter coefficients, corresponding to the batching feature proposed in #1476.

  3. two filters, multiple signals
>>> x = torch.randn(10, 1, 44100)
>>> a_coeffs = torch.rand(2, 3)
>>> b_coeffs = torch.rand(2, 3)
>>> F.lfilter(x, a_coeffs, b_coeffs).shape
torch.Size([10, 2, 44100])

Batches of signals are filtered by a set of filters.

  4. two filters, batches of stereo signals
>>> x = torch.randn(10, 2, 44100)
>>> a_coeffs = torch.rand(2, 3)
>>> b_coeffs = torch.rand(2, 3)
>>> F.lfilter(x, a_coeffs, b_coeffs).shape
torch.Size([10, 2, 44100])

In this example, each channel of the signals is filtered by different coefficients.

I might benchmark the filter speed later this weekend because my machine is running some training these days.
Because generic_lfilter_core_loop has also been modified to support batching, we need to run benchmarks on CUDA as well.

@mthrok
Contributor

mthrok commented Jun 9, 2021

@yoyololicon

By padding, I mean the case where I make a batch from signals with different numbers of samples.
Say I make a batch from an N-frame signal and an N/2-frame signal; I have to pad the end of the second signal.

Batch Input
- Signal 1: |xxxxxxxxxxxxxxxx|
- Signal 2: |xxxxxxxx--------|
Batch Result
- Signal 1: |yyyyyyyyyyyy|
- Signal 2: |yyyyyy------| <-- What is the valid length of resulting signal2?
                 ^^  <- Does the padding at the end have some effect on the result?

In this case, to get the resulting filtered signal2, I need to do the slicing, and to do that I need to know the length of the resulting signal, i.e. batch[1][:valid_length]. This padding-then-masking operation happens commonly in RNNs and attention mechanisms, so I wanted to learn whether something similar has to be taken into consideration.

However, looking at the examples you showed, lfilter seems to always return signals of the same length, so if user code knows the valid length, it's easy to get the filtered signal. Is that correct?

@yoyolicoris
Contributor Author

yoyolicoris commented Jun 9, 2021

@yoyololicon

By padding, I mean the case where I make a batch from signals with different numbers of samples.
Say I make a batch from an N-frame signal and an N/2-frame signal; I have to pad the end of the second signal.
In this case, to get the resulting filtered signal2, I need to do the slicing, and to do that I need to know the length of the resulting signal, i.e. batch[1][:valid_length]. This padding-then-masking operation happens commonly in RNNs and attention mechanisms, so I wanted to learn whether something similar has to be taken into consideration.

@mthrok
I see, thanks for the thorough explanation. Because lfilter is a causal system along the time axis, you can just take the first N/2 output values of signal 2. The padding at the end has no impact on those outputs in the above case.
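
As a sanity check on the causality argument, here is a minimal pure-Python sketch of a direct-form IIR filter (not torchaudio's actual implementation; `iir_1d` is an illustrative name). It shows that trailing zero-padding cannot change the outputs before the padded region:

```python
def iir_1d(b, a, x):
    """Direct-form IIR filter; assumes a[0] == 1 (normalized)."""
    y = []
    for n in range(len(x)):
        # y[n] depends only on x[0..n] and y[0..n-1], i.e. the filter is causal
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

b, a = [0.2, 0.3], [1.0, -0.5]
x = [1.0, 2.0, 3.0, 4.0]
padded = x + [0.0] * 4  # trailing zero-padding, as batching would add

# The valid prefix of the padded result equals the unpadded result exactly.
assert iir_1d(b, a, padded)[:len(x)] == iir_1d(b, a, x)
```

So slicing `batch[1][:valid_length]` after the fact recovers the same values as filtering the unpadded signal.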

However, looking at the examples you showed, lfilter seems to always return signals of the same length, so if user code knows the valid length, it's easy to get the filtered signal. Is that correct?

Per the explanation above, you are right.

@mthrok
Contributor

mthrok commented Jun 9, 2021

@yoyololicon

Thanks for the confirmation. Looking at the tests, I think the PR looks okay behavior-wise. Let me look into the code detail soon.

for i in range(self.batch_size)
])

self.assertEqual(batchwise_output, itemwise_output)
Contributor

Can we use self.assert_batch_consistency helper method? It handles dtype/device as well.

Contributor Author

@yoyolicoris yoyolicoris Jun 10, 2021

self.assert_batch_consistency seems to assume that only the first input is batched, but in our case a_coeffs and b_coeffs should be batched as well.

Contributor

Does that mean it is applying different filters to the samples in a batch? I thought the same set of filters was applied to each sample in the batch, so one could change the batch size without changing a_coeffs and b_coeffs.

Contributor Author

@yoyolicoris yoyolicoris Jun 10, 2021

Sounds like we need to do two types of tests. I will add another one that uses self.assert_batch_consistency.

Contributor

And can you clarify: in the above case, where you said "a_coeffs and b_coeffs should also be in batch as well", the number of filters happens to be the same as the batch size, but that's not a requirement?

So my understanding/expectation is that when the input batch has shape [batch_size, sequence_length], a_coeffs and b_coeffs can take any shape [filter_dim, number_of_filters], without being constrained by the input shape.

And if I understand correctly, your test here checks that the filter bank produces the same result regardless of whether the filters are applied separately or together, is that correct?

Contributor Author

@yoyolicoris yoyolicoris Jun 10, 2021

So my understanding/expectation is that when the input batch has shape [batch_size, sequence_length], a_coeffs and b_coeffs can take any shape [filter_dim, number_of_filters], without being constrained by the input shape.

In this case, when the input is a 2D batch of signals, a_coeffs and b_coeffs should have shape [batch_size, filter_order + 1] or just [filter_order + 1]. The first means the number of filters equals batch_size and each signal is filtered with its own filter; the second means a single filter is applied to all signals.

The case where the filter shape is not constrained is when the input has shape [..., 1, sequence_length]. Then a_coeffs and b_coeffs can be any 2D matrix of shape [number_of_filters, filter_order + 1], and the output shape will be [..., number_of_filters, sequence_length]. That means each signal is filtered by a shared set of filters.

And if I understand correctly, your test here checks that the filter bank produces the same result regardless of whether the filters are applied separately or together, is that correct?

Yes, that's correct.
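
To make the shape rules above concrete, here is a small hypothetical helper (not part of this PR; the name and the NumPy-style right-aligned broadcasting are my own sketch) that infers the output shape from the waveform and coefficient shapes:

```python
def lfilter_output_shape(wave_shape, coeff_shape):
    """Sketch of the proposed rule: the coefficients' leading dims broadcast
    against waveform.shape[:-1]; the time length is unchanged."""
    lead_w = wave_shape[:-1]   # drop time
    lead_c = coeff_shape[:-1]  # drop filter_order + 1
    # Right-align the leading dims and broadcast, NumPy-style.
    n = max(len(lead_w), len(lead_c))
    lw = (1,) * (n - len(lead_w)) + tuple(lead_w)
    lc = (1,) * (n - len(lead_c)) + tuple(lead_c)
    out = []
    for dw, dc in zip(lw, lc):
        if dw == dc or dw == 1 or dc == 1:
            out.append(max(dw, dc))
        else:
            raise ValueError(f"cannot broadcast {dw} vs {dc}")
    return tuple(out) + (wave_shape[-1],)

# Matches the examples in this thread:
assert lfilter_output_shape((44100,), (2, 3)) == (2, 44100)       # two filters, one signal
assert lfilter_output_shape((2, 44100), (2, 3)) == (2, 44100)     # two filters, two signals
assert lfilter_output_shape((10, 1, 44100), (2, 3)) == (10, 2, 44100)
```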

Contributor Author

@yoyolicoris yoyolicoris Jun 10, 2021

Wait, I think the batch behavior we want to test is actually on the coefficients, not the input. 😆
So we might need to change the test, with a_coeffs and b_coeffs as the batched inputs and waveform as the parameter.

Contributor

So my understanding/expectation is that when the input batch has shape [batch_size, sequence_length], a_coeffs and b_coeffs can take any shape [filter_dim, number_of_filters], without being constrained by the input shape.

In this case, when the input is a 2D batch of signals, a_coeffs and b_coeffs should have shape [batch_size, filter_order + 1] or just [filter_order + 1]. The first means the number of filters equals batch_size and each signal is filtered with its own filter; the second means a single filter is applied to all signals.

The case where the filter shape is not constrained is when the input has shape [..., 1, sequence_length]. Then a_coeffs and b_coeffs can be any 2D matrix of shape [number_of_filters, filter_order + 1], and the output shape will be [..., number_of_filters, sequence_length]. That means each signal is filtered by a shared set of filters.

And if I understand correctly, your test here checks that the filter bank produces the same result regardless of whether the filters are applied separately or together, is that correct?

Yes, that's correct.

@yoyololicon

Can you help me clarify my understanding of the shape semantics here?

  • When the input batch is 2D, the first dimension is interpreted as channel, thus the number of filters has to match the number of channels.
  • When an input signal has multiple channels, then the filters have to have multiple channels.

Are these correct?

Contributor Author

@mthrok

Can you help me clarify my understanding of the shape semantics here?

  • When the input batch is 2D, the first dimension is interpreted as channel, thus the number of filters has to match the number of channels.
  • When an input signal has multiple channels, then the filters have to have multiple channels.

Are these correct?

If you want to apply multiple filters at once, these are correct; if there is only one filter, it falls back to the original behavior.
The shape semantics I proposed actually follow PyTorch conventions, except for the last dimension, which is time or filter order.

@yoyolicoris
Contributor Author

Benchmarks using the same script from #1441 (comment), running on the same CPU.
I'm currently having problems building a CUDA-enabled environment, so if someone can help compile the changes, run this script on a GPU, and report the result here, I'd appreciate it.

Before

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    336.5  |   1089.4 
      [32, 1024]   |    709.1  |   2537.1 
      [32, 4096]   |   2447.2  |   9652.5 
      [64, 256]    |    484.0  |   1587.9 
      [64, 1024]   |   1214.6  |   4503.8 
      [64, 4096]   |   8046.5  |  22566.8 
      [128, 256]   |    715.6  |   2521.7 
      [128, 1024]  |   2306.6  |   9286.6 
      [128, 4096]  |  22508.9  |  52563.9 
2 threads: -------------------------------
      [32, 256]    |    292.3  |    909.0 
      [32, 1024]   |    581.4  |   1882.7 
      [32, 4096]   |   1661.6  |   6729.1 
      [64, 256]    |    409.6  |   1269.9 
      [64, 1024]   |    909.0  |   3018.8 
      [64, 4096]   |   8329.7  |  17457.2 
      [128, 256]   |    594.3  |   1880.2 
      [128, 1024]  |   1629.7  |   6745.9 
      [128, 4096]  |  22276.9  |  44076.9 
4 threads: -------------------------------
      [32, 256]    |    274.6  |    841.4 
      [32, 1024]   |    513.3  |   1584.7 
      [32, 4096]   |   1411.8  |   5429.1 
      [64, 256]    |    361.4  |   1062.1 
      [64, 1024]   |    786.3  |   2363.2 
      [64, 4096]   |   7515.3  |  16583.1 
      [128, 256]   |    514.5  |   1556.7 
      [128, 1024]  |   1378.2  |   5592.4 
      [128, 4096]  |  24044.3  |  42045.6 

Times are in microseconds (us).

After

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    356.1  |   1222.8 
      [32, 1024]   |    731.5  |   3058.8 
      [32, 4096]   |   2558.1  |  12071.8 
      [64, 256]    |    502.4  |   1869.7 
      [64, 1024]   |   1278.5  |   5729.2 
      [64, 4096]   |   9484.0  |  27662.5 
      [128, 256]   |    777.6  |   3126.6 
      [128, 1024]  |   2330.3  |  11658.8 
      [128, 4096]  |  22594.8  |  60734.4 
2 threads: -------------------------------
      [32, 256]    |    309.1  |    982.0 
      [32, 1024]   |    607.0  |   2186.2 
      [32, 4096]   |   2005.4  |   8002.2 
      [64, 256]    |    425.7  |   1414.8 
      [64, 1024]   |    934.9  |   3624.9 
      [64, 4096]   |   9420.4  |  20646.1 
      [128, 256]   |    622.6  |   2195.2 
      [128, 1024]  |   1697.0  |   7563.6 
      [128, 4096]  |  22331.7  |  48910.4 
4 threads: -------------------------------
      [32, 256]    |    294.0  |    910.8 
      [32, 1024]   |    530.9  |   1781.8 
      [32, 4096]   |   1391.8  |   5934.0 
      [64, 256]    |    378.9  |   1188.6 
      [64, 1024]   |    816.5  |   2699.3 
      [64, 4096]   |   9485.3  |  17686.3 
      [128, 256]   |    533.5  |   1707.9 
      [128, 1024]  |   1410.3  |   6197.0 
      [128, 4096]  |  22828.6  |  43533.4 

Times are in microseconds (us).

@yoyolicoris yoyolicoris requested a review from mthrok June 11, 2021 08:31
Comment on lines +83 to +89
@parameterized.expand([
((44100,), (2, 3), (2, 44100)),
((3, 44100), (1, 3), (3, 44100)),
((3, 44100), (3, 3), (3, 44100)),
((1, 2, 1, 44100), (3, 3), (1, 2, 3, 44100))
])
def test_lfilter_broadcast_shape(self, input_shape, coeff_shape, target_shape):
Contributor Author

@mthrok
The part that tests the broadcasting behavior.

@mthrok
Contributor

mthrok commented Jun 16, 2021

Hi @yoyololicon

Sorry for the delayed response. I was busy with release-related work. Now torchaudio 0.9 is out, and I listed your work on lfilter autograd support and performance improvement as one of the major updates. Thanks again for all the contributions.

For this PR, I am finding that filter bank support is more complicated than I had imagined. This is natural, and I see it as analogous to how broadcasting in convolution is complicated as well.

So can we split this PR into one for batch support and the other for filter bank support? I think we can tackle batch support first, as the goal is simple: it is just about applying the same operation to a batch of samples without a for-loop. Once that's done, we can look at filter bank support and discuss the broadcasting semantics.

@yoyolicoris
Contributor Author

yoyolicoris commented Jun 17, 2021

@mthrok
It's nice to see that my contribution is included in the new release, thanks again for your kind help and support. 🎉

I think we can tackle batch support first, as the goal is simple: it is just about applying the same operation to a batch of samples without a for-loop.

I have a different understanding of the batch support feature.
Based on the example from #1561, the shape of waveform is (batch, time) and the shape of *_coeffs is (batch, n_order). What it was trying to do is filter different signals with different coefficients in parallel, so I think it's not the same operation for each sample in the batch.
The current version of lfilter actually already supports the kind of feature you are talking about, where the shape of waveform is (..., time) and ... can be interpreted as batch dimensions.
I think the new feature we are going to support is a batch of lfilters running in parallel.
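
To spell out the distinction, here is a pure-Python reference sketch (illustrative names only, not the PR's actual code) of what a "batch of lfilters in parallel" means: filter i is applied to signal i, rather than one shared filter to every signal.

```python
def iir_1d(b, a, x):
    """Direct-form IIR filter; assumes a[0] == 1 (normalized)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def batched_iir(b_coeffs, a_coeffs, waveforms):
    # Each signal gets its own coefficients -- the semantics of #1476.
    # (A fast implementation would vectorize this loop; this is the reference.)
    return [iir_1d(b, a, x) for b, a, x in zip(b_coeffs, a_coeffs, waveforms)]

b_coeffs = [[0.2, 0.3], [0.5, 0.1]]
a_coeffs = [[1.0, -0.5], [1.0, 0.25]]
waveforms = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
out = batched_iir(b_coeffs, a_coeffs, waveforms)

# Item i of the batched result equals filtering signal i individually.
assert out[0] == iir_1d(b_coeffs[0], a_coeffs[0], waveforms[0])
assert out[1] == iir_1d(b_coeffs[1], a_coeffs[1], waveforms[1])
```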

I'm OK with splitting this PR into two, or even three: one for batch support, one for filter bank support, and one for broadcasting semantics.
We can leave this PR as a discussion place or as the last of the three.

I actually think filter bank support is easier than batch support, so I suggest opening that one first.

@mthrok
Contributor

mthrok commented Jun 18, 2021

The current version of lfilter actually already supports the kind of feature you are talking about, where the shape of waveform is (..., time) and ... can be interpreted as batch dimensions.
I think the new feature we are going to support is a batch of lfilters running in parallel.

Oh sorry, yes, that's what was in my mind. It is not about adding batch support, but about making the batch support efficient (presumably by moving the for-loop into convolution ops).

I'm OK with splitting this PR into two, or even three: one for batch support, one for filter bank support, and one for broadcasting semantics.
We can leave this PR as a discussion place or as the last of the three.

I actually think filter bank support is easier than batch support, so I suggest opening that one first.

Sure, that works. Please proceed with whichever approach is most comfortable for you. I will try my best to understand the underlying logic.

@yoyolicoris
Contributor Author

Since #1587 and #1638 have now been merged, I will close this draft PR.

@yoyolicoris yoyolicoris deleted the feat/batch-lfilter branch August 11, 2021 02:07
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022
* updated ddp_pipeline

* minor update

Co-authored-by: Brian Johnson <[email protected]>
Successfully merging this pull request may close these issues:

  • Support multiple filters in lfilter
  • Add batch dimension inside the computation of lfilter
