
Conversation

@jcaw (Contributor) commented Dec 20, 2020

amplitude_to_DB currently clamps per-batch, but it should clamp per-item. I've modified it to clamp per-item (when a batch is provided) and I've modified the MFCC transform to take advantage of this new behavior. Tests for amplitude_to_DB are included but I've added no new tests for the modified MFCC transform.
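To illustrate the difference, here's a toy sketch (not the actual patch, just demonstrating per-batch vs per-item clamping):

    import torch
    import torchaudio.functional as F

    top_db = 80.0
    # Two spectrograms with very different scales, stacked into one batch.
    quiet = torch.rand(1, 100, 100) * 1e-6
    loud = torch.rand(1, 100, 100) * 1e2
    batch = torch.stack([quiet, loud])  # (batch, channel, freq, time)

    # Current behaviour: the clamp floor is derived from the whole batch's max,
    # so the quiet item is clamped against the loud item's peak.
    batched = F.amplitude_to_DB(batch, 20.0, 1e-10, 0.0, top_db)

    # Intended behaviour: each item gets its own floor (item_max_db - top_db).
    per_item = torch.stack(
        [F.amplitude_to_DB(item, 20.0, 1e-10, 0.0, top_db) for item in batch]
    )
    print(torch.allclose(batched, per_item))  # False with the current per-batch clamping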

This is just an initial draft. In #994 @vincentqb specified that it should always expect a tensor of shape (..., freq, time), so this implementation assumes the input is a spectrogram and determines whether it's a batch based on the number of dimensions (it assumes a batch when there are more than three dimensions, i.e. more than (channel, freq, time)). I'm not sure if this is the most sensible solution. It restricts the inputs to spectrograms and only allows batches with 4 (or more) dimensions. A batch with shape (item, freq, time) would be treated as a single item. This does seem contradictory if the specified input shape is (..., freq, time).

I also have a (rough) branch here which takes a batch flag to differentiate. Alternatively, batchwise conversions could be pulled into a separate method. (Perhaps amplitude_to_DB_iid?)

Closes #994

jcaw added 2 commits December 20, 2020 19:15
When passed a batch, `amplitude_to_DB` was clamping based on the entire
batch's maximum value. This was wrong. Apply the clamp based on each
item's maximum when a batch is detected.

This change requires `amplitude_to_DB` to be restricted to spectrogram
inputs only, since items need to have a predictable number of
dimensions (in this case, 3) to automatically detect batches.

Additional tests are also added to check both batched and unbatched
inputs, and ensure items are being clamped correctly.
The `MFCC` transform doesn't need to pack batches, since the mel
spectrogram conversion operates fine on batched input (it packs batches
itself). The fixed form of `amplitude_to_DB` also now requires an
unpacked batch to clamp correctly.

Since it's unnecessary, remove packing from the MFCC transform
completely.
@jcaw (Contributor, Author) commented Dec 20, 2020

(The tests in here are a little rough in structure since the infrastructure is due to be replaced)

This should allow `amplitude_to_DB` to compile to torchscript.
@jcaw (Contributor, Author) commented Dec 21, 2020

The torchscript compilation tripped me up, and I'm having trouble getting something that's compatible. This line doesn't want to compile: it gives me RuntimeError: Cannot emit expr for: (dots). The Ellipsis constant also fails.

Is there a way to either add multiple singleton dimensions to the right like this, or to somehow broadcast from the opposite direction to normal (aligning to the left, instead of the right)?

I've got it working locally by doing this, but I don't like it; it doesn't seem optimal:

        if x_db.dim() > 3:
            # Flatten each batch item, take its max, and derive its own floor.
            flat_shape = x_db.size(0), -1
            db_floors = x_db.reshape(flat_shape).amax(dim=1) - top_db
            # Clamp in the flattened view, then restore the original shape.
            x_db = torch.max(x_db.view(flat_shape), db_floors.unsqueeze(1)).view(x_db.shape)

(Is there also a document I can reference for full contribution guidelines, so I can run the CI locally?)

@vincentqb (Contributor) left a comment

About the shape: in practice, saying (..., freq, time) means we expect (freq, time), (channel, freq, time), (batch, freq, time), (batch, channel, freq, time), etc. The ambiguity is in the 3 dimensions as you pointed out: (batch, freq, time) vs (channel, freq, time).

My suggestion is to support the original behavior for (freq, time), (channel, freq, time) and add documentation. We then extend (batch, channel, freq, time) or (..., channel, freq, time) to do the clamping per (channel, freq, time). Therefore, we do not directly support the case (batch, freq, time), though someone (outside the function call) could easily unsqueeze to add a channel dimension of 1.
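Roughly, the clamping could then look something like this (just a sketch to convey the intent, not a final implementation; the names are illustrative):

    # Sketch only: clamp per (channel, freq, time) slice when extra leading
    # dimensions are present; keep the existing whole-tensor clamp otherwise.
    if top_db is not None:
        if x_db.dim() > 3:
            shape = x_db.size()
            # Collapse all leading (batch-like) dimensions into one.
            packed = x_db.reshape(-1, shape[-3], shape[-2], shape[-1])
            floors = packed.reshape(packed.size(0), -1).amax(dim=-1) - top_db
            x_db = torch.max(packed, floors.reshape(-1, 1, 1, 1)).reshape(shape)
        else:
            x_db = torch.max(x_db, x_db.max() - top_db)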

The torchscript compilation tripped me up, and I'm having trouble getting something that's compatible. This line doesn't want to compile: it gives me RuntimeError: Cannot emit expr for: (dots). The Ellipsis constant also fails.

Is there a way to either add multiple singleton dimensions to the right like this, or to somehow broadcast from the opposite direction to normal (aligning to the left, instead of the right)?

Do you mean to add a dimension like (batch, 1, freq, time) with unsqueeze?

(Is there also a document I can reference for full contribution guidelines, so I can run the CI locally?)

The tests detailed here can be run using pytest. Is that what you meant?

@jcaw (Contributor, Author) commented Jan 6, 2021

Awesome, thanks. I'll implement these suggestions now.

Do you mean to add a dimension like (batch, 1, freq, time) with unsqueeze?

More like a version of unsqueeze that can add an arbitrary number of dimensions (rather than just 1), but with the rewrite I think this is unnecessary. Although, I'm still curious how to do this in a way that will compile to torchscript. I've run into the problem before.
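For reference, the closest torchscript-friendly approach I've found is to build the target shape as a plain list of ints and reshape, something like the sketch below (the helper name is hypothetical, and I haven't verified it against the torchscript compiler):

    import torch

    def unsqueeze_right(x: torch.Tensor, n: int) -> torch.Tensor:
        # Hypothetical helper: append n trailing singleton dimensions by
        # constructing the shape explicitly, avoiding Ellipsis/None indexing.
        new_shape = list(x.shape) + [1] * n
        return x.reshape(new_shape)

    # e.g. broadcast a per-item floor of shape (batch,) against (batch, c, f, t):
    # torch.max(x_db, unsqueeze_right(db_floors, x_db.dim() - 1))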

The tests detailed here can be run using pytest. Is that what you meant?

I was running the amplitude_to_DB tests in isolation, so they weren't checking torchscript compilation. I assumed that was tested elsewhere, but including the torchscript consistency tests does the trick.

@vincentqb (Contributor) left a comment

LGTM with a minor change to the doc, thanks for working on this!

@jcaw (Contributor, Author) commented Jan 7, 2021

No worries!

@mthrok (Contributor) left a comment

Changes to the functional module look good, but the tests need more context for maintainability.

@jcaw jcaw requested a review from mthrok January 20, 2021 19:10
@mthrok (Contributor) left a comment

Hi @jcaw

Thanks for working on this.
The tests in batch_consistency look good.
I had a couple of questions regarding the tests in Testamplitude_to_DB.
Specifically, I am having difficulty understanding the # Predictability part.

If you do not have time to address the comments, let me know.
I do not want to drag this out too much, so I will move on and address them later.


self.assertEqual(x2, spec)

def test_amplitude_to_DB_batch(self):
Contributor left a comment

This might be nit-picky, but since all the tests in the Testamplitude_to_DB class are about amplitude_to_DB, having each test method name describe what aspect of the function is being tested gives a better maintenance experience. (Imagine that this code will most likely be maintained by someone without any context; in fact, I am a regular software engineer and not an expert in the audio domain, so if I have to come back to this code later, it will take a while to figure out what it does.)

Here is my suggestion:

  • merge test_amplitude_to_DB_batch, test_amplitude_to_DB_3dims and test_amplitude_to_DB_2dims, parameterize the shape, and give the method a good name (also inline _ensure_reversible, since there is no need to extract it).

Something like

@parameterized.expand([
    ([2, 2, 100, 100],),
    ([2, 100, 100],),
    ([100, 100],),
])
def test_reversible(self, shape):
    """A round trip between amplitude and dB should return the original, for various shapes.
    This implicitly also tests `DB_to_amplitude`.
    """
    torch.manual_seed(0)
    spec = torch.rand(*shape) * 200
    ...

@jcaw (Contributor, Author) commented Feb 1, 2021

I don't think it's nitpicky - you're right. I think I was sticking with the original naming convention for the Testamplitude_to_DB class, but the other test classes use the naming scheme you described anyway, which is more sensible.

spec = torch.rand([1, 2, 100, 100]) * 200
# Predictability
spec[0, 0, 1] = 0
spec[0, 0, 0] = 200
Contributor left a comment

What does this mean, and why is it needed? If particular values must be used to trigger the specific behavior under test, what about using a non-random tensor?

Contributor (Author) left a comment

The decibel floor is derived from the largest value in the spectrogram, so in order to hard-code the expected floor, the maximum needs to be predictable. There's also a (tiny) chance that no values small enough to be clamped are generated, so I manually set a small value as well.

I default to using a random tensor when possible because it adds entropy to the inputs. It's just a rule of thumb; I can certainly change it.

I didn't really like this structure either; it is confusing. I've rewritten it so that each spectrogram is scaled separately to strictly match the range (0, 200), which seems more sensible. Let me know what you think.
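Roughly, the idea is something like this (an illustrative sketch, not the exact test code):

    import torch

    # Illustrative only: rescale each item independently so it spans exactly
    # [0, 200], making each item's max (and thus its clamp floor) predictable.
    torch.manual_seed(0)
    spec = torch.rand(2, 2, 100, 100)
    flat = spec.reshape(spec.size(0), -1)
    mins = flat.amin(dim=1, keepdim=True)
    maxs = flat.amax(dim=1, keepdim=True)
    spec = ((flat - mins) / (maxs - mins) * 200).reshape(spec.shape)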

Contributor left a comment

Thanks, the comments you added explain the intention well. I think this is good.

@mthrok mthrok added this to the v0.8 milestone Jan 25, 2021
jcaw added 4 commits February 1, 2021 16:06
Simpler than maintaining separate tests for each
Also change the way the spectrograms are generated to work with all the
given shapes, and apply the correct range to all spectrograms.
@jcaw jcaw requested a review from mthrok February 1, 2021 16:27
@mthrok (Contributor) left a comment

Hi @jcaw

Thanks for working on this. The change looks good and the tests are now very comprehensive.
There is a conflict with the latest master, so please resolve it (or let me know if you are busy, and I will take over).

@jcaw (Contributor, Author) commented Feb 1, 2021

Would you like me to merge or rebase?

@mthrok (Contributor) commented Feb 1, 2021

Would you like me to merge or rebase?

(I assume you mean a merge commit, as I do not think you can press the merge button on the PR.) Either works fine so long as the conflict is resolved (so that I can click the merge button).
But since you have a bunch of commits on your branch, if you are going to rebase, you will need to squash the commits first; otherwise, resolving the conflict will be difficult.

@vincentqb (Contributor) commented:
The failing tests are not related to this pull request. I'll go ahead and merge it. Thank you for the work, @jcaw!

@vincentqb vincentqb merged commit 4e99c12 into pytorch:master Feb 4, 2021
@jcaw (Contributor, Author) commented Feb 4, 2021

No worries! Happy to help.



Successfully merging this pull request may close these issues.

transforms.AmplitudeToDB does not handle cut-off correctly for multi-channel or batched data
