-
Notifications
You must be signed in to change notification settings - Fork 739
Fix amplitude_to_DB clamping behaviour on batches
#1113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When passed a batch, `amplitude_to_DB` was clamping based on the entire batch's maximum value. This was wrong. Apply the clamp based on each item's maximum when a batch is detected. This change requires `amplitude_to_DB` to be restricted to spectrogram inputs only, since items need to have a predictable number of dimensions (in this case, 3) to automatically detect batches. Additional tests are also added to check both batched and unbatched inputs, and ensure items are being clamped correctly.
The `MFCC` transform doesn't need to pack batches, since the mel spectrogram conversion operates fine on batched input (it packs batches itself). The fixed form of `amplitude_to_DB` also now requires an unpacked batch to clamp correctly. Since it's unnecessary, remove packing from the MFFC transform completely.
|
(The tests in here are a little rough in structure since the infrastructure is due to be replaced) |
This should allow `amplitude_to_DB` to compile to torchscript.
|
The torchscript compilation tripped me up but I'm having trouble getting something that's compatible. This line doesn't want to compile - it's giving me Is there a way to either add multiple singleton dimensions to the right like this, or to somehow broadcast from the opposite direction to normal (aligning to the left, instead of the right)? I've got it working locally doing this, but I don't like it. Doesn't seem optimal: if x_db.dim() > 3:
flat_shape = x_db.size(0), -1
db_floors = x_db.reshape(flat_shape).amax(dim=1) - top_db
x_db = torch.max(x_db.view(flat_shape), db_floors.unsqueeze(1)).view(x_db.shape)(Is there also a document I can reference for full contribution guidelines, so I can run the CI locally?) |
vincentqb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the shape: in practice, saying (..., freq, time) means we expect (freq, time), (channel, freq, time), (batch, freq, time), (batch, channel, freq, time), etc. The ambiguity is in the 3 dimensions as you pointed out: (batch, freq, time) vs (channel, freq, time).
My suggestion is to support the original behavior for (freq, time), (channel, freq, time) and add documentation. We then extend (batch, channel, freq, time) or (..., channel, freq, time) to do clamping "per (channel, freq, time). Therefore, we do not support directly the case (batch, freq, time), though someone (outside the function call) could easily unsqueeze to add a 1 channel.
The torchscript compilation tripped me up but I'm having trouble getting something that's compatible. This line doesn't want to compile - it's giving me
RuntimeError: Cannot emit expr for: (dots). TheEllipsisconstant also fails.Is there a way to either add multiple singleton dimensions to the right like this, or to somehow broadcast from the opposite direction to normal (aligning to the left, instead of the right)?
Do you mean to add a dimension like (batch, 1, freq, time) with unsequeeze?
(Is there also a document I can reference for full contribution guidelines, so I can run the CI locally?)
The tests detailed here can be run using pytest. Is that what you meant?
|
Awesome, thanks. I'll implement these suggestions now.
More like a version of
I was isolating the |
Test the batch & channel dimensions separately
It's not necessary, just generate random tensors.
Ensure they're independent.
One for each target shape. (The subtests now pass again)
Don't rely on torch.rand to produce good values.
vincentqb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor change to doc, thanks for working on this!
|
No worries! |
mthrok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to the functional module looks good, but tests need more context for the maintainability.
`AmplitudeToDB` expects items in batches to have 3 dims, including channels. Use 3 dims to test batch consistency.
Also rename them since they're not enclosed in a specific `amplitude_to_DB` test class.
mthrok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jcaw
Thanks for working on this.
The tests in batch_consistency looks good.
I had a couple of question regarding the tests in Testamplitude_to_DB.
Specifically #Predictability part, I am having difficulty understanding it.
If you do not have time to address the comments, let me know.
I do not want to drag you around too much, so I will move on and address them later.
|
|
||
| self.assertEqual(x2, spec) | ||
|
|
||
| def test_amplitude_to_DB_batch(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be nit-picky but since all the tests in the Testamplitude_to_DB class is about amplitude_to_DB, having each test method name describe "what aspect of the function of the interest is tested" gives better maintenance experience. (Imagine that these code will be most likely maintained by someone without any context, in fact I am regular software engineer and not expert in audio domain, so soon, if I have to come back to this code, it will take a while to figure it out what it is).
Here is my suggestion
- merge
test_amplitude_to_DB_batch,test_amplitude_to_DB_3dimsandtest_amplitude_to_DB_2dims, parameterize the shape, then give a good name for the method (also_ensure_reversiblebecause there is no need to extract it).
Something like
@parameterized.expand([
([2, 2, 100, 100]),
([2, 100, 100]),
([100, 100]),
])
def test_reversible(self, shape):
"""Round trip between amplitude and db should return the original for various shape
This implicitly also tests `DB_to_amplitude`.
"""
torch.manual_seed(0)
spec = torch.rand(*shape) * 200
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's nitpicky - you're right. I think I was sticking with the original naming convention for the Testamplitude_to_DB class, but the other test classes use the naming scheme you described anyway, which is more sensible.
| spec = torch.rand([1, 2, 100, 100]) * 200 | ||
| # Predictability | ||
| spec[0, 0, 1] = 0 | ||
| spec[0, 0, 0] = 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? What is the need to do this? If the particular value should be used so that the tested function's specific behavior occurs, what about using a non-random tensor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decibel cutoff is derived from the smallest value in the spectrogram, so in order to hard-code the cutoff, the smallest value needs to be predictable. There's also a (tiny) chance that no values large enough to be clamped are generated, so I manually set the max.
I default to using a random tensor when possible because it adds entropy to the inputs. It's just a rule of thumb, I can certainly change it.
I didn't really like this structure either, it is confusing. I've rewritten it to scale each spectrogram separately to strictly match the range (0, 200), which seems more sensible. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the comments you added nicely explain the intention well. I think this is good.
Simpler than maintaining separate tests for each
Also change the way the spectrograms are generated to work with all the given shapes, and apply the correct range to all spectrograms.
mthrok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jcaw
Thanks for working on this. The change looks good and the tests are now very comprehensive.
There is a conflict with latest master, so please resolve it (or let me know if you are busy, then I will take over).
|
Would you like me to merge or rebase? |
(I assume you mean |
…ude_to_db_batch_fix
|
The tests failing are not related to this pull request. I'll go ahead and merge the pull request. Thank you for the work @jcaw! |
|
No worries! Happy to help. |
amplitude_to_DBcurrently clamps per-batch, but it should clamp per-item. I've modified it to clamp per-item (when a batch is provided) and I've modified the MFCC transform to take advantage of this new behavior. Tests foramplitude_to_DBare included but I've added no new tests for the modified MFCC transform.This is just an initial draft. In #994 @vincentqb specified that it should always expect a tensor of shape
(..., freq, time), so this implementation assumes the input is a spectrogram and determines whether it's a batch based on the number of dimensions (it assumes a batch when there are more than three dimensions, i.e. more than(channel, freq, time)). I'm not sure if this is the most sensible solution. It restricts the inputs to spectrograms and only allows batches with 4 (or more) dimensions. A batch with shape(item, freq, time)would be treated as a single item. This does seem contradictory if the specified input shape is(..., freq, time).I also have a (rough) branch here which takes a
batchflag to differentiate. Alternatively, batchwise conversions could be pulled into a separate method. (Perhapsamplitude_to_DB_iid?)Closes #994