UpsampleNetwork #724
Codecov Report
@@            Coverage Diff             @@
##           master     #724      +/-   ##
==========================================
+ Coverage   89.21%   89.35%   +0.13%
==========================================
  Files          32       32
  Lines        2513     2546      +33
==========================================
+ Hits         2242     2275      +33
  Misses        271      271
Continue to review full report at Codecov.
torchaudio/models/_wavernn.py
Outdated
x: the input sequence to the _Stretch2d layer (required).
Shape:
- x: :math:`(N, C, S, T)`.
nit: remove period at end
Fixed.
torchaudio/models/_wavernn.py
Outdated
Shape:
- x: :math:`(N, C, S, T)`.
- output: :math:`(N, C, S * y_scale, T * x_scale)`.
nit: remove period at end
Fixed.
torchaudio/models/_wavernn.py
Outdated
T is the length of input sequence.
"""

n, c, s, t = x.size()
Expecting a four tuple is a little rigid, isn't it?
E.g., some functions support `...` to mean an arbitrary number of dimensions; see the functionals in torchaudio.
I have updated it.
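One way to relax the four-tuple requirement discussed in this thread is to unpack only the trailing two dimensions and pass any leading ones through. The sketch below works on plain shape tuples rather than tensors; `stretched_shape` is a hypothetical helper, not code from this PR, and the output rule is the one quoted in the `_Stretch2d` docstring above.

```python
from typing import Sequence, Tuple


def stretched_shape(shape: Sequence[int], x_scale: int, y_scale: int) -> Tuple[int, ...]:
    """Shape after stretching the last two axes; leading axes are untouched.

    Hypothetical helper for illustration only (not part of torchaudio).
    """
    if len(shape) < 2:
        raise ValueError("expected at least two dimensions (..., S, T)")
    # Star-unpacking accepts any number of leading dimensions instead of
    # hard-coding `n, c, s, t`. torch.Size is a tuple subclass, so the same
    # unpacking also works on `x.shape` of a tensor.
    *lead, s, t = shape
    return (*lead, s * y_scale, t * x_scale)
```

With a four-dimensional input this reproduces `(N, C, S * y_scale, T * x_scale)`, and a bare `(S, T)` shape is handled by the same code path.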
torchaudio/models/_wavernn.py
Outdated
x: the input sequence to the _UpsampleNetwork layer (required).
Shape:
- x: :math:`(N, S, T)`.
See notation in readme for output of spectrogram
Spectrogram: (channel, time) -> (channel, freq, time)
Variable names have been updated.
torchaudio/models/_wavernn.py
Outdated
Shape:
- x: :math:`(N, S, T)`.
- output: :math:`(N, (T - 2 * pad) * Total_Scale, S)`, `(N, (T - 2 * pad) * total_scale, P)`.
nit: lower/upper case in total_scale and Total_Scale. I like the first better :)
Yes, total_scale looks better. Fixed.
torchaudio/models/_wavernn.py
Outdated
- x: :math:`(N, S, T)`.
- output: :math:`(N, (T - 2 * pad) * Total_Scale, S)`, `(N, (T - 2 * pad) * total_scale, P)`.
where N is the batch size, S is the number of input sequence, T is the length of input sequence.
P is the number of output sequence. Total_Scale is the product of all elements in upsample_scales.
nit: P = output_dims ? just do that, or specify that P = output_dims.
This name has been updated. I use n_output here to match other places. No single letter is used in docstring now.
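The shape arithmetic in the docstring under review can be checked with a few lines of plain Python. This is a sketch that follows the docstring as quoted in the diff (before the renaming), so the axis order in the merged code may differ; `upsample_output_shapes` is a hypothetical helper and the numbers in the usage note are illustrative only.

```python
import math
from typing import Sequence, Tuple

Shape = Tuple[int, int, int]


def upsample_output_shapes(
    n: int,
    s: int,
    t: int,
    n_output: int,
    pad: int,
    upsample_scales: Sequence[int],
) -> Tuple[Shape, Shape]:
    """Return the two output shapes described in the quoted docstring.

    Hypothetical helper for illustration only (not part of torchaudio).
    """
    # total_scale is the product of all elements in upsample_scales.
    total_scale = math.prod(upsample_scales)
    # The time axis loses `pad` frames on each side, then is upsampled.
    length = (t - 2 * pad) * total_scale
    return (n, length, s), (n, length, n_output)
```

For example, `upsample_output_shapes(2, 80, 10, 128, 2, (5, 5, 8))` gives a total scale of 200 and an output length of `(10 - 4) * 200 = 1200`.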
torchaudio/models/_wavernn.py
Outdated
resnet_output = self.resnet_stretch(resnet_output)
resnet_output = resnet_output.squeeze(1)

upsampling_output = self.upsample_layers(x.unsqueeze(1))
nit: add a line x = x.unsqueeze(1)
This line has been added.
f4b4c76 to c98e289
Is there a doc available? Can you attach the link? EDIT: internal

total_scale = 1
for upsample_scale in upsample_scales:
    total_scale *= upsample_scale
Please add an assert or error message checking that total_scale == hop_length, and document this requirement (e.g. "product of upsample_scale must equal hop_length") in docstring.
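A minimal sketch of the requested check follows, assuming `hop_length` is available wherever `upsample_scales` is consumed. The function name `check_upsample_scales` and the use of `ValueError` instead of a bare `assert` are assumptions for illustration, not the code that was merged.

```python
from typing import Sequence


def check_upsample_scales(upsample_scales: Sequence[int], hop_length: int) -> int:
    """Return total_scale, raising if it does not equal hop_length.

    Hypothetical helper for illustration only (not part of torchaudio).
    """
    # Same accumulation loop as in the diff under review.
    total_scale = 1
    for upsample_scale in upsample_scales:
        total_scale *= upsample_scale
    # The product of upsample_scales must equal hop_length.
    if total_scale != hop_length:
        raise ValueError(
            f"product of upsample_scales must equal hop_length, "
            f"got {total_scale} and {hop_length}"
        )
    return total_scale
```

Raising an exception keeps the check active even when Python runs with assertions disabled (`-O`), which is one reason to prefer it over `assert` for validating user-supplied arguments.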
vincentqb left a comment
LGTM :)
This upsampling block is part of the WaveRNN model. For now, the test only validates the output dimensions of this block. Further tests will be added once the other blocks are combined.
Related to #446
Stack:
Add MelResNet Block #705 #751
Add Upsampling Block #724
Add WaveRNN Model #735
Add example pipeline with WaveRNN #749