Skip to content

Conversation

@jimchen90
Copy link
Contributor

@jimchen90 jimchen90 commented Jun 17, 2020

This Upsampling block is part of WaveRNN model. Now the test is to validate the output dimensions of this block. Other tests will be added after other blocks are combined.
Related to #446

Stack:

Add MelResNet Block #705 #751
Add Upsampling Block #724
Add WaveRNN Model #735
Add example pipeline with WaveRNN #749

@jimchen90 jimchen90 requested a review from vincentqb June 17, 2020 13:46
@jimchen90 jimchen90 marked this pull request as draft June 17, 2020 13:49
@codecov
Copy link

codecov bot commented Jun 17, 2020

Codecov Report

Merging #724 into master will increase coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #724      +/-   ##
==========================================
+ Coverage   89.21%   89.35%   +0.13%     
==========================================
  Files          32       32              
  Lines        2513     2546      +33     
==========================================
+ Hits         2242     2275      +33     
  Misses        271      271              
Impacted Files Coverage Δ
torchaudio/models/_wavernn.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 878d3da...44bff04. Read the comment docs.

x: the input sequence to the _Stretch2d layer (required).
Shape:
- x: :math:`(N, C, S, T)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove period at end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Shape:
- x: :math:`(N, C, S, T)`.
- output: :math:`(N, C, S * y_scale, T * x_scale)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove period at end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

T is the length of input sequence.
"""

n, c, s, t = x.size()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expecting a four tuple is a little rigid, isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. some functions support ... to mean an arbitrary number of dimensions, see functionals in torchaudio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it.

x: the input sequence to the _UpsampleNetwork layer (required).
Shape:
- x: :math:`(N, S, T)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See notation in readme for output of spectrogram

Spectrogram: (channel, time) -> (channel, freq, time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable names have been updated.

Shape:
- x: :math:`(N, S, T)`.
- output: :math:`(N, (T - 2 * pad) * Total_Scale, S)`, `(N, (T - 2 * pad) * total_scale, P)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lower/upper case in total_scale and Total_Scale. I like the first better :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, total_scale looks better. Fixed.

- x: :math:`(N, S, T)`.
- output: :math:`(N, (T - 2 * pad) * Total_Scale, S)`, `(N, (T - 2 * pad) * total_scale, P)`.
where N is the batch size, S is the number of input sequence, T is the length of input sequence.
P is the number of output sequence. Total_Scale is the product of all elements in upsample_scales.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: P = output_dims ? just do that, or specify that P = output_dims.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name has been updated. I use n_output here to match other places. No single letter is used in docstring now.

resnet_output = self.resnet_stretch(resnet_output)
resnet_output = resnet_output.squeeze(1)

upsampling_output = self.upsample_layers(x.unsqueeze(1))
Copy link
Contributor

@vincentqb vincentqb Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a line x = x.unsqueeze(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line has been added.

@jimchen90 jimchen90 force-pushed the upsampling branch 2 times, most recently from f4b4c76 to c98e289 Compare June 18, 2020 19:36
@vincentqb
Copy link
Contributor

vincentqb commented Jun 25, 2020

Is there a doc available? Can you attach the link?

EDIT: internal


total_scale = 1
for upsample_scale in upsample_scales:
total_scale *= upsample_scale
Copy link
Contributor

@vincentqb vincentqb Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an assert or error message checking that total_scale == hop_length, and document this requirement (e.g. "product of upsample_scale must equal hop_length") in docstring.

Copy link
Contributor Author

@jimchen90 jimchen90 Jun 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because hop_length is a variable only in WaveRNN. I added an error message (line) and document this requirement (line) of WaveRNN #735 .

@jimchen90 jimchen90 mentioned this pull request Jun 25, 2020
@jimchen90 jimchen90 mentioned this pull request Jun 26, 2020
@jimchen90 jimchen90 marked this pull request as ready for review June 29, 2020 15:42
Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants