Add pretrained weights for wavernn #1612
Conversation
Hi! I'm a bit worried that we're moving forward without explicit consent from Linda Johnson. Before her voice becomes this easily accessible, I'm particularly worried because there are a lot of issues to consider.
Out of respect for a fellow person, I think we should double-check with Linda Johnson before this PR is approved. Thanks for your consideration! (I understand that this dataset has already gotten really popular. Even so, I think we should take a step in the right direction and ask for permission before going ahead with this push to the official repository.)
Hi @PetrochukM, thanks for bringing up the issue again. I am curious to learn your opinion on publishing the pre-trained model for the vocoder.
@dongreenberg -- can you comment here? (following internal: October 12)
nit: flake8 :)
torchaudio/models/wavernn.py
Outdated
return x.unsqueeze(1)
def wavernn(pretrained: bool = True, progress: bool = True, **kwargs: Any) -> WaveRNN:
Let's stay closer to the convention set here: we have a helper function _wavernn that passes the kwargs to WaveRNN, and a particular one called wavernn_10k_epochs_8bits_ljspeech.
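The convention described above can be sketched as follows. Note this is a hedged illustration, not the PR's actual code: WaveRNN here is a minimal stand-in stub for torchaudio's real class, and the configuration values (n_classes, hop_length) are assumptions chosen for the example.

```python
from typing import Any


class WaveRNN:
    # Minimal stand-in for torchaudio.models.WaveRNN, just enough to
    # illustrate the factory convention; the real class builds the network.
    def __init__(self, n_classes: int = 256, hop_length: int = 200) -> None:
        self.n_classes = n_classes
        self.hop_length = hop_length


def _wavernn(checkpoint_name: str, pretrained: bool, progress: bool,
             **kwargs: Any) -> WaveRNN:
    # Private helper: forwards kwargs to WaveRNN. In the real code it
    # would also fetch the weights for `checkpoint_name`, e.g. via
    # torch.hub.load_state_dict_from_url.
    model = WaveRNN(**kwargs)
    if pretrained:
        pass  # load the state dict for `checkpoint_name` here
    return model


def wavernn_10k_epochs_8bits_ljspeech(pretrained: bool = True,
                                      progress: bool = True) -> WaveRNN:
    # Public, checkpoint-specific factory: pins the configuration the
    # published weights were trained with (values here are assumptions).
    configs = {"n_classes": 256, "hop_length": 200}
    return _wavernn("wavernn_10k_epochs_8bits_ljspeech",
                    pretrained, progress, **configs)
```

The benefit of this shape is that each released checkpoint gets a self-documenting name, while the shared construction logic lives in one private helper.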
torchaudio/models/wavernn.py
Outdated
model_urls = {
    'wavernn': 'https://download.pytorch.org/models/audio/wavernn_10k_epochs_8bits_ljspeech.pth',
}
In case this line is too long:
model_urls = {
'wavernn_10k_epochs_8bits_ljspeech': (
'https://download.pytorch.org/models/audio/'
'wavernn_10k_epochs_8bits_ljspeech.pth'
),
}
I think it's OKAY as long as the voice actor(s) have given their written and explicit permission (knowing all the consequences of doing so) to publish their voice. I think it'd be really cool if
torchaudio/models/wavernn.py
Outdated
    'n_hidden': 128,
    'n_output': 128
}
configs.update(kwargs)
To follow the convention here, we should have kwargs.update(configs), or otherwise just update the dictionary directly.
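A small sketch of the behavioral difference being discussed, with made-up values: merging `{**defaults, **kwargs}` is equivalent to `configs.update(kwargs)` (the caller's overrides win), while `{**kwargs, **defaults}` matches `kwargs.update(configs)` (the checkpoint's pinned values win).

```python
# Defaults a checkpoint-specific factory might pin (illustrative values).
defaults = {"n_hidden": 128, "n_output": 128}
kwargs = {"n_hidden": 64}  # caller-supplied override

# configs.update(kwargs): caller-supplied values override the defaults.
caller_wins = {**defaults, **kwargs}

# kwargs.update(configs): pinned checkpoint values override the caller.
pinned_wins = {**kwargs, **defaults}
```

For a factory tied to specific published weights, letting the pinned values win guarantees the constructed model matches the checkpoint's architecture.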
@PetrochukM -- thanks again for raising those concerns :) The previous discussion is in #776, and the author of the dataset, @keithito, commented here that he has personally corresponded with Linda and confirmed that she has been very supportive of having her recordings used as the basis of a public domain speech dataset. Based on this, we will go ahead and publish the pre-trained weights. However, anyone using such pre-trained models should consult their own lawyers ahead of time, in a similar fashion to the notice given here. Please do let us know if you have any other concerns.
@vincentqb Thanks for addressing my concerns! The Linda Johnson dataset is now 4 years old (before Tacotron-2 was even published), so I'm worried that her comments were made a long time ago. I'm worried that this dataset has been used much more widely than originally intended. Would it be okay if I got some more clarification on the correspondence between Linda and Keith?
@vincentqb Both the comment from @keithito and the internal document you are pointing to are about the copyright of the dataset (and the derived copyright of a model trained with the dataset). To me that looks different from the points and concerns @PetrochukM is bringing up. I do not think we should make a rushed decision to make it available, as this seems like a very sensitive matter.
mthrok
left a comment
Overall, it looks good.
vincentqb
left a comment
LGTM, but @mthrok do you have any other feedback?
I'll also let @mthrok and @dongreenberg follow up on the comment.
Closing the loop on this. We had an internal review and had our legal team analyze the license to see whether this is in the scope of the license, which they deem it to be.
torchaudio/models/wavernn.py
Outdated
    The model is trained using the default parameters and code of the examples/pipeline_wavernn/main.py.
    """
    if checkpoint_name not in _MODEL_CONFIG_AND_URLS:
        raise ValueError("The checkpoint_name `{}` is not supported.".format(checkpoint_name))
When validating a value against a small, finite set of valid values, listing them out is more user-friendly.
Imagine that I tried to pass wavernn_10k_epochs_8bits_ljspeech but misspelled it as wavernn_10k_epochs_8bits_ljspeeck. If the error message only tells me it's invalid, then I have to search the documentation to see what is correct. If the error message also tells me the valid choices, then I can copy-paste a valid one from the error message and retry instantly.
"not supported" is correct, but it sounds like support is planned, and I think "unexpected" is more commonly used.
The str.format method is fine, but typically an f-string is more readable and the code becomes shorter.
Suggested change:

raise ValueError("The checkpoint_name `{}` is not supported.".format(checkpoint_name))

becomes

raise ValueError(
    f"Unexpected checkpoint_name: '{checkpoint_name}'. "
    f"Valid choices are: {list(_MODEL_CONFIG_AND_URLS.keys())}")
Thanks for pointing it out. These are definitely better designs.
I've fixed them here.
torchaudio/models/wavernn.py
Outdated
Args:
    checkpoint_name (str): The name of the checkpoint to load. Available checkpoints:
        - wavernn_10k_epochs_8bits_ljspeech:
Suggested change:

- wavernn_10k_epochs_8bits_ljspeech:

becomes

- ``"wavernn_10k_epochs_8bits_ljspeech"``
Thanks for pointing it out. I've fixed them here.
I wanted to check in. Are y'all going to publish Linda Johnson's voice for the public to use without asking her for explicit and informed permission?
torchaudio/models/wavernn.py
Outdated
- ``"wavernn_10k_epochs_8bits_ljspeech"``:
    WaveRNN model trained with 10k epochs and 8 bits depth waveform on the LJSpeech dataset.
    The model is trained using the default parameters and code of the examples/pipeline_wavernn/main.py.
nit: Maybe add a hyperlink: https://github.com/pytorch/audio/tree/master/examples/pipeline_wavernn
Most definitely not.
Closes #776.
Offers pretrained weights for WaveRNN with 8-bit waveform mode, trained on LJSpeech.
Following the convention here.