Add pretrained weights for wavernn #1612
Conversation
Hi! I'm a bit worried that we're moving forward without explicit consent from Linda Johnson. Before her voice becomes this easily accessible, I'm particularly worried because there are a lot of issues to consider.
Out of respect for a fellow person, I think we should double-check with Linda Johnson before this PR is approved. Thanks for your consideration! (I understand that this dataset has already gotten really popular. Even so, I think we should take a step in the right direction and ask for permission before going ahead with this push to the official repository.)
Hi @PetrochukM, thanks for bringing up the issue again. I am curious to learn your opinion on publishing the pre-trained model for the vocoder.
@dongreenberg -- can you comment here? (following internal: October 12)
nit: flake8 :)
torchaudio/models/wavernn.py
Outdated
return x.unsqueeze(1)
def wavernn(pretrained: bool = True, progress: bool = True, **kwargs: Any) -> WaveRNN:
Let's stay closer to the convention set here: we have a helper function _wavernn that passes the kwargs to WaveRNN, and a particular one called wavernn_10k_epochs_8bits_ljspeech.
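The convention described above can be sketched as follows. Note this is a hedged illustration, not the PR's actual code: WaveRNN here is a minimal stand-in stub for torchaudio's real class, and the configuration values (n_classes, hop_length) are assumptions chosen for the example.

```python
from typing import Any


class WaveRNN:
    # Minimal stand-in for torchaudio.models.WaveRNN, just enough to
    # illustrate the factory convention; the real class builds the network.
    def __init__(self, n_classes: int = 256, hop_length: int = 200) -> None:
        self.n_classes = n_classes
        self.hop_length = hop_length


def _wavernn(checkpoint_name: str, pretrained: bool, progress: bool,
             **kwargs: Any) -> WaveRNN:
    # Private helper: forwards kwargs to WaveRNN. In the real code it
    # would also fetch the weights for `checkpoint_name`, e.g. via
    # torch.hub.load_state_dict_from_url.
    model = WaveRNN(**kwargs)
    if pretrained:
        pass  # load the state dict for `checkpoint_name` here
    return model


def wavernn_10k_epochs_8bits_ljspeech(pretrained: bool = True,
                                      progress: bool = True) -> WaveRNN:
    # Public, checkpoint-specific factory: pins the configuration the
    # published weights were trained with (values here are assumptions).
    configs = {"n_classes": 256, "hop_length": 200}
    return _wavernn("wavernn_10k_epochs_8bits_ljspeech",
                    pretrained, progress, **configs)
```

The benefit of this shape is that each released checkpoint gets a self-documenting name, while the shared construction logic lives in one private helper.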
torchaudio/models/wavernn.py
Outdated
model_urls = {
    'wavernn': 'https://download.pytorch.org/models/audio/wavernn_10k_epochs_8bits_ljspeech.pth',
}
In case this line is too long:
model_urls = {
'wavernn_10k_epochs_8bits_ljspeech': (
'https://download.pytorch.org/models/audio/'
'wavernn_10k_epochs_8bits_ljspeech.pth'
),
}
I think it's OKAY as long as the voice actor(s) have given their written and explicit permission (knowing all the consequences of doing so) to publish their voice. I think it'd be really cool if
torchaudio/models/wavernn.py
Outdated
    'n_hidden': 128,
    'n_output': 128
}
configs.update(kwargs)
To follow the convention here, we should have kwargs.update(configs), or otherwise just update the dictionary directly.
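A small sketch of the behavioral difference being discussed, with made-up values: merging `{**defaults, **kwargs}` is equivalent to `configs.update(kwargs)` (the caller's overrides win), while `{**kwargs, **defaults}` matches `kwargs.update(configs)` (the checkpoint's pinned values win).

```python
# Defaults a checkpoint-specific factory might pin (illustrative values).
defaults = {"n_hidden": 128, "n_output": 128}
kwargs = {"n_hidden": 64}  # caller-supplied override

# configs.update(kwargs): caller-supplied values override the defaults.
caller_wins = {**defaults, **kwargs}

# kwargs.update(configs): pinned checkpoint values override the caller.
pinned_wins = {**kwargs, **defaults}
```

For a factory tied to specific published weights, letting the pinned values win guarantees the constructed model matches the checkpoint's architecture.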
@PetrochukM -- thanks again for raising those concerns :) The previous discussion is in #776, and the author of the dataset, @keithito, commented here that he has personally corresponded with Linda and confirmed that she has been very supportive of having her recordings used as the basis of a public domain speech dataset. Based on this, we will go ahead and publish the pre-trained weights. However, anyone using such pre-trained models should consult their own lawyers ahead of time, in a similar fashion to the notice given here. Please do let us know if you have any other concerns.
@vincentqb Thanks for addressing my concerns! The Linda Johnson dataset is now 4 years old (before Tacotron-2 was even published), so I'm worried that her comments were made a long time ago. I'm worried that this dataset has been used much more widely than originally intended. Would it be okay if I got some more clarification on the correspondence between Linda and Keith?
@vincentqb Both the comment from @keithito and the internal document you are pointing to are about the copyright of the dataset (and the derived copyright of a model trained with the dataset). To me that looks different from the points and concerns @PetrochukM is bringing up. I do not think we should make a rushed decision to make it available, as this seems like a very sensitive matter.
mthrok
left a comment
Overall, it looks good.
vincentqb
left a comment
LGTM, but @mthrok do you have any other feedback?
I'll also let @mthrok and @dongreenberg follow up on the comment.
Closing the loop on this. We had an internal review and had our legal team analyze the license to see whether this is in the scope of the license, which they deem it to be.
torchaudio/models/wavernn.py
Outdated
    The model is trained using the default parameters and code of the examples/pipeline_wavernn/main.py.
    """
    if checkpoint_name not in _MODEL_CONFIG_AND_URLS:
        raise ValueError("The checkpoint_name `{}` is not supported.".format(checkpoint_name))
When validating a value against a small, finite set of valid values, listing them out is more user-friendly.
Imagine that I tried to pass wavernn_10k_epochs_8bits_ljspeech but misspelled it as wavernn_10k_epochs_8bits_ljspeeck. If the error message only tells me it's invalid, then I have to search the documentation to see what is correct. If the error message also tells me the valid choices, then I can copy-paste a valid one from the error message and retry instantly.
"not supported" is correct, but it sounds like support is planned, and I think "unexpected" is more commonly used.
The str.format method is fine, but typically an f-string is more readable and the code becomes shorter.
Suggested change:

raise ValueError("The checkpoint_name `{}` is not supported.".format(checkpoint_name))

becomes

raise ValueError(
    f"Unexpected checkpoint_name: '{checkpoint_name}'. "
    f"Valid choices are: {list(_MODEL_CONFIG_AND_URLS.keys())}")
Thanks for pointing it out. These are definitely better designs.
I've fixed them here.
torchaudio/models/wavernn.py
Outdated
Args:
    checkpoint_name (str): The name of the checkpoint to load. Available checkpoints:
        - wavernn_10k_epochs_8bits_ljspeech:
Suggested change:

- wavernn_10k_epochs_8bits_ljspeech:

becomes

- ``"wavernn_10k_epochs_8bits_ljspeech"``
Thanks for pointing it out. I've fixed them here.
I wanted to check in. Are y'all going to publish Linda Johnson's voice for the public to use without asking her for explicit and informed permission?
torchaudio/models/wavernn.py
Outdated
- ``"wavernn_10k_epochs_8bits_ljspeech"``:
    WaveRNN model trained with 10k epochs and 8 bits depth waveform on the LJSpeech dataset.
    The model is trained using the default parameters and code of the examples/pipeline_wavernn/main.py.
nit: Maybe add a hyperlink: https://github.com/pytorch/audio/tree/master/examples/pipeline_wavernn
Most definitely not.
Closes #776.
Offers pretrained weights for WaveRNN with 8-bit waveform mode, trained on LJSpeech.
Following the convention here.