
Conversation

@yangarbiter (Contributor) commented Jun 25, 2021

Closes #776.

Offers pretrained weights for WaveRNN in 8-bit waveform mode, trained on LJSpeech.

Following the convention here.

@yangarbiter force-pushed the pretrained_wavernn branch 2 times, most recently from 25cf920 to 91ca4bd on June 25, 2021 21:15
@PetrochukM commented Jun 26, 2021

Hi!

I'm a bit worried that we're moving forward without explicit consent from Linda Johnson. Before her voice becomes easily accessible via torchaudio, it might be worthwhile to get a written confirmation from her.

I'm particularly worried because there are a lot of issues to consider:

  • What if Linda Johnson goes to trial? Could this model be used to falsify evidence?
  • What happens if Linda Johnson dies? Will her family be traumatized by hearing her voice used by the public?
  • Could someone impersonate Linda Johnson using her voice? Will that get her in trouble?
  • This voice could be used to defraud the elderly via scam calls. Is Linda Johnson okay with that?
  • Linda Johnson's voice will be easily accessible via PyTorch's official documentation. Has Linda Johnson considered if she wants that kind of exposure?

Out of respect for a fellow person, I think we should double-check with Linda Johnson before this PR is approved.

Thanks for your consideration!

(I understand that this dataset has already gotten really popular. Even so, I think we should take a step in the right direction and ask for permission before going ahead with this push to the official torchaudio repo.)

@yangarbiter force-pushed the pretrained_wavernn branch 7 times, most recently from 7046810 to 7d81105 on July 1, 2021 21:58
@mthrok (Contributor) commented Jul 2, 2021

Hi @PetrochukM

Thanks for bringing up the issue again.
@vincentqb Can you address these concerns?

@PetrochukM I am curious to learn your opinion on publishing the pre-trained model for the vocoder.
Do you think that publishing the model is risky? Would you decline to publish it if you were a maintainer of this repo?

@vincentqb (Contributor) commented Jul 6, 2021

@dongreenberg -- can you comment here? following internal: october 12

@vincentqb (Contributor):

nit: flake8 :)

        return x.unsqueeze(1)


def wavernn(pretrained: bool = True, progress: bool = True, **kwargs: Any) -> WaveRNN:

Let's stay closer to the convention set here: we have a helper function _wavernn that passes the kwargs to WaveRNN, and a specific one called wavernn_10k_epochs_8bits_ljspeech.

Comment on lines 20 to 23
model_urls = {
    'wavernn': 'https://download.pytorch.org/models/audio/wavernn_10k_epochs_8bits_ljspeech.pth',
}

in case this line is too long:

model_urls = {
    'wavernn_10k_epochs_8bits_ljspeech': (
        'https://download.pytorch.org/models/audio/'
        'wavernn_10k_epochs_8bits_ljspeech.pth'
    ),
}

@PetrochukM commented Jul 6, 2021

@PetrochukM I am curious to learn your opinion on publishing the pre-trained model for the vocoder.
Do you think that publishing the model is risky? Would you decline to publish it if you were a maintainer of this repo?

I think it's OKAY as long as the voice actor(s) have given their written and explicit permission (knowing all the consequences of doing so) to publish their voice.

I think it'd be really cool if torchaudio standardized asking voice actor(s) for consent and included a short consent document along with the published model.

@yangarbiter force-pushed the pretrained_wavernn branch 7 times, most recently from 99f9d75 to 0fabf59 on July 7, 2021 17:35
'n_hidden': 128,
'n_output': 128
}
configs.update(kwargs)

To follow the convention here, we should have kwargs.update(configs), or otherwise just update the dictionary directly.

@vincentqb (Contributor):

@PetrochukM -- thanks again for raising those concerns :) The previous discussion is in #776, and the author of the dataset, @keithito, commented here that he has personally corresponded with Linda and confirmed that she has been very supportive of having her recordings used as the basis of a public domain speech dataset. Based on this, we will go ahead and publish the pre-trained weights. However, anyone using such pre-trained models should consult their own lawyers ahead of time, in a similar fashion to the notice given here. Please do let us know if you have any other concerns.
 
cc @dongreenberg

@PetrochukM commented Jul 9, 2021

@vincentqb Thanks for addressing my concerns!

The Linda Johnson dataset is now 4 years old (it predates even Tacotron-2), so her comments were made a long time ago. I'm worried that this dataset has been used much more widely than originally intended.

Would it be okay if we got some more clarification on the correspondence between Linda and Keith?

@mthrok (Contributor) commented Jul 9, 2021

@vincentqb Thanks for addressing my concerns!

The Linda Johnson dataset is now 4 years old (it predates even Tacotron-2), so her comments were made a long time ago. I'm worried that this dataset has been used much more widely than originally intended.

Would it be okay if we got some more clarification on the correspondence between Linda and Keith?

@vincentqb Both @keithito's comment and the internal document you are pointing to are about the copyright of the dataset (and the derived copyright of a model trained on it). To me, that looks different from the points and concerns @PetrochukM is raising. I do not think we should make a rushed decision to make it available, as this seems like a very sensitive matter.

@mthrok (Contributor) left a comment

Overall, it looks good.

@vincentqb (Contributor) left a comment

LGTM, but @mthrok do you have any other feedback?

I'll also let @mthrok and @dongreenberg follow up on the comment.

@yangarbiter marked this pull request as ready for review on July 15, 2021 16:57
@dongreenberg

Closing the loop on this. We had an internal review and had our legal team analyze the license to determine whether this is within its scope, which they deem it to be.

@yangarbiter force-pushed the pretrained_wavernn branch 2 times, most recently from 4b79bda to 7c42129 on July 16, 2021 22:49
@yangarbiter force-pushed the pretrained_wavernn branch 2 times, most recently from 7d09505 to cf135f4 on July 17, 2021 22:08
    The model is trained using the default parameters and code of the examples/pipeline_wavernn/main.py.
    """
    if checkpoint_name not in _MODEL_CONFIG_AND_URLS:
        raise ValueError("The checkpoint_name `{}` is not supported.".format(checkpoint_name))

When validating a value against a small, finite set of valid values, listing them out is more user-friendly.

Imagine that I tried to pass wavernn_10k_epochs_8bits_ljspeech but misspelled it as wavernn_10k_epochs_8bits_ljspeeck. If the error message only tells me it's invalid, then I have to search the documentation to see what is correct. If the error message also tells me what the valid choices are, then I can copy-paste a valid one from the error message and retry instantly.

not supported is correct, but it sounds like the value is planned to be supported; I think unexpected is more commonly used.

The str.format method is fine, but typically an f-string is more readable and makes the code shorter.

Suggested change
        raise ValueError("The checkpoint_name `{}` is not supported.".format(checkpoint_name))
        raise ValueError(
            f"Unexpected checkpoint_name: '{checkpoint_name}'. "
            f"Valid choices are: {list(_MODEL_CONFIG_AND_URLS.keys())}")

@yangarbiter (Author):

Thanks for pointing it out. These are definitely better designs.
I've fixed them here.

Args:
    checkpoint_name (str): The name of the checkpoint to load. Available checkpoints:
        - wavernn_10k_epochs_8bits_ljspeech:

Suggested change
        - wavernn_10k_epochs_8bits_ljspeech:
        - ``"wavernn_10k_epochs_8bits_ljspeech"``

@yangarbiter (Author):

Thanks for pointing it out. I've fixed it here.

@PetrochukM commented Jul 19, 2021

I wanted to check in. Are y'all going to publish Linda Johnson's voice for the public to use without asking her for explicit and informed permission?

- ``"wavernn_10k_epochs_8bits_ljspeech"``:
WaveRNN model trained with 10k epochs and 8 bits depth waveform on the LJSpeech dataset.
The model is trained using the default parameters and code of the examples/pipeline_wavernn/main.py.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongreenberg

I wanted to check in. Are y'all going to publish Linda Johnson's voice for the public to use without asking her for explicit and informed permission?

Most definitely not.

@yangarbiter merged commit 8ec6b87 into pytorch:master on Jul 20, 2021
nateanl pushed a commit to nateanl/audio that referenced this pull request Jul 28, 2021