
Conversation

@mthrok (Contributor) commented Oct 22, 2021

Add Spanish ASR from Voxpopuli.

@mthrok mthrok force-pushed the pretrain-es branch 2 times, most recently from 0938d5e to 66c98e4 Compare October 23, 2021 01:04
@mthrok mthrok marked this pull request as ready for review October 23, 2021 01:05
)


def _get_es_labels():
Contributor

Not a big deal since this is used internally only, but since we are adding several languages, thoughts on having a dictionary mapping lang -> symbols and using a generic function `_get_labels(lang)`, instead of having a different function for each added language?

Contributor Author

tl;dr: sure, we can do that.

My original intention here was to have a separate label object instance for each pipeline (meaning `id()` would report a different value for each `get_labels` result), because if the labels were a shared global object, modifying one would affect the others.

However, now that `get_labels` returns a tuple, which is immutable and constructed at runtime, that concern is no longer relevant, so we can do that.

Contributor Author

Hmm, so I tried adopting the idea, and I think having them as separate functions is better for readability.

The following is only for French, but each language will have around 30 lines, so in the end the dictionary will be around 13 * 30 lines.

def _get_voxpopuli_labels(lang):
    labels = {
        'fr': (
            "|",
            "e",
            "s",
            "n",
            "i",
            "t",
            "r",
            "a",
            "o",
            "u",
            "l",
            "d",
            "c",
            "p",
            "m",
            "é",
            "v",
            "q",
            "f",
            "g",
            "b",
            "h",
            "x",
            "à",
            "j",
            "è",
            "y",
            "ê",
            "z",
            "ô",
            "k",
            "ç",
            "œ",
            "û",
            "ù",
            "î",
            "â",
            "w",
            "ï",
            "ë",
            "ü",
            "æ",
        )
    }
    return labels[lang]

@mthrok (Contributor Author) commented Oct 25, 2021

@nateanl

I realized that the last dimension of the ASR output labels is the character "1", which appears only once in the original training dataset. I believe this is a mistake of sorts, and we can exclude that dimension from the model, as in the case of `<pad>` (#1914).

What do you think?

cat dict.es_char.txt
| 1518049
e 1077702
a 877225
o 694767
s 619500
n 565808
r 519910
i 513128
l 393015
d 382689
c 359006
t 358015
u 323059
p 236137
m 235987
b 88865
q 85869
y 78375
g 77028
v 67765
h 62163
ó 58918
f 53486
í 36803
á 30274
j 25768
z 22749
ñ 19116
é 16605
x 11350
ú 7163
k 1741
w 484
ü 256
1 1        <<--
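A minimal sketch of how such a one-off symbol could be dropped when building the label set from a fairseq-style `dict.*.txt` file (one `symbol count` pair per line). The helper name `load_labels` and the `min_count` cutoff are assumptions for illustration, not part of this PR:

```python
def load_labels(dict_path, min_count=2):
    """Parse a fairseq-style dict file and keep only symbols whose
    corpus count is at least `min_count` (dropping e.g. the "1 1" entry)."""
    labels = []
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            # Split from the right so the symbol itself may contain spaces.
            symbol, count = line.rsplit(maxsplit=1)
            if int(count) >= min_count:
                labels.append(symbol)
    return tuple(labels)
```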

@mthrok mthrok merged commit 3a59931 into pytorch:main Oct 27, 2021
@mthrok mthrok deleted the pretrain-es branch October 27, 2021 02:06
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022

4 participants