-
Notifications
You must be signed in to change notification settings - Fork 739
Refactor text preprocessing tests in Tacotron2 example #1641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
99c35bc to
ea587d4
Compare
mthrok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please make sure that the tests are properly running in CI.
|
|
||
|
|
||
| class TestTextPreprocessor(unittest.TestCase): | ||
| class TestTextPreprocessor(PytorchTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: TorchaudioTestCase is slightly preferred as it will prevent accidental use of I/O functions. (which is not the case for text utils)
db3ff88 to
4fcb1d8
Compare
4fcb1d8 to
f7acad6
Compare
f7acad6 to
52436be
Compare
A simple approach is to remove the annotation. It's okay as these are helper functions in rather simple module. Another approach (I would take) is to rewrite the helper functions with def normalize_numbers(text: str) -> str:
text = _remove_commas(text)
text = _replace_pounds(text)
text = _expand_dollars(text)
text = _expand_decimal_point(text)
text = _expand_ordinal(text)
text = _expand_number(text)
return text |
Thanks for the suggestion, this is definitely more readable. I've refactored it accordingly here. |
mthrok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new signature, it is easy to test individual helper functions.
As a bonus, you can add some unit tests for them.
Regex is one of the hardest to read/maintain. So, testing helper function helps a lot for the future maintainers.
|
|
||
|
|
||
| def _remove_commas(text: str) -> str: | ||
| _comma_number_re = re.compile(r'([0-9][0-9\,]+[0-9])') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is same as directly passing the r-string to re.sub, (or worth because it surely compiles the pattern every time this function is called) as the expression is not compiled on module level. (same goes to the other uses of re.compile)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, I've move these expressions back to the module level.
|
|
||
|
|
||
| def _expand_dollars(text: str) -> str: | ||
| def _helper_fn(m): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this _helper_fn is not referring any local variable in _expand_dollars, there is no need to nest the function here. Rather put _helper_fn on module level as a plain function, and give a descriptive name improves readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out. I've moved them out and call them the replacement function (based on here).
b9422ac to
d22b1aa
Compare
|
I've also added several tests in |
Co-authored-by: moto <[email protected]>
The tests for
example/pipeline_tacotron2are moved totest/torchaudio_unittest/example/tacotron2in #1625. This PR also moves the text preprocessing tests intotest/torchaudio_unittest/tacotron2so it will be run automatically by CI.