Refactor text preprocessing tests in Tacotron2 example #1641

yangarbiter · 2021-07-26T22:42:06Z

The tests for example/pipeline_tacotron2 were moved to test/torchaudio_unittest/example/tacotron2 in #1625. This PR also moves the text preprocessing tests into test/torchaudio_unittest/tacotron2 so that they run automatically in CI.

mthrok

LGTM. Please make sure that the tests are properly running in CI.

mthrok · 2021-07-27T02:26:21Z

test/torchaudio_unittest/example/tacotron2/test_text_preprocessing.py



-class TestTextPreprocessor(unittest.TestCase):
+class TestTextPreprocessor(PytorchTestCase):


nit: TorchaudioTestCase is slightly preferred, as it prevents accidental use of I/O functions (though that is not a concern for text utils).

yangarbiter · 2021-07-28T00:05:06Z

@mthrok it appears that Python 3.6 does not yet have the class re.Match, so the type annotation here causes the test to fail. Do you have any suggestions on how to fix this?

Thanks.

mthrok · 2021-07-28T00:21:23Z

A simple approach is to remove the annotation. That is okay, as these are helper functions in a rather simple module.

Another approach (the one I would take) is to rewrite the helper functions with a str -> str signature. That way, the helper functions are more readable and the code becomes more maintainable.

def normalize_numbers(text: str) -> str:
    text = _remove_commas(text)
    text = _replace_pounds(text)
    text = _expand_dollars(text)
    text = _expand_decimal_point(text)
    text = _expand_ordinal(text)
    text = _expand_number(text)
    return text

yangarbiter · 2021-07-28T04:23:04Z

Thanks for the suggestion, this is definitely more readable. I've refactored it accordingly here.

mthrok

With the new signature, it is easy to test the individual helper functions.
As a bonus, you can add some unit tests for them.

Regular expressions are among the hardest code to read and maintain, so testing the helper functions helps future maintainers a lot.

https://regex101.com/r/l6gnIg/1
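
As a rough illustration of what such per-helper tests could look like, the sketch below tests a stand-in `_expand_decimal_point`. The pattern and function body are assumptions made for this example, and plain `unittest.TestCase` is used so the snippet runs without torchaudio; the tests added in this PR may differ.

```python
import re
import unittest

# Hypothetical pattern and helper, assumed for illustration only.
_decimal_number_re = re.compile(r"([0-9]+\.[0-9]+)")


def _expand_decimal_point(text: str) -> str:
    # Spell out the decimal point: "3.14" -> "3 point 14".
    return _decimal_number_re.sub(
        lambda m: m.group(1).replace(".", " point "), text
    )


class TestExpandDecimalPoint(unittest.TestCase):
    def test_single_number(self):
        self.assertEqual(_expand_decimal_point("3.14"), "3 point 14")

    def test_multiple_numbers(self):
        self.assertEqual(
            _expand_decimal_point("pi is 3.14 and e is 2.71"),
            "pi is 3 point 14 and e is 2 point 71",
        )

    def test_text_without_decimals_is_unchanged(self):
        self.assertEqual(_expand_decimal_point("version 2"), "version 2")
```

Saved as a test module, this can be run with `python -m unittest`. Each case pins down one behaviour of the regex, which documents the intent for whoever edits the pattern next.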

mthrok · 2021-07-28T15:10:27Z

examples/pipeline_tacotron2/text/numbers.py

+
+
+def _remove_commas(text: str) -> str:
+    _comma_number_re = re.compile(r'([0-9][0-9\,]+[0-9])')


I think this is the same as passing the r-string directly to re.sub (or worse, because it calls re.compile every time this function is called), since the expression is not compiled at module level. The same goes for the other uses of re.compile.

Thanks for pointing this out, I've moved these expressions back to the module level.

mthrok · 2021-07-28T15:11:42Z

examples/pipeline_tacotron2/text/numbers.py

+
+
+def _expand_dollars(text: str) -> str:
+    def _helper_fn(m):


Since this _helper_fn does not refer to any local variable in _expand_dollars, there is no need to nest the function here. Putting _helper_fn at module level as a plain function and giving it a descriptive name improves readability.

Thanks for pointing it out. I've moved them out and named them replacement functions (based on here).

yangarbiter · 2021-07-28T18:38:04Z

https://regex101.com/r/l6gnIg/1

I've also added several tests in test/torchaudio_unittest/example/tacotron2/test_text_preprocessing.py for individual helper functions.
Thanks.

facebook-github-bot added the CLA Signed label Jul 26, 2021

yangarbiter requested a review from mthrok July 26, 2021 22:42

yangarbiter force-pushed the text_util_tests branch from 99c35bc to ea587d4 Compare July 26, 2021 22:57

mthrok approved these changes Jul 27, 2021

yangarbiter force-pushed the text_util_tests branch 3 times, most recently from db3ff88 to 4fcb1d8 Compare July 27, 2021 22:58

yangarbiter force-pushed the text_util_tests branch from 4fcb1d8 to f7acad6 Compare July 28, 2021 00:06

Move text preprocessing tests into unittest directory

52436be

yangarbiter force-pushed the text_util_tests branch from f7acad6 to 52436be Compare July 28, 2021 00:11

Refactor pipeline_tacotron2 numbers.py

4a2d89f

mthrok reviewed Jul 28, 2021

Add tests for regex for numbers

d22b1aa

yangarbiter force-pushed the text_util_tests branch from b9422ac to d22b1aa Compare July 28, 2021 18:36

mthrok reviewed Jul 28, 2021

Apply suggestions from code review

1240eed

Co-authored-by: moto <[email protected]>

yangarbiter changed the title from "Move text preprocessing tests into unittest directory" to "Refactor text preprocessing tests in Tacotron2 example" Jul 28, 2021

yangarbiter merged commit e14a2e0 into pytorch:master Jul 28, 2021

yangarbiter deleted the text_util_tests branch July 28, 2021 21:35
