@yangarbiter yangarbiter commented Jul 26, 2021

The tests for example/pipeline_tacotron2 were moved to test/torchaudio_unittest/example/tacotron2 in #1625. This PR also moves the text preprocessing tests into test/torchaudio_unittest/tacotron2 so that they are run automatically by CI.

@mthrok mthrok left a comment

LGTM. Please make sure that the tests are properly running in CI.



- class TestTextPreprocessor(unittest.TestCase):
+ class TestTextPreprocessor(PytorchTestCase):
Contributor

nit: TorchaudioTestCase is slightly preferred, as it prevents accidental use of I/O functions (not that this is a concern for text utils).

@yangarbiter yangarbiter force-pushed the text_util_tests branch 3 times, most recently from db3ff88 to 4fcb1d8 on July 27, 2021 22:58
@yangarbiter

@mthrok it appears that Python 3.6 does not yet have the class re.Match, so the type annotation here will cause the test to fail. Do you have any suggestions on how to fix this?

Thanks.

mthrok commented Jul 28, 2021

A simple approach is to remove the annotation. That's okay, as these are helper functions in a rather simple module.

Another approach (the one I would take) is to rewrite the helper functions with a str -> str signature. That way, the helper functions are more readable and the code becomes more maintainable.

def normalize_numbers(text: str) -> str:
    text = _remove_commas(text)
    text = _replace_pounds(text)
    text = _expand_dollars(text)
    text = _expand_decimal_point(text)
    text = _expand_ordinal(text)
    text = _expand_number(text)
    return text

yangarbiter commented Jul 28, 2021

Thanks for the suggestion, this is definitely more readable. I've refactored it accordingly here.

@mthrok mthrok left a comment

With the new signature, it is easy to test individual helper functions.
As a bonus, you can add some unit tests for them.

Regexes are among the hardest code to read and maintain, so testing the helper functions helps future maintainers a lot.

https://regex101.com/r/l6gnIg/1
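
A unit test for a single helper could look roughly like the sketch below. It uses plain unittest.TestCase to stay self-contained, and `_remove_commas` is a simplified stand-in defined inline, so both the class name and the cases are assumptions rather than the tests added in this PR.

```python
import re
import unittest

_comma_number_re = re.compile(r"([0-9][0-9\,]+[0-9])")


def _remove_commas(text: str) -> str:
    # Simplified stand-in for the helper under test.
    return _comma_number_re.sub(lambda m: m.group(1).replace(",", ""), text)


class TestRemoveCommas(unittest.TestCase):
    def test_remove_commas(self):
        cases = [
            ("1,000", "1000"),
            ("1,234,567 items", "1234567 items"),
            ("no digits, here", "no digits, here"),
            ("12", "12"),
        ]
        for text, expected in cases:
            # subTest reports each failing input separately.
            with self.subTest(text=text):
                self.assertEqual(_remove_commas(text), expected)
```

Listing inputs and expected outputs in one table keeps the regex's intended behavior readable without decoding the pattern itself.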



def _remove_commas(text: str) -> str:
    _comma_number_re = re.compile(r'([0-9][0-9\,]+[0-9])')
Contributor

I think this is the same as passing the r-string directly to re.sub (or worse, because re.compile is invoked every time this function is called), as the expression is not compiled at module level. (The same goes for the other uses of re.compile.)

Contributor Author

Thanks for pointing this out. I've moved these expressions back to the module level.



def _expand_dollars(text: str) -> str:
    def _helper_fn(m):
Contributor

Since this _helper_fn does not refer to any local variable in _expand_dollars, there is no need to nest the function here. Putting _helper_fn at module level as a plain function and giving it a descriptive name would improve readability.

Contributor Author

Thanks for pointing it out. I've moved them out and named them replacement functions (based on here).

yangarbiter commented Jul 28, 2021

https://regex101.com/r/l6gnIg/1

I've also added several tests in test/torchaudio_unittest/example/tacotron2/test_text_preprocessing.py for individual helper functions.
Thanks.

@yangarbiter yangarbiter changed the title from "Move text preprocessing tests into unittest directory" to "Refactor text preprocessing tests in Tacotron2 example" on Jul 28, 2021
@yangarbiter yangarbiter merged commit e14a2e0 into pytorch:master Jul 28, 2021
@yangarbiter yangarbiter deleted the text_util_tests branch July 28, 2021 21:35