Labels: bug (bugs in the library)
Description
Describe the bug
I've observed a behavior that is worth discussing here. In short, when the input contains punctuation, the syllable tokenizer returns some incorrect syllables, with both the default engine and ssg.
To Reproduce
Please see: https://colab.research.google.com/drive/12gxSmskjHCQzqV1-Nb4IOaBD-LJ0ARl5?usp=sharing
Expected behavior
IMHO, the expected result is ['หน้า', 'ที่', ' ', '19', '...']. To achieve this, we could first split the sentence on punctuation, then run syllable tokenization on each part separately.
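A minimal sketch of the "split on punctuation first" idea follows. The function name `tokenize_with_punct_split` and the stand-in `toy_tokenize` are hypothetical illustrations, not part of the library. The sketch assumes "punctuation" means ASCII punctuation (`string.punctuation`), so Thai marks such as ฯ or ๆ are not treated specially. The tokenizer is passed in as a callable, so a real syllable tokenizer could be plugged in where the toy one is used here.

```python
import re
import string
from typing import Callable, List

# Assumption: only ASCII punctuation (string.punctuation) counts as punctuation.
# The capturing group makes re.split keep each punctuation run in its output.
_PUNCT_RUN = re.compile("([" + re.escape(string.punctuation) + "]+)")


def tokenize_with_punct_split(text: str, tokenize: Callable[[str], List[str]]) -> List[str]:
    """Hypothetical helper: tokenize only the text between punctuation runs,
    keeping each punctuation run (e.g. "...") as a single token."""
    tokens: List[str] = []
    # With one capturing group, odd indices are punctuation runs and
    # even indices are the (possibly empty) text segments around them.
    for i, part in enumerate(_PUNCT_RUN.split(text)):
        if not part:
            continue  # skip empty segments at the start/end or between runs
        if i % 2 == 1:
            tokens.append(part)  # punctuation run kept whole
        else:
            tokens.extend(tokenize(part))  # only non-punctuation text is tokenized
    return tokens


def toy_tokenize(s: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits into whitespace, digit, and other runs.
    It does NOT do real Thai syllable segmentation."""
    return re.findall(r"\s+|\d+|[^\s\d]+", s)


if __name__ == "__main__":
    print(tokenize_with_punct_split("ab 19...", toy_tokenize))
```

With the toy tokenizer, the call above yields `['ab', ' ', '19', '...']`: the trailing "..." never reaches the tokenizer. One caveat of this design is that a decimal such as "3.5" would also be split at the ".", so a real fix might need to exempt punctuation that sits between digits.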
What do you think?
bact
