bact
Member

@bact bact commented Jul 2, 2020

etcc.txt, which is used to create the dictionary for Enhanced Thai Character Cluster tokenization, contains [ and ] characters where it shouldn't.

Removed all of them.
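
For illustration only, a cleanup of this kind could be sketched as below. This is not the actual change in this pull request: the helper name `strip_brackets` is hypothetical, and the sketch assumes the file holds one entry per line, which the description above does not confirm.

```python
from typing import Iterable, List


def strip_brackets(entries: Iterable[str]) -> List[str]:
    """Remove stray '[' and ']' characters from dictionary entries.

    Hypothetical helper: assumes one entry per line; entries that
    become empty after cleaning are dropped.
    """
    cleaned = []
    for entry in entries:
        # Trim surrounding whitespace, then delete every bracket character.
        entry = entry.strip().replace("[", "").replace("]", "")
        if entry:
            cleaned.append(entry)
    return cleaned
```

Applied to a file, this would amount to reading the lines as UTF-8, passing them through `strip_brackets`, and writing the result back with one entry per line.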

@bact bact requested a review from wannaphong July 2, 2020 10:41
@bact bact self-assigned this Jul 2, 2020
@bact bact added the bug (bugs in the library) and corpus (corpus/dataset-related issues) labels Jul 2, 2020
@bact bact added this to the 2.3 milestone Jul 2, 2020
@coveralls
coveralls commented Jul 2, 2020

Coverage remained the same at 95.01% when pulling 908a800 on fix-etcc-dict into 27ee248 on dev.

@bact bact closed this Jul 4, 2020
@bact bact reopened this Jul 4, 2020
@bact bact merged commit 32ab026 into dev Jul 4, 2020
@bact bact mentioned this pull request Jul 4, 2020
@bact bact deleted the fix-etcc-dict branch July 4, 2020 22:57
@wannaphong wannaphong mentioned this pull request Apr 4, 2021