Member

@wannaphong wannaphong commented Aug 11, 2020

LST20 Corpus from the National Electronics and Computer Technology Center, Thailand. The dataset can be downloaded from https://aiforthai.in.th/corpus.php.

Supported models

  • Unigram Part-Of-Speech tagger
  • Perceptron Part-Of-Speech tagger

and a tag map from LST20 to Universal Dependencies.
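For readers unfamiliar with the simpler of the two models, here is a minimal sketch of how a unigram tagger works in general: each word gets the tag it carried most often in the training data. This is not the code from this PR. The function names, the toy sentences, and the `"NN"` fallback for unseen words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Return a dict mapping each word to its most frequent training tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_unigram(words, model, default="NN"):
    """Tag each word from the model; unseen words get `default` (an assumption)."""
    return [(word, model.get(word, default)) for word in words]


# Toy training data with LST20-style tags, made up for illustration.
toy_train = [
    [("แมว", "NN"), ("กิน", "VV"), ("ปลา", "NN")],
    [("หมา", "NN"), ("กิน", "VV"), ("ข้าว", "NN")],
]

model = train_unigram(toy_train)
print(tag_unigram(["แมว", "กิน", "ข้าว"], model))
```

A perceptron tagger differs in that it also uses context features (neighbouring words and tags), which is why the two are usually offered side by side.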

Models trained by Mr. Wannaphong Phatthiyaphaibun

Model license: CC0

Code

TODO

  • documentation


pep8speaks commented Aug 11, 2020

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-20 10:32:19 UTC


coveralls commented Aug 11, 2020

Coverage increased (+0.2%) to 95.222% when pulling 557dd7c on add-LST20-postag into e6e01c5 on dev.

@wannaphong wannaphong changed the title Add LST20 Part-Of-Speech tagger model [WIP] Add LST20 Part-Of-Speech tagger model Aug 11, 2020
@wannaphong
Member Author

I am training a new model. I will combine the eval set with the train set.

@wannaphong
Member Author

> I am training a new model. I will combine the eval set with the train set.

Done

Contributor

p16i commented Aug 12, 2020

It would be nice to have some comparisons between these taggers and the existing ones in PyThaiNLP.
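One way such a comparison could be set up is token-level accuracy on a shared held-out set. The sketch below is only an illustration: `accuracy`, the two stand-in taggers, and the toy gold sentence are hypothetical and are not PyThaiNLP APIs. It also assumes every tagger emits the same tag set as the gold data, so taggers trained on different corpora would first need mapping to a common set such as Universal Dependencies.

```python
def accuracy(tagger, gold_sents):
    """Token-level accuracy of `tagger` against gold (word, tag) sentences."""
    correct = 0
    total = 0
    for sent in gold_sents:
        words = [word for word, _ in sent]
        predicted = tagger(words)
        for (_, gold_tag), (_, pred_tag) in zip(sent, predicted):
            correct += int(gold_tag == pred_tag)
            total += 1
    return correct / total if total else 0.0


# Hypothetical stand-in taggers; real ones would wrap the trained models.
def always_noun(words):
    return [(word, "NN") for word in words]


def tiny_lexicon(words):
    lexicon = {"กิน": "VV"}
    return [(word, lexicon.get(word, "NN")) for word in words]


# Toy gold data, made up for illustration.
gold = [[("แมว", "NN"), ("กิน", "VV"), ("ปลา", "NN")]]

for name, tagger in [("always_noun", always_noun), ("tiny_lexicon", tiny_lexicon)]:
    print(name, round(accuracy(tagger, gold), 3))
```

Reporting the numbers on one fixed test split keeps the comparison fair, especially if the eval set has been merged into the training data.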

@wannaphong wannaphong added this to the 2.3 milestone Aug 13, 2020
@wannaphong wannaphong changed the title [WIP] Add LST20 Part-Of-Speech tagger model Add LST20 Part-Of-Speech tagger model Aug 14, 2020
@wannaphong wannaphong requested review from bact and p16i and removed request for bact August 14, 2020 09:54
bact added 4 commits August 17, 2020 11:54
Use list comprehension in _orchid_to_ud and _lst20_to_ud
Use list comprehension for _postag_clean
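As a rough picture of what a list-comprehension tag conversion can look like, here is a minimal sketch. It is not the PR's `_lst20_to_ud`: the name `lst20_to_ud_sketch`, the partial mapping table, and the `"X"` fallback for unmapped tags are assumptions, and the actual table in the PR may differ.

```python
# Illustrative subset of an LST20 -> Universal Dependencies map (assumed values).
LST20_TO_UD_SKETCH = {
    "NN": "NOUN",
    "VV": "VERB",
    "AJ": "ADJ",
    "AV": "ADV",
    "NU": "NUM",
    "PU": "PUNCT",
}


def lst20_to_ud_sketch(tagged_words):
    """Convert (word, LST20 tag) pairs to (word, UD tag) pairs.

    Unmapped tags fall back to "X", which is an assumption of this sketch.
    """
    return [(word, LST20_TO_UD_SKETCH.get(tag, "X")) for word, tag in tagged_words]


print(lst20_to_ud_sketch([("แมว", "NN"), ("กิน", "VV"), ("ปลา", "NN")]))
```

The comprehension replaces an explicit loop that appends to a result list one pair at a time, with the same output.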
Member

@bact bact left a comment

Looks good

Member

bact commented Aug 18, 2020

I refactored some of the code: I moved the tagger-related code out of the corpus data files (lst20.py and unigram.py) and into the tagger files (perceptron.py and unigram.py).

@bact bact added the enhancement enhance functionalities label Aug 19, 2020
@bact bact merged commit 44a818e into dev Aug 20, 2020
@bact bact mentioned this pull request Aug 20, 2020
@bact bact deleted the add-LST20-postag branch August 23, 2020 09:10
@wannaphong wannaphong mentioned this pull request Apr 4, 2021
Labels

enhancement enhance functionalities

5 participants