Add LST20 Part-Of-Speech tagger model #464

wannaphong · 2020-08-11T12:49:08Z

The LST20 Corpus is from the National Electronics and Computer Technology Center (NECTEC), Thailand. The dataset can be downloaded from https://aiforthai.in.th/corpus.php.

Supported models

Unigram Part-Of-Speech tagger
Perceptron Part-Of-Speech tagger

and a tag map from LST20 to Universal Dependencies.
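For readers unfamiliar with the simpler of the two models, here is a minimal sketch of how a unigram tagger works in general: it assigns each word the tag it most often carried in the training data. This is not the code from this PR. The function names, the toy sentences, and the empty-string fallback for unseen words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Build a word -> most frequent tag lookup from tagged sentences."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # Keep only the most frequent tag seen for each word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_unigram(words, model, default=""):
    """Tag each word by lookup; unseen words get `default` (an assumption)."""
    return [(w, model.get(w, default)) for w in words]


# Toy data with LST20-style tags (NN = noun, VV = verb); not real corpus data.
toy_sents = [
    [("แมว", "NN"), ("กิน", "VV"), ("ปลา", "NN")],
    [("หมา", "NN"), ("กิน", "VV"), ("ข้าว", "NN")],
]
model = train_unigram(toy_sents)
tagged = tag_unigram(["แมว", "กิน", "ข้าว"], model)
print(tagged)
```

A perceptron tagger differs in that it also uses context features (neighbouring words and tags), so it can handle ambiguous and unseen words better than a plain lookup.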

Models trained by Mr. Wannaphong Phatthiyaphaibun

Model License : CC-0

Code

Model File : https://github.com/PyThaiNLP/pythainlp-corpus/releases/tag/lst20-v0.2
Train script : https://github.com/PyThaiNLP/pythainlp_notebook/blob/master/postag/train_lst20_pythainlp.ipynb
Google Colab : https://colab.research.google.com/drive/1Fpp1iCdx1mGcnAOTnU2O7QwY5IsqHHtW?usp=sharing

TODO

document

pep8speaks · 2020-08-11T12:49:12Z

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-20 10:32:19 UTC

coveralls · 2020-08-11T12:55:33Z

Coverage increased (+0.2%) to 95.222% when pulling 557dd7c on add-LST20-postag into e6e01c5 on dev.

wannaphong · 2020-08-11T13:29:31Z

I am training a new model. I will combine the eval set with the train set.

wannaphong · 2020-08-11T13:39:16Z

I am training a new model. I will combine the eval set with the train set.

Done

p16i · 2020-08-12T07:13:34Z

It would be nice to have some comparisons between these taggers and the existing ones in PyThaiNLP.

pythainlp/tag/perceptron.py

Use list comprehension in _orchid_to_ud and _lst20_to_ud

Use list comprehension for _postag_clean
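As a rough illustration of the list-comprehension style mentioned in these notes, a tag conversion can be written as a single expression over (word, tag) pairs. This is a sketch only: the function name, the four-entry sample mapping, and the "X" fallback for unmapped tags are assumptions, and the sample is not the tag map shipped in this PR.

```python
# Hypothetical subset of an LST20 -> Universal Dependencies mapping,
# for illustration only; the real table lives in the PR's source files.
_SAMPLE_LST20_TO_UD = {
    "NN": "NOUN",
    "VV": "VERB",
    "AJ": "ADJ",
    "PU": "PUNCT",
}


def lst20_to_ud_sketch(tagged_words, tag_map=_SAMPLE_LST20_TO_UD, unknown="X"):
    """Map each (word, tag) pair to (word, UD tag) with one comprehension.

    Tags missing from `tag_map` fall back to `unknown` (an assumption here).
    """
    return [(word, tag_map.get(tag, unknown)) for word, tag in tagged_words]


converted = lst20_to_ud_sketch([("แมว", "NN"), ("กิน", "VV"), ("?", "ZZ")])
print(converted)
```

Compared with an explicit loop that appends to a result list, the comprehension keeps the conversion in one place and avoids mutating intermediate state.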

bact

Looks good

pythainlp/tag/pos_tag.py


bact · 2020-08-18T17:36:47Z

I refactored some of the code, moving tagger-related code away from the corpus data files (lst20.py and unigram.py) and into the tagger files (perceptron.py and unigram.py).

Add LST20 postag model

90436b6

wannaphong changed the title ~~Add LST20 Part-Of-Speech tagger model~~ [WIP] Add LST20 Part-Of-Speech tagger model Aug 11, 2020

wannaphong added 3 commits August 12, 2020 11:37

Add lst20_tag_signs and lst20_tag_to_text

286e5b5

Add lst20_ud

60ece5b

Add pos_tag_sents docs

008f1c1

wannaphong added this to the 2.3 milestone Aug 13, 2020

wannaphong added 4 commits August 14, 2020 16:50

Add docs

7e36766

Update tag.rst

9f58ea7

Update tag.rst

d39f7b0

Update tag.rst

67393f5

wannaphong changed the title ~~[WIP] Add LST20 Part-Of-Speech tagger model~~ Add LST20 Part-Of-Speech tagger model Aug 14, 2020

wannaphong requested review from bact and p16i and removed request for bact August 14, 2020 09:54

Update tag.rst

46b7416

bact reviewed Aug 17, 2020

pythainlp/tag/perceptron.py (resolved)

bact added 4 commits August 17, 2020 11:54

Update pos_tag.py

be8f432

Use list comprehension in _orchid_to_ud and _lst20_to_ud

Update perceptron.py

37bcfe4

Use list comprehension for _postag_clean

Fix typo and format code

0841183

Edit function docstring

c42b1c9

bact approved these changes Aug 17, 2020

pythainlp/tag/pos_tag.py (outdated, resolved)

bact added 3 commits August 17, 2020 16:28

Use list comprehension

ef2e39e

Add test cases

7e18f2c

Use list comprehension

cdf8a78

bact added 7 commits August 18, 2020 07:58

More test cases for corpus.core

ce7738c

Fix test requests case

cbfe40a

Test get_corpus_path() with non-existing corpus name

9e46931

Simplify pos_tag()

8c25f46

Refactor, move tagger related functions/constants from the corpus submodules (lst20/orchid) to tagger submodules (unigram/perceptron).

61d301a

Fix tagger filename

b480568

Clean unigram pos data, minify json, rename corpus filenames

9b3b1bd

bact added 2 commits August 18, 2020 23:21

Update model names

339aa0b

Update word lists

10c1e82

bact added the enhancement enhance functionalities label Aug 19, 2020

bact added 5 commits August 19, 2020 21:55

Add test cases for _ud

e59def9

Fix PEP8

3422453

Refactor

39941dc

Refactor

e56aa62

Improve docstring

557dd7c

bact merged commit 44a818e into dev Aug 20, 2020

bact mentioned this pull request Aug 20, 2020

PyThaiNLP 2.3 change log #445

Closed

bact deleted the add-LST20-postag branch August 23, 2020 09:10

wannaphong mentioned this pull request Apr 4, 2021

PyThaiNLP v2.3.1 #548

Merged


5 participants
