@wannaphong (Member) commented Jan 18, 2021

What does this change

Update ThaiNER 1.4 to ThaiNER 1.5

GitHub: https://github.com/wannaphong/thai-ner/tree/master/model/1.5

File model: https://github.com/wannaphong/thai-ner/releases/download/1.5/thai-ner-1-5-newmm-lst20.crfsuite

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

  • Passed code styles and structures
  • Passed code linting checks and unit tests

@wannaphong wannaphong changed the title Update ThaiNER 1.4 to ThaiNER 1.5 [WIP] Update ThaiNER 1.4 to ThaiNER 1.5 Jan 29, 2021
@wannaphong (Member, Author) commented Feb 17, 2021

Thai NER

v1.5

Model Details

  • Developer: Wannaphong Phatthiyaphaibun
  • Model date: 2021-01-16
  • Model version: 1.5
  • Used in PyThaiNLP version: 2.3+
  • Filename: ~/pythainlp-data/thai-ner-1-5-newmm-lst20.crfsuite
  • Model type: CRF
  • License: CC0
  • GitHub for Thai NER 1.5 (data and training notebook): thai-ner-1-5-newmm-lst20.ipynb https://github.com/wannaphong/thai-ner/tree/master/model/1.5
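For anyone who wants to inspect the model file directly, here is a minimal sketch of opening it from the path listed above. It assumes the third-party python-crfsuite package (imported as `pycrfsuite`) can read the file; `load_tagger` is a hypothetical helper name, not part of PyThaiNLP.

```python
from pathlib import Path

# Filename as listed in the model details above.
MODEL_PATH = Path("~/pythainlp-data/thai-ner-1-5-newmm-lst20.crfsuite").expanduser()


def load_tagger(path: Path = MODEL_PATH):
    """Open the CRF model file, or return None if it cannot be loaded.

    Assumption: python-crfsuite is the loader. This helper is illustrative only.
    """
    if not path.is_file():
        return None  # model file has not been downloaded yet
    try:
        import pycrfsuite  # third-party: python-crfsuite
    except ImportError:
        return None  # loader package is not installed
    tagger = pycrfsuite.Tagger()
    tagger.open(str(path))
    return tagger


if __name__ == "__main__":
    tagger = load_tagger()
    print("model loaded" if tagger is not None else "model not available")
```

Returning `None` instead of raising keeps the sketch usable on machines where the model has not been fetched.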

Intended Use

  • Named-entity tagging for Thai.
  • Not suitable for other languages or non-news domains.
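The labels reported for this model use B-/I- prefixes, so the tagger output is a per-token tag sequence. A minimal sketch of grouping such a sequence into entity spans follows; `bio_to_spans` and the sample tokens are hypothetical and only illustrate the tag scheme, not the PyThaiNLP API.

```python
def bio_to_spans(tokens, tags):
    """Group (token, tag) pairs into (entity_type, text) spans.

    B-X starts a new entity of type X, I-X continues it, and any other
    tag (for example "O") closes the current entity. An I-X tag that
    does not continue a matching entity is dropped.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, "".join(current_tokens)))
    return spans


# Hypothetical tokens and tags, for illustration only.
tokens = ["นาย", "สมชาย", "ไป", "เชียงใหม่"]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION"]
print(bio_to_spans(tokens, tags))
```

Tokens are joined without spaces because Thai does not put spaces between words.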

Factors

  • Based on known problems with Thai natural language processing.

Metrics

  • Evaluation metrics include precision, recall, and F1-score.
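For reference, a minimal sketch of how these three metrics are computed for one label from true-positive, false-positive, and false-negative counts. The function name and the sample counts are illustrative; the counts are one combination consistent with a label that has support 2, precision 1.00, and recall 0.50, not figures taken from the evaluation notebook.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for a single label.

    tp: tokens correctly given this label
    fp: tokens given this label that should not have it
    fn: tokens that should have this label but were given another
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts: 1 correct, 0 spurious, 1 missed (support = tp + fn = 2).
p, r, f1 = precision_recall_f1(tp=1, fp=0, fn=1)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```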

Training Data
ThaiNER 1.5 Corpus Train set (5,089 sentences)

Evaluation Data
ThaiNER 1.5 Corpus Test set (1,274 sentences)

Quantitative Analyses

                precision    recall  f1-score   support

        B-DATE       0.93      0.82      0.87       350
        I-DATE       0.95      0.94      0.95       665
         B-LAW       0.85      0.54      0.66        87
         I-LAW       0.85      0.64      0.73       253
         B-LEN       1.00      0.75      0.86        12
         I-LEN       1.00      0.69      0.82        26
    B-LOCATION       0.81      0.70      0.75       620
    I-LOCATION       0.74      0.72      0.73       533
       B-MONEY       1.00      0.91      0.95       131
       I-MONEY       0.99      0.95      0.97       321
B-ORGANIZATION       0.92      0.70      0.80      1334
I-ORGANIZATION       0.80      0.73      0.76      1198
     B-PERCENT       0.94      0.88      0.91        17
     I-PERCENT       0.91      0.95      0.93        22
      B-PERSON       0.96      0.78      0.86       607
      I-PERSON       0.94      0.88      0.91      2181
       B-PHONE       1.00      0.50      0.67         2
       I-PHONE       1.00      1.00      1.00         8
        B-TIME       0.93      0.66      0.77        87
        I-TIME       0.97      0.77      0.86       158
         B-URL       0.91      0.83      0.87        12
         I-URL       0.93      0.96      0.94        94

     micro avg       0.89      0.79      0.84      8718
     macro avg       0.92      0.79      0.84      8718
  weighted avg       0.90      0.79      0.84      8718
   samples avg       0.16      0.16      0.16      8718
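As a sanity check on the summary rows, the macro and weighted averages can be recomputed from the per-label rows. The sketch below does this for the recall column only, using the recall and support values copied from the table; because the inputs are already rounded to two decimals, the results are approximate.

```python
# (recall, support) per label, copied from the report above.
ROWS = {
    "B-DATE": (0.82, 350), "I-DATE": (0.94, 665),
    "B-LAW": (0.54, 87), "I-LAW": (0.64, 253),
    "B-LEN": (0.75, 12), "I-LEN": (0.69, 26),
    "B-LOCATION": (0.70, 620), "I-LOCATION": (0.72, 533),
    "B-MONEY": (0.91, 131), "I-MONEY": (0.95, 321),
    "B-ORGANIZATION": (0.70, 1334), "I-ORGANIZATION": (0.73, 1198),
    "B-PERCENT": (0.88, 17), "I-PERCENT": (0.95, 22),
    "B-PERSON": (0.78, 607), "I-PERSON": (0.88, 2181),
    "B-PHONE": (0.50, 2), "I-PHONE": (1.00, 8),
    "B-TIME": (0.66, 87), "I-TIME": (0.77, 158),
    "B-URL": (0.83, 12), "I-URL": (0.96, 94),
}

total_support = sum(support for _, support in ROWS.values())

# Macro average: every label counts equally, regardless of frequency.
macro_recall = sum(recall for recall, _ in ROWS.values()) / len(ROWS)

# Weighted average: each label is weighted by its support.
weighted_recall = sum(r * s for r, s in ROWS.values()) / total_support

print(f"total support:   {total_support}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

The macro average treats rare labels such as B-PHONE (support 2) the same as frequent ones such as I-PERSON (support 2181), which is why it is worth reading alongside the weighted figure.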

Ethical Considerations
None identified so far.

Caveats and Recommendations

  • Thai text only

@wannaphong wannaphong changed the title [WIP] Update ThaiNER 1.4 to ThaiNER 1.5 Update ThaiNER 1.4 to ThaiNER 1.5 Feb 17, 2021
@wannaphong wannaphong added this to the 2.3 milestone Feb 17, 2021
@pep8speaks commented Feb 17, 2021

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-18 08:56:06 UTC

@wannaphong wannaphong merged commit f631818 into dev Feb 19, 2021
@wannaphong wannaphong deleted the thainer-1.5 branch March 15, 2021 20:23
@wannaphong wannaphong mentioned this pull request Apr 4, 2021