
Add PhayaThaiBERT model into PyThaiNLP [WIP] #868

@pavaris-pm


The newly released paper PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords reports impressive results, with better handling of foreign words than existing Thai encoder-based models.

I think it would be great to support it in PyThaiNLP's downstream tasks, e.g. token classification, to strengthen the library. What do you think? If we all agree on this, I can help integrate it as a new engine as soon as possible.
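To make the "new engine" idea a bit more concrete, here is a minimal sketch of how a tagger could be selected by engine name. Everything in it is an assumption for illustration only: `register_engine`, `tag`, and the `"phayathaibert"` engine name are hypothetical and are not taken from PyThaiNLP's source code, and the tagger body is a stand-in that labels every token `O` instead of running the actual model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping engine names to tagger functions.
# None of these names come from PyThaiNLP's source code.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]
_ENGINES: Dict[str, Tagger] = {}


def register_engine(name: str) -> Callable[[Tagger], Tagger]:
    """Register a tagger function under an engine name."""
    def decorator(func: Tagger) -> Tagger:
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("phayathaibert")
def _phayathaibert_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    # Placeholder: a real engine would run the fine-tuned model here.
    # This stand-in labels every token "O" so the sketch runs offline.
    return [(token, "O") for token in tokens]


def tag(tokens: List[str], engine: str = "phayathaibert") -> List[Tuple[str, str]]:
    """Dispatch to the tagger registered under `engine`."""
    try:
        tagger = _ENGINES[engine]
    except KeyError:
        raise ValueError(f"Unknown engine: {engine!r}") from None
    return tagger(tokens)
```

With a layout like this, each downstream task could register its own PhayaThaiBERT-backed function under the same engine name, and callers would only need to pass a different `engine` value.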

New features

After reading the paper, these are the PyThaiNLP tasks where I found the model can be integrated. The list below shows the current progress and the contributors developing each model (a ✅ check mark means the task has already been added to the source code; I will open a complete PR once all of them are done):

  • Part-of-speech tagging on blackboard corpus by @MpolaarbearM
  • Named-entity recognition on Thainer-v2 corpus by @pavaris-pm
  • Tokenization by @pavaris-pm
  • Data Augmentation (Text) by @pavaris-pm
  • Word Correction (currently under research and development)

etc. (I will keep adding to the list based on what I find during my experiments.)

If you are interested, feel free to leave a comment below saying which task you would like to develop a model for. After that, you can make a PR to the same branch as PR #873.


Labels: enhancement (enhance functionalities)


Status: In progress
