Description
The newly released paper PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords reports impressive results: it handles foreign words better than existing Thai encoder-based models.
I think it would be great to add it to PyThaiNLP's supported downstream tasks, e.g. token classification, to strengthen the library. What do you think? If we all agree, I can help integrate it as a new engine asap.
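As a rough illustration of what "a new engine" could mean on the library side, here is a minimal sketch of name-based engine dispatch for a token-classification task. Everything in it is hypothetical: `register_engine`, `tag`, the engine name `"phayathaibert"`, and the stub tagger are made up for this example and are not PyThaiNLP's actual API. The stub returns placeholder labels instead of running the model.

```python
from typing import Callable, Dict, List

# Hypothetical type: a tagger maps a list of tokens to one label per token.
Tagger = Callable[[List[str]], List[str]]

# Hypothetical registry from engine name to tagger function.
_ENGINES: Dict[str, Tagger] = {}


def register_engine(name: str) -> Callable[[Tagger], Tagger]:
    """Register a tagger function under the given engine name."""
    def decorator(func: Tagger) -> Tagger:
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("phayathaibert")  # assumed engine name, for illustration only
def _phayathaibert_stub(tokens: List[str]) -> List[str]:
    # Placeholder: a real engine would run the fine-tuned model here.
    return ["O"] * len(tokens)


def tag(tokens: List[str], engine: str = "phayathaibert") -> List[str]:
    """Dispatch to the tagger registered for `engine`."""
    if engine not in _ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return _ENGINES[engine](tokens)
```

With this shape, each task in the list below would only need to register its own tagger function, and callers would pick it with the `engine` argument.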
New features
Below are the tasks that, after reading the paper, I found can be integrated into PyThaiNLP. The list shows current progress and the contributors developing each model (a ✅ check mark means the task has already been added to the source code; I will open a complete PR once all of them are done krub):
- Part-of-speech tagging on blackboard corpus by @MpolaarbearM
- Named-entity recognition on Thainer-v2 corpus by @pavaris-pm
- Tokenization by @pavaris-pm
- Data Augmentation (Text) by @pavaris-pm
- Word Correction (currently under research and development)
etc. (I will keep adding to the list based on what I find during my experiments.)
If you are interested, feel free to leave a comment below saying which task you would like to develop a model for krub. After that, you can make a PR to the same branch as PR #873.