Schedule
- First development release: 16 March 2021
- Beta release: 23 March 2021
- Production release: 30 March 2021
Docs: https://pythainlp.github.io/docs/2.3/index.html
See 2.3 Milestone.
Deprecation and other API changes
- NER: the default model changes from ThaiNER 1.4 to ThaiNER 1.5. If you still need the ThaiNER 1.4 model, select it with the version parameter of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
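As a rough illustration of the version switch described above, the sketch below wraps the constructor call in a small helper. The helper name tag_names_with_thainer_14 is hypothetical, and the get_ner() call reflects the tagger's documented method as far as we know; check the linked docs for the exact return format. Running it requires PyThaiNLP with its NER dependencies installed and the model data available.

```python
def tag_names_with_thainer_14(text):
    """Tag named entities in `text` using the older ThaiNER 1.4 model.

    Hypothetical helper for illustration only; it is not part of PyThaiNLP.
    """
    # Imported inside the function so this module still loads
    # on machines where PyThaiNLP is not installed.
    from pythainlp.tag.named_entity import ThaiNameTagger

    # Constructor signature as given in the release notes:
    # ThaiNameTagger(version: str = '1.4')
    tagger = ThaiNameTagger(version="1.4")

    # Assumption: get_ner() accepts a raw Thai string and returns
    # the tagged tokens; see the API docs for the exact shape.
    return tagger.get_ner(text)
```

Calling tag_names_with_thainer_14() with a Thai sentence should return the tagged tokens; code that does not pin the version this way will get the ThaiNER 1.5 model.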
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
  - All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
  - All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #486 #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID-to-UD table -- thanks @c4n for the fix
POS Tagging
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
- #526 Update: ThaiNER 1.4 to ThaiNER 1.5
- #538 Add: ThaiNameTagger version option, with ThaiNER 1.4 support
Transliteration
- #485 Fix: romanize failed in some examples
- #511 Add: Thai W2P (Thai Word-to-Phoneme converter)
Text summarization
- #523 Add: mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add: pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() now normalizes output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
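To show what removing a word from a trie involves (the remove() addition listed above), here is a minimal, self-contained sketch. It is not PyThaiNLP's implementation: the class, its method names other than remove(), and the pruning strategy are assumptions made for illustration.

```python
class Trie:
    """Minimal character trie; an illustrative sketch, not PyThaiNLP's class."""

    class _Node:
        __slots__ = ("children", "is_end")

        def __init__(self):
            self.children = {}   # maps a character to the next node
            self.is_end = False  # True if a stored word ends at this node

    def __init__(self, words=()):
        self._root = Trie._Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self._root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie._Node()
            node = node.children[ch]
        node.is_end = True

    def __contains__(self, word):
        node = self._root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

    def remove(self, word):
        """Remove `word` if stored; return True if something was removed."""
        # Record the nodes visited so unused ones can be pruned afterwards.
        path = [self._root]
        for ch in word:
            node = path[-1].children.get(ch)
            if node is None:
                return False  # the word was never stored
            path.append(node)
        if not path[-1].is_end:
            return False  # `word` is only a prefix of other words
        path[-1].is_end = False
        # Walk back towards the root, deleting nodes that no longer
        # lead to any word; stop at the first node still in use.
        for i in range(len(word), 0, -1):
            node = path[i]
            if node.children or node.is_end:
                break
            del path[i - 1].children[word[i - 1]]
        return True
```

Removing a word that is a prefix of a longer entry only clears its end marker, so the longer entry stays reachable; nodes are deleted only once no stored word passes through them.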
wangchanberta
- #540 Add: wangchanberta (pythainlp.wangchanberta)