Add provinces #458
Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-07-23 03:04:52 UTC
pythainlp/corpus/common.py
Outdated
    return _THAI_THAILAND_PROVINCES

def provinces_all() -> list:
Maybe we can combine this function with provinces()?
Since these two functions are quite similar, we could add an argument that lets the user choose which list they want.
For example, the signatures could be: provinces() and provinces(details=True).
OK.
It's tricky here.
provinces() was designed to populate a set of province names on the first run: when _THAI_THAILAND_PROVINCES is still empty, it reads the names from the file named by _THAI_THAILAND_PROVINCES_FILENAME. On any subsequent call it just returns _THAI_THAILAND_PROVINCES, to save processing time and avoid I/O operations.
The modification breaks this.
Maybe we can modify the function in this way:
- Have only one file (the CSV one) of Thai names and romanized names (province Ministry of Interior code, province abbreviation, province postal code, etc.)
- On the first call, populate a set (or sorted list?) of name pairs into _THAI_THAILAND_PROVINCES.
- On any subsequent call, just use the existing _THAI_THAILAND_PROVINCES.
- The return type/format will be determined by the value of details=.
@bact OK. I updated the code.
This LGTM. I haven't checked the documentation yet. Should we merge it?
Sorry for being so late, I was quite sick in the past weeks. I did some code cleanup, fixed the types, and removed the .txt file, as the information there duplicates what is in the .csv. One question, please: what do these field names in the dict mean?
I propose:
Does this make sense? Please continue the discussion in #466.
I added the province names of Thailand (Thai and romanized).
(will resolve #467)