Hi, I have a question about the implementation of the newmm tokenizer.
https://github.com/PyThaiNLP/pythainlp/blob/e3a01772f1dbe578e81119214d85226c0cbde466/pythainlp/tokenize/newmm.py#L38C1-L46C2
Here, why not just do something like "permit only Thai characters"?
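To illustrate, this is the kind of pattern I have in mind (just a sketch; the pattern name is mine, not from the repo):

```python
# Sketch of a simpler rule: treat any run of characters outside the
# Thai Unicode block (U+0E00-U+0E7F) as one non-Thai chunk, instead of
# enumerating Latin/digit/space patterns separately.
import re

PAT_NONTHAI_SIMPLE = re.compile(r"[^\u0E00-\u0E7F]+")

print(PAT_NONTHAI_SIMPLE.findall("ถ้าไม่รังเกียจสีหน้า(รถ)"))
# ['(', ')']
```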
I am having trouble where punctuation marks are sometimes included in the tokens.
Ex. "ถ้าไม่รังเกียจสีหน้า(รถ)" -> ถ้า / ไม่รังเกียจ / สีหน้า / (รถ)  ("รถ" is in the dictionary used)
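Here is a minimal script to reproduce what I see (using the public `word_tokenize` API with the default newmm engine; the dictionary I use contains "รถ"):

```python
# Minimal reproduction of the segmentation described above.
from pythainlp.tokenize import word_tokenize

print(word_tokenize("ถ้าไม่รังเกียจสีหน้า(รถ)", engine="newmm"))
# Output I get: ['ถ้า', 'ไม่รังเกียจ', 'สีหน้า', '(รถ)']
```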
Also, if this is "dictionary-based maximal matching word segmentation", why didn't it take just "รถ"?