
Unicode Error #17

@gwohlgen

Hi,
thanks for fixing the import error!

I tried to run your sample code now,
but I still get errors.

a) pythainlp/pythainlp/test/__init__.py, line 36: missing closing parenthesis (easy to fix)

but now:

[gerhard@localhost pythainlp]$ python test_gerhard.py
/home/gerhard/pythainlp/pythainlp/segment/dict.py:23: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if string == "":
Traceback (most recent call last):
  File "test_gerhard.py", line 6, in <module>
    b = segment(a)
  File "/home/gerhard/pythainlp/pythainlp/segment/dict.py", line 10, in segment
    result = tokenize(string, lines, "")
  File "/home/gerhard/pythainlp/pythainlp/segment/dict.py", line 27, in tokenize
    if string.startswith(pref):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

This happens when using:

# -*- coding: utf-8 -*-

# ตัดคำ (word segmentation)
from pythainlp.segment import segment
a = 'ฉันรักภาษาไทยเพราะฉันเป็นคนไทย'
b = segment(a)

I am not sure whether this is a problem with my system or a general one.

Cheers, Gerhard
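
A possible reading of the traceback, offered as a sketch and not a confirmed diagnosis: the UnicodeWarning and the 'ascii' codec message are typical of Python 2, where a plain '...' literal in a UTF-8 source file is a byte string, and mixing it with unicode text (for example in `==` or `startswith`) triggers an implicit ASCII decode. The snippet below is not pythainlp code. It only reproduces that decode step on the sample sentence and shows the explicit UTF-8 decode next to it; `implicit_decode_error` is a hypothetical helper introduced for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: illustrates the implicit ASCII decode, not pythainlp internals.

text = u'ฉันรักภาษาไทยเพราะฉันเป็นคนไทย'

# On Python 2 a plain '...' literal holds these UTF-8 bytes, not unicode text.
raw = text.encode('utf-8')


def implicit_decode_error(data):
    """Decode bytes as ASCII, as Python 2 does implicitly when bytes and
    unicode are mixed. Return the error message, or None on success."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# The ASCII decode fails on the very first byte of the Thai text.
message = implicit_decode_error(raw)

# Decoding explicitly as UTF-8 recovers the original text.
decoded = raw.decode('utf-8')

print(message)
print(decoded == text)
```

If this is what happens here, writing the input as a unicode literal (`a = u'...'`) in the test script might avoid the error. Whether `segment` then handles the text correctly is an assumption that nothing in this report confirms.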
