@ghost

@ghost ghost commented Jan 13, 2014

#5142

@JanSchulz , can you test whether this solves the problem for you?

@jorisvandenbossche
Member

Maybe this can solve the Windows build issue (I will also test it), but as an aside: do we want this in the docs? The example itself works; it's only the doc build that fails (as far as I understand).

@ghost
Author

ghost commented Jan 13, 2014

I think we want users and contributors to be able to build the docs, yes, even if they're on Windows or a different locale.
Do you feel the workaround clutter detracts much from the example?

It took a lot of effort to get pandas to play nicely with unicode, and one lesson learned is not
to mix encodings. The docs should be UTF-8 clean, IMO.

@jorisvandenbossche
Member

I tried it, and it does not solve the issue. In retrospect, that makes sense: the problem on Windows is in building the rst with unicode to html, and it is the output generated by the code example that causes this. With your changes, the output of the code example still contains special characters (which is also the point of the code example), so the build on Windows still stops.

I think @JanSchulz had another approach as a kind of hack: something along the lines of #5142 (comment). I also vaguely remember that the issue was fixed when using ipython's version of the ipython directive, but I should check that.

@ghost
Author

ghost commented Jan 13, 2014

Then I misunderstood the issue. There is a definite difference:

s1='word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'
s2=s1.decode('utf8').encode('latin-1')

s1.decode('utf8')
Out[33]: u'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

s2.decode('utf8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-34-7c1601a98c33> in <module>()
----> 1 s2.decode('utf8')

/usr/lib64/python2.7/encodings/utf_8.pyc in decode(input, errors)
     14 
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17 
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14: invalid continuation byte

and in the case of Python's "encoding utf8" preamble, that makes all the difference.
I expected Sphinx to accept UTF-8 input; if it doesn't, that seems like a bug to me.
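
For readers on Python 3, the same difference can be reproduced with bytes literals. This is a minimal sketch rather than part of the original session; the names `text` and `err` are introduced here purely for illustration.

```python
# Python 3 sketch of the session above: bytes literals stand in for Python 2 str.
s1 = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'  # UTF-8 encoded bytes

text = s1.decode('utf8')     # decodes cleanly, because s1 really is UTF-8
s2 = text.encode('latin-1')  # same characters, but one byte per character

err = None
try:
    s2.decode('utf8')        # latin-1 bytes are not valid UTF-8 here
except UnicodeDecodeError as exc:
    err = exc

print(repr(text))
print(err)
print(s2.decode('latin-1') == text)  # decoding with the matching codec round-trips
```

The bytes only decode when the codec matches the one that produced them, which is why mixing encodings within one file is fragile.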

Thanks for testing.

@ghost ghost closed this Jan 13, 2014
@ghost ghost deleted the PR_GH5142 branch January 13, 2014 20:41
@ghost
Author

ghost commented Jan 13, 2014

btw, there was some decode action in our hacked version of ipython_directive; #5925 may actually solve
the problem by sheer coincidence.

@jorisvandenbossche
Member

See also #5142 (comment). There was indeed a .decode('utf8') in our version of the ipython directive for some other reason, but that broke the build on Windows.
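
The thread does not say exactly why that unconditional decode failed, so the following is only a sketch of one defensive pattern, written for Python 3. The helper `to_text` and its `fallback` parameter are hypothetical and are not taken from the ipython directive or from pandas.

```python
import locale


def to_text(raw, fallback=None):
    """Hypothetical helper: return text, decoding bytes as UTF-8 when possible."""
    if isinstance(raw, str):
        # Already text: decoding again would be an error, so pass it through.
        return raw
    try:
        return raw.decode('utf8')
    except UnicodeDecodeError:
        # Assumption: fall back to the caller's codec or the locale's preferred
        # encoding, replacing undecodable bytes instead of aborting.
        codec = fallback or locale.getpreferredencoding(False)
        return raw.decode(codec, errors='replace')


print(to_text(b'Tr\xc3\xa4umen'))                    # UTF-8 input
print(to_text(b'Tr\xe4umen', fallback='latin-1'))    # latin-1 input, explicit fallback
```

A guard like this trades strictness for robustness, so it hides encoding mistakes rather than fixing them; keeping the sources consistently UTF-8 remains the cleaner option.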

@jorisvandenbossche
Member

I will try out the other PR with your rebase.

@ghost
Author

ghost commented Jan 13, 2014

I'm seriously skimming past all the important bits today :), sorry.

This pull request was closed.