Skip to content

Conversation

@jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented May 24, 2018

# set sys.defaultencoding to ascii, then change it back after the test
enc = sys.getdefaultencoding()
reload(sys) # noqa:F821
sys.setdefaultencoding('ascii')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a context manager this i think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a context manager for locale in pd.util.testing. Can that be used here or do you have something else in mind? (I agree it would be prettier)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pls use that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like tm.set_locale doesn't change sys.getdefaultencoding(). I could make a new contextmanager specifically for this (which I guess would be a no-op in py3?)

@codecov
Copy link

codecov bot commented May 24, 2018

Codecov Report

Merging #21187 into master will decrease coverage by 0.01%.
The diff coverage is 11.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21187      +/-   ##
==========================================
- Coverage   91.92%    91.9%   -0.02%     
==========================================
  Files         153      153              
  Lines       49563    49572       +9     
==========================================
+ Hits        45559    45560       +1     
- Misses       4004     4012       +8
Flag Coverage Δ
#multiple 90.3% <11.11%> (-0.02%) ⬇️
#single 41.8% <11.11%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/util/testing.py 85.27% <11.11%> (-0.7%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b36b451...7ab9324. Read the comment docs.


class TestCategoricalRepr(object):

@pytest.mark.skipif(compat.PY3, reason="Decoding failure only in PY2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a test for py3 as well that uses utf8 as the encoding

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, couldn't hurt.

@jreback jreback added Bug Output-Formatting __repr__ of pandas objects, to_string labels May 24, 2018
- Bug in :meth:`Series.str.replace()` where the method throws `TypeError` on Python 3.5.2 (:issue: `21078`)
- Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue: `14156`)
- Bug in :func:`pandas.testing.assert_index_equal` which raised ``AssertionError`` incorrectly, when comparing two :class:`CategoricalIndex` objects with param ``check_categorical=False`` (:issue:`19776`)
- Bug in rendering :class:`Series` with ``Categorical`` dtype in rare conditions under Python 2.7 (:issue:`21002`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 0.23.2

# GH#21002 if len(index) > 60, sys.getdefaultencoding()=='ascii',
# and we are working in PY2, then rendering a Categorical could raise
# UnicodeDecodeError by trying to decode when it shouldn't
from pandas.core.base import StringMixin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can import at the top

str(ser)

else:
# set sys.defaultencoding to ascii, then change it back after
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this into a context manager in pandas.util.testing

@jreback jreback added this to the 0.23.2 milestone Jun 21, 2018
@jreback jreback merged commit e24da6c into pandas-dev:master Jun 21, 2018
@jreback
Copy link
Contributor

jreback commented Jun 21, 2018

thanks!

and soon enough we will blow away all PY2 code in any event! but nice patch.

@jbrockmendel jbrockmendel deleted the unicat branch June 22, 2018 03:32
jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Output-Formatting __repr__ of pandas objects, to_string

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rendering Series[Categorical] raises UnicodeDecodeError

3 participants