Description
Code Sample
import pandas as pd

df = pd.DataFrame({
    'a': pd.Series(list('abc')),
    'b': pd.Series(pd.to_datetime(['2018-01-01', '2018-02-01', '2018-03-01']), dtype='category'),
    'c': pd.Categorical.from_codes([-1, 0, 1], categories=[0, 1])
})
df.groupby(['a', 'b']).indices
Problem description
This raises an error. You can try different combinations of columns, but the error occurs whenever 'b' is grouped together with one of the other columns. Grouping on 'b' alone works.
>>> df.groupby(['a', 'b']).indices
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-c4de90de974e> in <module>
1 gb = df.groupby(['a', 'b'])
----> 2 gb.indices
/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/groupby.py in indices(self)
401 """
402 self._assure_grouper()
--> 403 return self.grouper.indices
404
405 def _get_indices(self, names):
pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()
/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/ops.py in indices(self)
204 keys = [com.values_from_object(ping.group_index)
205 for ping in self.groupings]
--> 206 return get_indexer_dict(label_list, keys)
207
208 @property
/opt/conda/lib/python3.6/site-packages/pandas/core/sorting.py in get_indexer_dict(label_list, keys)
331 group_index = group_index.take(sorter)
332
--> 333 return lib.indices_fast(sorter, group_index, keys, sorted_labels)
334
335
pandas/_libs/lib.pyx in pandas._libs.lib.indices_fast()
TypeError: Cannot convert DatetimeIndex to numpy.ndarray
Expected Output
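No error: df.groupby(['a', 'b']).indices should return a dict mapping each (a, b) group to the positions of its rows.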
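BaseGrouper.indices is documented as a dict {group name -> group indices}, so the expected result here is presumably one entry per observed (a, b) pair. The pure-Python sketch below shows that shape on the sample data. It is an illustration under stated assumptions, not pandas code: group_indices is a hypothetical helper, datetime.date stands in for pandas Timestamps, and lists stand in for the NumPy position arrays pandas returns.

```python
from collections import defaultdict
from datetime import date


def group_indices(*columns):
    """Hypothetical helper: map each distinct row key (one value per
    column) to the positions of the rows that carry it."""
    result = defaultdict(list)
    for position, key in enumerate(zip(*columns)):
        result[key].append(position)
    return dict(result)


# Stand-ins for columns 'a' and 'b' of the sample DataFrame.
a = ["a", "b", "c"]
b = [date(2018, 1, 1), date(2018, 2, 1), date(2018, 3, 1)]
print(group_indices(a, b))
```

Each row of the sample data is its own group, so every key maps to a single position; repeated keys would collect several positions.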
Cause
Inspecting BaseGrouper.indices shows that keys is passed to get_indexer_dict here:
pandas/pandas/core/groupby/ops.py
Lines 227 to 235 in 430f0fd
def indices(self):
    """ dict {group name -> group indices} """
    if len(self.groupings) == 1:
        return self.groupings[0].indices
    else:
        label_list = [ping.labels for ping in self.groupings]
        keys = [com.values_from_object(ping.group_index)
                for ping in self.groupings]
        return get_indexer_dict(label_list, keys)
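To make the roles of label_list and keys concrete, here is a minimal pure-Python sketch of what get_indexer_dict computes. It assumes each grouping contributes a list of integer codes (label_list) plus an indexable sequence of unique values (keys), and that a code of -1 marks a missing value. The name indexer_dict_sketch is hypothetical; the real function sorts the group index and hands off to the Cython lib.indices_fast (as the traceback shows), and returns NumPy arrays of positions rather than lists.

```python
from collections import defaultdict
from datetime import date


def indexer_dict_sketch(label_list, keys):
    """Hypothetical pure-Python stand-in for get_indexer_dict.

    label_list[k][row] is the integer code of `row` in grouping k, and
    keys[k][code] is the group value that this code stands for.
    """
    result = defaultdict(list)
    for row, codes in enumerate(zip(*label_list)):
        if -1 in codes:
            continue  # missing value (code -1) in any grouping: row joins no group
        # key[code] is the per-grouping lookup that pandas performs via get_value_at
        group = tuple(key[code] for key, code in zip(keys, codes))
        result[group].append(row)
    return dict(result)


label_list = [[0, 1, 2], [0, 1, -1]]
keys = [["a", "b", "c"], [date(2018, 1, 1), date(2018, 2, 1)]]
print(indexer_dict_sketch(label_list, keys))
```

The lookup key[code] is the step that matters for this bug: it only works if every element of keys is a container the lookup routine accepts.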
get_indexer_dict eventually passes the elements of keys to get_value_at found here:
Lines 94 to 99 in 2b32e41
cdef inline object get_value_at(ndarray arr, object loc):
    cdef:
        Py_ssize_t i

    i = validate_indexer(arr, loc)
    return arr[i]
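Cython enforces the typed ndarray arr parameter when get_value_at is called, so an object that is not an ndarray (or a subclass) is rejected before any indexing happens; this appears to be the source of the TypeError above. The sketch below emulates that behaviour in pure Python, with list playing the role of ndarray and a tuple playing the role of DatetimeIndex. Both function names are hypothetical, and the body of validate_indexer_sketch is an assumption (negative positions wrap from the end, anything still out of range raises IndexError), not a copy of the pandas source.

```python
def validate_indexer_sketch(arr, loc):
    """Turn `loc` into a valid non-negative position in `arr` (assumed behaviour)."""
    n = len(arr)
    if loc < 0:
        loc += n  # negative positions count from the end
    if not 0 <= loc < n:
        raise IndexError("index out of bounds")
    return loc


def get_value_at_sketch(arr, loc):
    # Emulates Cython's typed `ndarray arr` parameter, with list standing in
    # for ndarray: any other container type is rejected before indexing.
    if not isinstance(arr, list):
        raise TypeError(f"Cannot convert {type(arr).__name__} to list")
    return arr[validate_indexer_sketch(arr, loc)]


print(get_value_at_sketch([10, 20, 30], -1))  # 30
try:
    get_value_at_sketch((10, 20, 30), 0)  # a tuple plays the DatetimeIndex
except TypeError as exc:
    print(exc)  # Cannot convert tuple to list
```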
The problem is in how keys is built: com.values_from_object is called on each grouping's group_index, and it delegates to that object's get_values method (as BaseGrouper.indices shows, this path is skipped entirely when there is only a single grouping). When grouping on a categorical column with datetime categories, such as df['b'], get_values on the underlying Categorical takes the branch below and returns a DatetimeIndex instead of a NumPy array. Because get_value_at declares its arr argument as an ndarray, indices_fast then fails with the TypeError shown above.
pandas/pandas/core/arrays/categorical.py
Line 1504 in 2b32e41
return self.categories.take(self._codes, fill_value=np.nan)
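The line above rebuilds the values by looking up each code in the categories, filling code -1 (a missing value) with NaN. Because take is a method of the categories object itself, the result has the same type as the categories, which for datetime categories is a DatetimeIndex rather than an ndarray. The sketch below mimics this in pure Python; FakeIndex and categorical_get_values_sketch are hypothetical names, with a tuple subclass standing in for DatetimeIndex.

```python
import math


class FakeIndex(tuple):
    """Hypothetical stand-in for an Index such as DatetimeIndex:
    take() returns a new object of the *same* class, not a plain list."""

    def take(self, indices, fill_value=None):
        # Code -1 marks a missing value and is replaced by fill_value.
        return type(self)(fill_value if i == -1 else self[i] for i in indices)


def categorical_get_values_sketch(categories, codes):
    """Materialise a categorical from its categories and integer codes,
    mirroring `self.categories.take(self._codes, fill_value=np.nan)`."""
    return categories.take(codes, fill_value=math.nan)


cats = FakeIndex(["2018-01-01", "2018-02-01", "2018-03-01"])
values = categorical_get_values_sketch(cats, [2, 0, -1])
print(type(values).__name__, values)
```

The returned object is still a FakeIndex, which is exactly what a consumer expecting one concrete array type would reject.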
Solution
The Categorical.get_values docstring states that it may return an Index rather than a NumPy array, so this return type is intended. The simplest fix is to convert each key to an array with a line like this before the call to get_indexer_dict:
keys = [np.array(key) for key in keys]
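The proposed line normalises every per-grouping key container to the single type the Cython code accepts before any lookup happens. The pure-Python analogue below shows the idea: strict_lookup and normalise_keys are hypothetical names, list stands in for ndarray, and a tuple stands in for the DatetimeIndex returned by Categorical.get_values.

```python
def strict_lookup(arr, loc):
    """Hypothetical consumer that, like get_value_at, accepts exactly one
    container type (list stands in for ndarray here)."""
    if not isinstance(arr, list):
        raise TypeError(f"Cannot convert {type(arr).__name__} to list")
    return arr[loc]


def normalise_keys(keys):
    # Analogue of `keys = [np.array(key) for key in keys]`: coerce every
    # per-grouping key container to the one type the consumer accepts.
    return [list(key) for key in keys]


# One grouping yields a list, the other an Index-like object (a tuple here).
keys = [["a", "b", "c"], ("2018-01-01", "2018-02-01", "2018-03-01")]

try:
    [strict_lookup(k, 1) for k in keys]
except TypeError as exc:
    print("before:", exc)  # before: Cannot convert tuple to list

fixed = normalise_keys(keys)
print("after:", [strict_lookup(k, 1) for k in fixed])
# after: ['b', '2018-02-01']
```

Converting once, up front, keeps the strictly typed inner loop unchanged and pushes the type handling to the one place where heterogeneous inputs are collected.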
A pull request for this will be created imminently.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.14.3
scipy: 1.2.1
pyarrow: 0.13.0
xarray: None
IPython: 7.4.0
sphinx: 2.0.0
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: 0.7.0
gcsfs: None