Hi all,
I encountered the following after upgrading to pandas 0.21. There are essentially two issues:
- When I have a `Dataset` with one of the `coords` as a `pd.DatetimeIndex`, calling `DataArray.to_dataframe()` produces a `pd.DataFrame` with a `MultiIndex`, rather than the `DatetimeIndex` it produced before the upgrade.
- `DataArray.to_dataframe()` also behaves differently from `DataArray.to_pandas()`.
The example below reproduces both issues. Shouldn't both methods return data frames indexed by a `DatetimeIndex`? My current hack is to manually convert the `MultiIndex` back to a `DatetimeIndex`.
I suspect this is related to the new pandas release, but I don't see any obvious link. Would someone be able to shed some light on this?
EDIT: Could this be related to this change: MultiIndex Constructor with a Single Level?
Much appreciated.
import xarray as xr
import numpy as np
import pandas as pd
%load_ext watermark
%watermark -dv -iv
# Output
xarray 0.9.6
numpy 1.13.1
pandas 0.21.0
2017-10-30
CPython 3.6.0
IPython 6.1.0
ind = pd.date_range('2017-01-01', periods=10)
data = np.random.rand(10, 2)
da = xr.DataArray(data, coords=[ind, ['a', 'b']], dims=['time', 'name'])
da
# Output
<xarray.DataArray (time: 10, name: 2)>
array([[ 0.898389, 0.450587],
[ 0.514437, 0.444302],
[ 0.005995, 0.670285],
[ 0.50663 , 0.292316],
[ 0.120645, 0.585734],
[ 0.651648, 0.248069],
[ 0.11054 , 0.537342],
[ 0.265794, 0.123329],
[ 0.282711, 0.366271],
[ 0.420693, 0.717985]])
Coordinates:
* time (time) datetime64[ns] 2017-01-01 2017-01-02 2017-01-03 ...
* name (name) <U1 'a' 'b'
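# `ds` is not defined above; assuming it is the Dataset obtained by splitting `da` along 'name'
ds = da.to_dataset(dim='name')
# Note the MultiIndex returned here
ds['a'].to_dataframe().index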
# Output
MultiIndex(levels=[[2017-01-01 00:00:00, 2017-01-02 00:00:00, 2017-01-03 00:00:00, 2017-01-04 00:00:00, 2017-01-05 00:00:00, 2017-01-06 00:00:00, 2017-01-07 00:00:00, 2017-01-08 00:00:00, 2017-01-09 00:00:00, 2017-01-10 00:00:00]],
labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
names=['time'])
# Calling to_pandas() returns a `DatetimeIndex`, as expected.
da.sel(name='a').to_pandas().index
# Output
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
'2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10'],
dtype='datetime64[ns]', name='time', freq='D')
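The manual conversion mentioned above could look something like the following. This is only a minimal sketch, not necessarily the exact code used: the helper name `flatten_single_level_index` is hypothetical, and the frame is built directly with pandas (no xarray) so that a single-level `MultiIndex` is reproduced in isolation.

```python
import numpy as np
import pandas as pd


def flatten_single_level_index(df):
    """Return df with a single-level MultiIndex replaced by its only level.

    Hypothetical helper for illustration; frames with any other kind of
    index are returned unchanged.
    """
    idx = df.index
    if isinstance(idx, pd.MultiIndex) and idx.nlevels == 1:
        df = df.copy()
        # get_level_values keeps the level's dtype and name
        df.index = idx.get_level_values(0)
    return df


# Build a frame whose index is a MultiIndex with a single datetime level,
# similar in shape to what to_dataframe() returned in the report.
dates = pd.date_range('2017-01-01', periods=10)
mi = pd.MultiIndex.from_arrays([dates], names=['time'])
frame = pd.DataFrame({'a': np.arange(10.0)}, index=mi)

fixed = flatten_single_level_index(frame)
print(type(frame.index).__name__, '->', type(fixed.index).__name__)
```

The helper only touches single-level `MultiIndex` objects, so a genuine multi-level index passes through untouched.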