
DataArray.to_dataframe() returns MultiIndex frames instead of DatetimeIndex in Pandas 0.21 #1671

@esvhd

Hi all,

I encountered the following after upgrading to Pandas 0.21. Essentially 2 issues:

  1. When I have a Dataset with a pd.DatetimeIndex as one of its coords, calling DataArray.to_dataframe() now produces a pd.DataFrame with a MultiIndex, rather than the DatetimeIndex it returned before the upgrade.

  2. DataArray.to_dataframe() also behaves differently from DataArray.to_pandas().

The example below reproduces both issues. Shouldn't both methods return DatetimeIndex-ed data frames? My current hack is to manually convert the MultiIndex back to a DatetimeIndex.

I suspect this is something related to the new Pandas but don't see any obvious links. Would someone be able to shed some light on this?

EDIT: Could this be related to this change: MultiIndex Constructor with a Single Level?

Much appreciated.

import xarray as xr
import numpy as np
import pandas as pd

%load_ext watermark
%watermark -dv -iv

# Output
xarray      0.9.6
numpy       1.13.1
pandas      0.21.0
2017-10-30 

CPython 3.6.0
IPython 6.1.0

ind = pd.date_range('2017-01-01', periods=10)
data = np.random.rand(10, 2)
da = xr.DataArray(data, coords=[ind, ['a', 'b']], dims=['time', 'name'])
da

# Output
<xarray.DataArray (time: 10, name: 2)>
array([[ 0.898389,  0.450587],
       [ 0.514437,  0.444302],
       [ 0.005995,  0.670285],
       [ 0.50663 ,  0.292316],
       [ 0.120645,  0.585734],
       [ 0.651648,  0.248069],
       [ 0.11054 ,  0.537342],
       [ 0.265794,  0.123329],
       [ 0.282711,  0.366271],
       [ 0.420693,  0.717985]])
Coordinates:
  * time     (time) datetime64[ns] 2017-01-01 2017-01-02 2017-01-03 ...
  * name     (name) <U1 'a' 'b'

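# `ds` was not defined in the snippet as posted; assumed here to be the
# Dataset obtained by splitting `da` along 'name'.
ds = da.to_dataset(dim='name')

# Note the MultiIndex returned here
ds['a'].to_dataframe().index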

# Output
MultiIndex(levels=[[2017-01-01 00:00:00, 2017-01-02 00:00:00, 2017-01-03 00:00:00, 2017-01-04 00:00:00, 2017-01-05 00:00:00, 2017-01-06 00:00:00, 2017-01-07 00:00:00, 2017-01-08 00:00:00, 2017-01-09 00:00:00, 2017-01-10 00:00:00]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=['time'])

# calling to_pandas() returned `DatetimeIndex` as expected.
da.sel(name='a').to_pandas().index

# Output
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', name='time', freq='D')
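
For reference, the manual conversion mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name `flatten_single_level_index` is hypothetical, and the frame is built with plain pandas from a hand-made single-level MultiIndex rather than taken from xarray's output.

```python
import pandas as pd


def flatten_single_level_index(df):
    """Return df with a single-level MultiIndex replaced by a flat index.

    Hypothetical helper for illustration; frames whose index is not a
    single-level MultiIndex are returned unchanged.
    """
    idx = df.index
    if isinstance(idx, pd.MultiIndex) and idx.nlevels == 1:
        df = df.copy()
        # get_level_values(0) yields a flat index (a DatetimeIndex here)
        # that keeps the level name.
        df.index = idx.get_level_values(0)
    return df


# Stand-in for the frame returned by to_dataframe(): a single-level
# MultiIndex over dates, constructed by hand for this sketch.
ind = pd.date_range('2017-01-01', periods=10)
mi = pd.MultiIndex.from_arrays([ind], names=['time'])
df = pd.DataFrame({'a': range(10)}, index=mi)

fixed = flatten_single_level_index(df)
print(type(df.index).__name__, '->', type(fixed.index).__name__)
```

The nlevels check keeps the helper from touching genuinely multi-level frames, where dropping to a flat index would lose information.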
