DataArray.to_dataframe() returns MultiIndex frames instead of DatetimeIndex in Pandas 0.21 #1671

Closed
esvhd opened this issue Oct 30, 2017 · 3 comments

esvhd commented Oct 30, 2017

Hi all,

I encountered the following after upgrading to Pandas 0.21. There are essentially two issues:

  1. When a Dataset has a pd.DatetimeIndex as one of its coords, calling DataArray.to_dataframe() now produces a pd.DataFrame with a MultiIndex, rather than the DatetimeIndex it returned before the upgrade.

  2. DataArray.to_dataframe() also behaves differently from DataArray.to_pandas().

The example below reproduces both issues. Shouldn't both methods return DatetimeIndex-ed data frames? My current hack is to manually convert the MultiIndex back to a DatetimeIndex.
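A minimal sketch of what such a manual conversion might look like, using pandas only. The helper name `flatten_single_level_index` and the sample frame `df` are illustrative assumptions, not code from the original report:

```python
import numpy as np
import pandas as pd


def flatten_single_level_index(df):
    """Return a copy of df with a single-level MultiIndex replaced by a flat index.

    Hypothetical helper for illustration; frames with any other index are returned unchanged.
    """
    idx = df.index
    if isinstance(idx, pd.MultiIndex) and idx.nlevels == 1:
        df = df.copy()
        # get_level_values(0) returns the only level as a regular Index
        # (a DatetimeIndex when the level holds datetimes).
        df.index = idx.get_level_values(0)
    return df


# Build a frame that mimics the reported symptom: a one-level MultiIndex of dates.
times = pd.date_range('2017-01-01', periods=10)
mi = pd.MultiIndex.from_arrays([times], names=['time'])
df = pd.DataFrame({'a': np.random.rand(10)}, index=mi)

flat = flatten_single_level_index(df)
print(type(df.index).__name__, '->', type(flat.index).__name__)
```

The helper leaves frames with a regular index or a genuine multi-level index untouched, so it should be safe to apply regardless of which pandas/xarray combination produced the frame.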

I suspect this is something related to the new Pandas but don't see any obvious links. Would someone be able to shed some light on this?

EDIT: Could this be related to this change in pandas: MultiIndex Constructor with a Single Level?
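For context, the pandas change in question can be illustrated without xarray. This is a small sketch assuming pandas 0.21 or later, where the MultiIndex constructors keep a single-level MultiIndex instead of squeezing it down to a plain Index; the level name `letter` is arbitrary:

```python
import pandas as pd

# Each tuple has exactly one element, so the result has a single level.
mi = pd.MultiIndex.from_tuples([('a',), ('b',)], names=['letter'])

# On pandas >= 0.21 this stays a MultiIndex; earlier versions returned a flat Index.
print(type(mi).__name__, mi.nlevels)
```

Any code that builds an index through these constructors and expects a flat Index back would therefore see a one-level MultiIndex after the upgrade, which matches the symptom described above.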

Much appreciated.

import xarray as xr
import numpy as np
import pandas as pd

%load_ext watermark
%watermark -dv -iv

# Output
xarray      0.9.6
numpy       1.13.1
pandas      0.21.0
2017-10-30 

CPython 3.6.0
IPython 6.1.0

ind = pd.date_range('2017-01-01', periods = 10)
data = np.random.rand(10, 2)
da = xr.DataArray(data, coords=[ind, ['a', 'b']], dims=['time', 'name'])
da

# Output
<xarray.DataArray (time: 10, name: 2)>
array([[ 0.898389,  0.450587],
       [ 0.514437,  0.444302],
       [ 0.005995,  0.670285],
       [ 0.50663 ,  0.292316],
       [ 0.120645,  0.585734],
       [ 0.651648,  0.248069],
       [ 0.11054 ,  0.537342],
       [ 0.265794,  0.123329],
       [ 0.282711,  0.366271],
       [ 0.420693,  0.717985]])
Coordinates:
  * time     (time) datetime64[ns] 2017-01-01 2017-01-02 2017-01-03 ...
  * name     (name) <U1 'a' 'b'

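# `ds` is not defined in the original report; presumably it was built from `da` along these lines (assumption):
ds = da.to_dataset(dim='name')

# Note the MultiIndex returned here
ds['a'].to_dataframe().index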

# Output
MultiIndex(levels=[[2017-01-01 00:00:00, 2017-01-02 00:00:00, 2017-01-03 00:00:00, 2017-01-04 00:00:00, 2017-01-05 00:00:00, 2017-01-06 00:00:00, 2017-01-07 00:00:00, 2017-01-08 00:00:00, 2017-01-09 00:00:00, 2017-01-10 00:00:00]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=['time'])

# Calling to_pandas() returns a `DatetimeIndex` as expected.
da.sel(name='a').to_pandas().index

# Output
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', name='time', freq='D')

fmaussion (Member) commented

There was a recent PR about pandas compatibility (#1669), but I'm not sure if it solves your issue. Could you try your code on latest master?

(pip install --upgrade git+https://github.com/pydata/xarray.git)

shoyer (Member) commented Oct 30, 2017

Thanks for the report. This was fixed by #1548 and will be in the next release (release candidate should be out later today).

jhamman closed this as completed Nov 1, 2017

esvhd (Author) commented Nov 1, 2017

Thank you guys!
