groupby with multiindex behaves differently for series and single-column dataframe #9703

mcwitt · 2015-03-23T00:51:44Z

This behavior seems strange to me. Starting with a multiindex dataframe with unbalanced levels:

In [3]: pd.__version__
Out[3]: '0.15.2'

In [4]: df = pd.read_csv(StringIO('''
A,B,x
three,a,1
three,b,2
three,c,3
two,a,4
two,b,5
one,a,6'''), index_col=list('AB'))

In [5]: df
Out[5]: 
         x
A     B   
three a  1
      b  2
      c  3
two   a  4
      b  5
one   a  6

Groupby aggregations on this dataframe seem to revert the index to the cross product of the levels, potentially leaving many NAs in the result:

In [6]: df.groupby(level=[0,1]).mean()
Out[6]: 
          x
A     B    
one   a   6
      b NaN
      c NaN
three a   1
      b   2
      c   3
two   a   4
      b   5
      c NaN

But with the series this doesn't occur (no NAs in the result):

In [7]: df.x.groupby(level=[0,1]).mean()
Out[7]: 
A      B
one    a    6
three  a    1
       b    2
       c    3
two    a    4
       b    5
Name: x, dtype: int64

Is this a bug or intended behavior? I haven't been able to find any mention of the difference in the docs.
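
For anyone who wants to reproduce this outside an IPython session, here is a self-contained sketch of the example above, together with two possible workarounds for versions that show the cross-product index. The variable names (`CSV`, `via_series`, `trimmed`) are illustrative only, and the workarounds are a suggestion based on the outputs shown here, not something taken from the pandas docs.

```python
import io

import pandas as pd

# Same data as in the report: a two-level index with unbalanced levels.
CSV = """A,B,x
three,a,1
three,b,2
three,c,3
two,a,4
two,b,5
one,a,6"""

df = pd.read_csv(io.StringIO(CSV), index_col=["A", "B"])

# The aggregation in question. On affected versions this is reported to
# return the full cross product of the levels, padded with NaN.
frame_result = df.groupby(level=[0, 1]).mean()

# Possible workaround 1: aggregate the column as a Series (which keeps only
# the observed groups), then turn it back into a one-column DataFrame.
via_series = df["x"].groupby(level=[0, 1]).mean().to_frame()

# Possible workaround 2: drop rows that are entirely NaN. Caveat: this would
# also drop any genuine group whose aggregate happens to be NaN.
trimmed = frame_result.dropna(how="all")

print(via_series)
print(trimmed)
```

Going through the Series is probably the safer of the two, since it never creates the extra rows in the first place.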

mcwitt · 2015-03-23T01:21:58Z

Related: #3835, although it looks like the Cartesian product index was ruled out as an option there.

jreback · 2015-03-23T01:26:31Z

This was fixed by #9177, which is in 0.16.0 (released today).

In [3]: df.groupby(level=[0,1]).mean()
Out[3]: 
         x
A     B   
one   a  6
three a  1
      b  2
      c  3
two   a  4
      b  5

In [4]: df.x.groupby(level=[0,1]).mean()
Out[4]: 
A      B
one    a    6
three  a    1
       b    2
       c    3
two    a    4
       b    5
Name: x, dtype: int64
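
A quick way to confirm that the two paths agree on a given installation is a small script along these lines. This is only a sketch: the names `frame_mean`, `series_mean`, and the three boolean flags are made up for illustration, and the checks assume a version that includes the fix.

```python
import io

import pandas as pd

# Rebuild the frame from the original report.
CSV = """A,B,x
three,a,1
three,b,2
three,c,3
two,a,4
two,b,5
one,a,6"""

df = pd.read_csv(io.StringIO(CSV), index_col=["A", "B"])

frame_mean = df.groupby(level=[0, 1]).mean()
series_mean = df["x"].groupby(level=[0, 1]).mean()

# With the fix, both results should carry the same index (observed groups
# only) and the same values, with no NaN padding in the DataFrame result.
same_index = frame_mean.index.equals(series_mean.index)
same_values = frame_mean["x"].equals(series_mean)
no_missing = not frame_mean["x"].isnull().any()

print(same_index, same_values, no_missing)
```

If any of the three flags comes out `False`, the installed version most likely still has the behavior described in the report.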

jreback closed this as completed Mar 23, 2015

jreback added Bug Groupby labels Mar 23, 2015
