BUG: Frequency-based shifting doesn't work with groupby and MultiIndex #11452

Closed
naught101 opened this issue Oct 28, 2015 · 8 comments
Labels
Duplicate Report Duplicate issue or pull request

@naught101

If you have a DataFrame with a MultiIndex and call groupby followed by shift with the freq argument, the shift fails and returns an empty result. For example:

import numpy as np
import pandas as pd

dates1 = pd.date_range('2015-10-01', '2015-10-03')
df1 = pd.DataFrame(dict(grp='A', blah=[1, 2, 3]), index=dates1).set_index('grp', append=True)

dates2 = pd.date_range('2015-10-02', '2015-10-04')
df2 = pd.DataFrame(dict(grp='B', blah=[10, np.nan, 30]), index=dates2).set_index('grp', append=True)

df = pd.concat([df1, df2])

In [ ]: df
Out[ ]: 
                blah
           grp      
2015-10-01 A       1
2015-10-02 A       2
2015-10-03 A       3
2015-10-02 B      10
2015-10-03 B     NaN
2015-10-04 B      30

# This works:
In [ ]: df.groupby(level='grp').shift(1)
Out[ ]: 
                blah
           grp      
2015-10-01 A     NaN
2015-10-02 A       1
2015-10-03 A       2
2015-10-02 B     NaN
2015-10-03 B      10
2015-10-04 B     NaN

# This fails:
In [ ]: df.groupby(level='grp').shift(1, freq='D')
Out[ ]: 
Empty DataFrame
Columns: []
Index: []

I'm not sure why. However, with 'grp' as a column instead of an index level, it works fine:

In [ ]: df.reset_index('grp').groupby('grp').shift(1, freq='D')
Out[ ]: 
                blah
grp                 
A   2015-10-02     1
    2015-10-03     2
    2015-10-04     3
B   2015-10-03    10
    2015-10-04   NaN
    2015-10-05    30
@sinhrks
Member

sinhrks commented Oct 29, 2015

Thanks for the report. Internally, shift is applied per group to the original DataFrame, which still has the MultiIndex, and this results in an error when freq is given.

I think GroupBy should handle this case specially and exclude group keys that belong to the index.

@naught101
Author

@jreback #11211 makes this raise an error, which is probably better than the behaviour reported. Still, I think frequency shifting on a MultiIndex where one of the levels is a DatetimeIndex would be pretty nice to have, especially in the case above, where the other level is only being used for the groupby. Could .shift maybe have a level parameter?
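As a rough illustration of what such a level-aware shift could look like, here is a minimal sketch. The helper `shift_datetime_level` and its `position` argument are hypothetical names made up for this example, not part of pandas. It assumes that a frequency shift only moves the datetime labels, so the group level can be left untouched.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def shift_datetime_level(frame, position, periods, freq):
    """Hypothetical helper (not a pandas API): shift one datetime index level."""
    offset = periods * to_offset(freq)
    idx = frame.index
    # Pull every level out as a flat array, then move only the datetime one.
    arrays = [idx.get_level_values(i) for i in range(idx.nlevels)]
    arrays[position] = arrays[position] + offset
    out = frame.copy()
    out.index = pd.MultiIndex.from_arrays(arrays, names=idx.names)
    return out


# Same data as in the report above.
dates1 = pd.date_range('2015-10-01', '2015-10-03')
df1 = pd.DataFrame(dict(grp='A', blah=[1, 2, 3]), index=dates1).set_index('grp', append=True)
dates2 = pd.date_range('2015-10-02', '2015-10-04')
df2 = pd.DataFrame(dict(grp='B', blah=[10, np.nan, 30]), index=dates2).set_index('grp', append=True)
df = pd.concat([df1, df2])

# Move the date level (position 0) forward by one day; values stay in place.
shifted = shift_datetime_level(df, position=0, periods=1, freq='D')
print(shifted)
```

Because the shift is by a fixed frequency, no groupby is needed here: every row's date moves by the same offset regardless of its group.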

@melroy89

melroy89 commented Jan 18, 2017

@jreback @naught101
The bug is still present in 2017, in pandas v0.19.2.

I still have the same problem: NaN appears in some places. Date is the index (descending).

                 c         r1         r2
Date
2017-01-17  107.75  -0.000464  -0.000464
2017-01-16  107.80  -0.006957        NaN
2017-01-13  108.55   0.018425   0.018771
2017-01-12  106.55  -0.004223  -0.004206
2017-01-11  107.00   0.002804   0.002812
2017-01-10  106.70   0.003280   0.003291
2017-01-09  106.35   0.008933        NaN
2017-01-06  105.40   0.000000   0.000000
2017-01-05  105.40   0.002372   0.002378
2017-01-04  105.15  -0.013790  -0.013602
2017-01-03  106.60  -0.004221  -0.004204
2017-01-02  107.05   0.003737        NaN

Code:

df_1 = df.shift(1, freq='B')

df['r1'] = (df['c']-df_1['c'])/df['c']
df['r2'] = df['c'].pct_change(freq='D')

r1 computes what I expect pct_change() to compute, but pct_change() gives a different result: r2 has NaN in some places.

Also, the numbers look the same but differ slightly, e.g. 0.018425 != 0.018771. I'm sure my calculation is correct: RETURN = (END_PERIOD_VALUE - BEGIN_PERIOD_VALUE) / END_PERIOD_VALUE. Or am I using the wrong pandas function? 😕
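One possible reading of the table, offered as a sketch and not a diagnosis of the original data: pct_change() divides by the earlier value, while the r1 formula divides by the later one, and freq='D' looks back one calendar day where freq='B' looks back one business day. The snippet below rebuilds four rows of the c column by hand (the variable names are made up for the example) and reproduces both columns with plain shift arithmetic.

```python
import numpy as np
import pandas as pd

# Four closing prices copied from the table, in ascending date order.
dates = pd.to_datetime(['2017-01-12', '2017-01-13', '2017-01-16', '2017-01-17'])
c = pd.Series([106.55, 108.55, 107.80, 107.75], index=dates)

# Value one business day earlier vs. one calendar day earlier.
prev_b = c.shift(1, freq='B').reindex(c.index)
prev_d = c.shift(1, freq='D').reindex(c.index)

r1 = (c - prev_b) / c        # divides by the END value, as in the r1 formula
r2 = (c - prev_d) / prev_d   # divides by the BEGIN value, as pct_change does

print(pd.DataFrame({'c': c, 'r1': r1, 'r2': r2}))
```

Under these assumptions, Monday 2017-01-16 has no row for the previous calendar day (a Sunday), which would account for the NaN in r2, and the differing denominators would account for 0.018425 versus 0.018771 on 2017-01-13.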

Please fix this issue.

@jreback jreback added this to the Next Major Release milestone Jan 18, 2017
@jreback
Contributor

jreback commented Jan 18, 2017

@Danger89 Well, see that open tag? That means it wasn't closed in a prior release. The quickest way to have something fixed is to submit a pull request :)

What you are showing as an example is not reflective of this issue in any event (and I would guess your dtypes are object, given the way they are formatted).

@jreback jreback added API Design Datetime Datetime data dtype labels Jan 18, 2017
@melroy89

melroy89 commented Jan 18, 2017

@jreback The dtype of 'c' is float64. Does that answer your remark? It also says 'dtype: object'. I'm using df.dtypes.
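For context on the two dtypes mentioned here, a small sketch with made-up data: df.dtypes returns a Series that holds one dtype per column, and the trailing 'dtype: object' line describes that summary Series itself, not the columns.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the values are taken from the c column above.
prices = pd.DataFrame({'c': [107.75, 107.80, 108.55]})

col_dtypes = prices.dtypes
print(col_dtypes)           # one entry per column, then the Series' own dtype
print(col_dtypes['c'])      # dtype of column c
print(col_dtypes.dtype)     # dtype of the summary Series itself
```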

@jreback
Contributor

jreback commented Jan 18, 2017

@Danger89 You have not posted a reproducible example. I suggest that you open a new issue if you think there is a bug here; otherwise Stack Overflow may be able to help.

@melroy89

Oh, okay.

@mroeschke
Member

This looks to be a duplicate of #10128

@mroeschke mroeschke added Duplicate Report Duplicate issue or pull request and removed API Design Groupby MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Mar 31, 2020