Unexpected result with Resampler.apply for non-naive time index #25411

Closed
kdebrab opened this issue Feb 22, 2019 · 7 comments · Fixed by #52956
Labels
Apply, Bug, Datetime, Needs Tests, Resample

Comments

@kdebrab
Contributor

kdebrab commented Feb 22, 2019

When resampling a non-naive time series with a custom function (using apply or agg), an unexpected result is returned.

import pandas as pd

def weighted_quantile(series, weights, q):
    series = series.sort_values()
    cumsum = weights.reindex(series.index).fillna(0).cumsum()
    cutoff = cumsum.iloc[-1] * q
    return series[cumsum >= cutoff].iloc[0]

times = pd.date_range('2017-6-23 18:00', periods=8, freq='15T', tz='UTC')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times)

data.resample('D').apply(weighted_quantile, weights=weights, q=0.5)
Out[2]: 
2017-06-23 00:00:00+00:00    0.0
Freq: D, dtype: float64

Expected Output

The (single) value of the series should correspond with:

weighted_quantile(data, weights=weights, q=0.5)
Out[3]: 1.0

One indeed gets this result when passing data and weights with a naive time index:

times_naive = pd.date_range('2017-6-23 18:00', periods=8, freq='15T')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times_naive)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times_naive)

data.resample('D').apply(weighted_quantile, weights=weights, q=0.5)
Out[4]: 
2017-06-23    1.0
Freq: D, dtype: float64

But, as shown above, not when passing non-naive time series.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 4.2.1
pip: 19.0.2
setuptools: 39.0.1
Cython: 0.29.4
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: 0.11.3
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.3.0
bs4: 4.6.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Feb 22, 2019

There was quite a bit of work on this in 0.24.
Please try with 0.24 and/or with master.

@kdebrab
Contributor Author

kdebrab commented Feb 22, 2019

I get the same result with pandas 0.24.1

@kdebrab
Contributor Author

kdebrab commented Feb 22, 2019

It seems like the index of data is made naive by Resampler.apply when passing it to the custom function.

@gfyoung added the Datetime and Resample labels Feb 23, 2019
@jbrockmendel added the Apply label Oct 23, 2019
@kdebrab
Contributor Author

kdebrab commented Feb 25, 2020

The output in pandas 1.0.1 is:

Out[2]: 
2017-06-23 00:00:00+00:00    1.0
Freq: D, dtype: float64

So, this issue has been resolved by now!

@mroeschke added the Bug label Apr 1, 2020
@wfvining

wfvining commented Jun 8, 2020

kdebrab wrote: "It seems like the index of data is made naive by Resampler.apply when passing it to the custom function."

I am having this same problem in 1.0.1. If I do the following:

import pandas as pd


def show_tz(xs):
    print(xs.index.tz)
    return False


ser = pd.Series(
    1, 
    index=pd.date_range(
        start='2020-01-01', 
        end='2020-01-05', 
        freq='H',
        tz='Etc/GMT-7'
    )
)
ser.resample('D').apply(show_tz)

I expect to see the following output

Etc/GMT-7
Etc/GMT-7
Etc/GMT-7
Etc/GMT-7

But instead I get

Etc/GMT-7
None
None
None

If you examine the value of xs for each application you find that the one 'group' that is not tz-naive has no data in it.
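
For anyone who wants to examine the groups themselves, here is a minimal sketch that records the timezone and the number of rows of every object passed to the custom function. The names `seen` and `record_group` are only illustrative, and "60min" is used in place of 'H' because that alias is deprecated in newer pandas releases. On a release where this issue is fixed, every entry should report the timezone and no entry should show None.

```python
import pandas as pd

# One (tz, n_rows) entry per call of the custom function.
seen = []


def record_group(xs):
    # Record what the resampler actually hands to the function.
    seen.append((xs.index.tz, len(xs)))
    return len(xs)


ser = pd.Series(
    1,
    index=pd.date_range(
        start="2020-01-01",
        end="2020-01-05",
        freq="60min",  # stands in for the 'H' alias used above
        tz="Etc/GMT-7",
    ),
)
result = ser.resample("D").apply(record_group)

for tz, n_rows in seen:
    print(f"tz={tz}, rows={n_rows}")
```

On affected versions, a tz of None or a row count of 0 in this listing would point at the call described above.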

@kdebrab
Contributor Author

kdebrab commented Dec 14, 2021

As of pandas version 1.3.5, this issue is no longer present. It seems to have been fixed in version 1.3.3; with version 1.3.2 I get an IndexError when executing the above example.

@jreback
Contributor

jreback commented Dec 14, 2021

would certainly take a validation test for this
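
A validation test along these lines could be sketched as follows. This is only an assumption about what such a test might look like, not the test that was eventually merged: the test name is hypothetical, plain asserts are used instead of pandas' own testing helpers, and "15min" replaces the "15T" alias from the original report because that alias is deprecated in newer releases.

```python
import pandas as pd


def weighted_quantile(series, weights, q):
    # Same helper as in the original report.
    series = series.sort_values()
    cumsum = weights.reindex(series.index).fillna(0).cumsum()
    cutoff = cumsum.iloc[-1] * q
    return series[cumsum >= cutoff].iloc[0]


def test_resample_apply_tz_aware_custom_func():
    # Hypothetical test name; data taken from the original report.
    times = pd.date_range("2017-06-23 18:00", periods=8, freq="15min", tz="UTC")
    data = pd.Series([1.0, 1, 1, 1, 1, 2, 2, 0], index=times)
    weights = pd.Series([160.0, 91, 65, 43, 24, 10, 1, 0], index=times)

    result = data.resample("D").apply(weighted_quantile, weights=weights, q=0.5)

    # All observations fall on one day, so there is a single tz-aware bin.
    assert len(result) == 1
    assert result.index[0] == pd.Timestamp("2017-06-23", tz="UTC")
    # The resampled value must match a direct call on the full series.
    assert result.iloc[0] == weighted_quantile(data, weights=weights, q=0.5)
    assert result.iloc[0] == 1.0


test_resample_apply_tz_aware_custom_func()
```

Comparing against a direct call on the full series keeps the test tied to the expected output stated in the report rather than to a hard-coded number alone.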

@jbrockmendel added the Needs Tests label Apr 21, 2023