BUG: Up-Resample data with PeriodIndex has unexpected behavior #42763
Comments
Upsampling with pandas/pandas/core/resample.py Lines 1288 to 1301 in 860ff03
The all pandas/pandas/core/resample.py Lines 1330 to 1336 in 860ff03
We can reproduce what happens in the previous code by doing the following:
import pandas as pd
# Like example 2 above
idx = pd.period_range(start='1/10/2000', periods=2, freq='D')
series = pd.Series(range(2,4), index=idx)
offset, conv = "12H", "end"
resampler = series.resample(offset, convention=conv)
assert resampler.ax is series.index
memb = resampler.ax.asfreq(offset, conv)
memb.get_indexer(resampler.binner)
The last line produces
for any offset corresponding to Hour, e.g. "H", "2H", "12H".
@dicristina
# Downsampling
return self._groupby_and_aggregate(how, grouper=self.grouper, **kwargs)
So, the behavior is consistent for down- and up-sampling. Intuitively, the up-sampling operation results in some empty groups, and the aggregate functions applied to empty groups might return ...
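As a rough illustration of the empty-group intuition above (not the resampler's actual code path), an empty Series can stand in for a bin that received no rows. The names `empty_group` and `results` are hypothetical and chosen only for this sketch.

```python
import pandas as pd

# Illustrative stand-in for a resample bin that received no rows (assumption:
# this is a plain empty Series, not an object taken from the resampler).
empty_group = pd.Series([], dtype="float64")

# Apply a few common aggregations to the empty group.
results = {
    "count": empty_group.count(),  # number of non-NA values
    "sum": empty_group.sum(),
    "mean": empty_group.mean(),
    "max": empty_group.max(),
}

for name, value in results.items():
    print(f"{name}: {value}")
```

On an empty float Series, `count` and `sum` return zero while `mean` and `max` return NaN, so which aggregation is applied determines what an empty bin looks like in the output.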
That is the way to go but there are some tests that must be dealt with. As a workaround you can use
s.resample('Q', kind='period', convention='start').agg({"count": "count"})["count"]
This gives the expected result because ...
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
Problem description
Try to up-resample a time series with PeriodIndex as the index of the series, and then count the number of records in each time-group. However, the output is not the "count"; it seems to be the value of the only record in the group.

In the second example with convention='end', the result is all NaN. Switching to convention='start' gives a result similar to that of Example-1. In addition, from the grouped details, we can see that the data has been correctly grouped, while the aggregate functions (e.g., count) behave unexpectedly.

Expected Output
The number of samples for each group (call the count() method).

As a comparison with Example-1, I made another example, which up-samples a time series with Timestamp as the index. It behaves as expected.

Output of pd.show_versions()
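For reference, the Timestamp comparison described under Expected Output might look like the sketch below. It is not the reporter's original example: it assumes the same two-day series used in the reproduction in the comments, rebuilt on a DatetimeIndex, and it passes the 12-hour rule as a `pd.Timedelta` rather than an alias string so that the spelling of frequency aliases does not matter.

```python
import pandas as pd

# Same two daily observations as in the PeriodIndex reproduction,
# but indexed by Timestamps (DatetimeIndex) instead of Periods.
idx = pd.date_range(start="2000-01-10", periods=2, freq="D")
series = pd.Series(range(2, 4), index=idx)

# Up-sample to 12-hour bins and count the records that fall in each bin.
counts = series.resample(pd.Timedelta(hours=12)).count()
print(counts)
```

Each bin that contains an original observation reports a count of 1, and the bin in between reports 0, which is the behavior the report expects from the PeriodIndex case as well.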