Add time-length windowing capability to moving statistics #936

wesm · 2012-03-18T21:38:45Z

No description provided.

BrenBarn · 2013-01-13T05:45:20Z

If I understand this right, it's similar to what I'm asking here: http://stackoverflow.com/questions/14300768/pandas-rolling-computation-with-window-based-on-values-instead-of-counts . I think the facility should not be time-specific. You should be able to use windows of any value range on any sort of values, not just time values.
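To make the idea of value-based (rather than count-based) windows concrete, here is a minimal sketch in plain NumPy. The helper name `value_window_mean` and the sample data are hypothetical, chosen only for illustration; this is not a pandas API. It assumes the positions are sorted in ascending order and uses a trailing window that is open on the left and closed on the right.

```python
import numpy as np


def value_window_mean(positions, values, width):
    """Mean of `values` over the trailing window (p - width, p] for each position p.

    Hypothetical helper for illustration; `positions` must be sorted ascending.
    """
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    # Index of the first sample whose position is strictly greater than p - width.
    starts = np.searchsorted(positions, positions - width, side="right")
    # One past the last sample whose position is <= p (also covers duplicates).
    ends = np.searchsorted(positions, positions, side="right")
    # Prefix sums let each window sum be computed as a difference.
    csum = np.concatenate(([0.0], np.cumsum(values)))
    return (csum[ends] - csum[starts]) / (ends - starts)


# Illustrative data: irregularly spaced positions of any numeric kind.
positions = [0.0, 0.5, 1.0, 4.0, 4.2]
values = [1.0, 2.0, 3.0, 10.0, 20.0]
result = value_window_mean(positions, values, width=1.0)
print(result)
```

Each window always contains at least the current sample, so the division is safe. Nothing here is specific to time: the positions could be distances, depths, or any other sorted numeric values.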

invisibleroads · 2013-01-14T05:37:11Z

Thanks for sharing!

ghost · 2013-03-22T07:30:16Z

@jreback - cookbook.

jreback · 2013-09-21T17:06:46Z

in cookbook...closing

SmartLayer · 2014-01-15T03:40:17Z

Since rolling_mean lives in stats.moments, it is natural to assume that it takes time into account; otherwise it would belong in stats instead. However, I found that it does not weight by time.

>>> ts = Series(randn(15), index=date_range('1/1/2000', periods=15))
>>> ts
2000-01-01   -0.195255
2000-01-02    0.920142
2000-01-03    1.498506
2000-01-04   -0.923250
2000-01-05   -0.775110
2000-01-06    1.533274
2000-01-07    1.455366
2000-01-08   -1.738300
2000-01-09    0.102575
2000-01-10   -1.767898
2000-01-11    1.890013
2000-01-12   -1.106158
2000-01-13    0.457826
2000-01-14   -0.951881
2000-01-15   -1.738844
Freq: D, dtype: float64
>>> ts2 = Series(ts, index=date_range('1/1/2000', periods=10).union(date_range('1/20/2000', periods=5)))
>>> ts2
2000-01-01   -0.195255
2000-01-02    0.920142
2000-01-03    1.498506
2000-01-04   -0.923250
2000-01-05   -0.775110
2000-01-06    1.533274
2000-01-07    1.455366
2000-01-08   -1.738300
2000-01-09    0.102575
2000-01-10   -1.767898
2000-01-20         NaN
2000-01-21         NaN
2000-01-22         NaN
2000-01-23         NaN
2000-01-24         NaN
dtype: float64
>>> ts2[-5:] = [1.890013,-1.106158,0.457826,-0.951881,-1.738844]
>>> # ts and ts2 now hold identical values but have different indexes
>>> ts2
2000-01-01   -0.195255
2000-01-02    0.920142
2000-01-03    1.498506
2000-01-04   -0.923250
2000-01-05   -0.775110
2000-01-06    1.533274
2000-01-07    1.455366
2000-01-08   -1.738300
2000-01-09    0.102575
2000-01-10   -1.767898
2000-01-20    1.890013
2000-01-21   -1.106158
2000-01-22    0.457826
2000-01-23   -0.951881
2000-01-24   -1.738844
dtype: float64
>>> TS = ts.cumsum()
>>> TS2 = ts2.cumsum()
>>> # rolling_mean(TS, 1) and rolling_mean(TS2, 1) produce the same values!
>>> rolling_mean(TS, 1)
2000-01-01   -0.195255
2000-01-02    0.724887
2000-01-03    2.223392
2000-01-04    1.300143
2000-01-05    0.525033
2000-01-06    2.058307
2000-01-07    3.513673
2000-01-08    1.775373
2000-01-09    1.877948
2000-01-10    0.110050
2000-01-11    2.000063
2000-01-12    0.893905
2000-01-13    1.351731
2000-01-14    0.399849
2000-01-15   -1.338994
Freq: D, dtype: float64
>>> rolling_mean(TS2, 1)
2000-01-01   -0.195255
2000-01-02    0.724887
2000-01-03    2.223392
2000-01-04    1.300143
2000-01-05    0.525033
2000-01-06    2.058307
2000-01-07    3.513673
2000-01-08    1.775373
2000-01-09    1.877948
2000-01-10    0.110050
2000-01-20    2.000063
2000-01-21    0.893905
2000-01-22    1.351731
2000-01-23    0.399850
2000-01-24   -1.338994
dtype: float64

The experiment above demonstrates that rolling_mean disregards the 10-day gap in TS2 and calculates as if the data were evenly sampled, that is, as if it were not a time series. I am afraid users do not expect this behaviour.

I believe this problem is integral to the feature requested here. If time were properly taken into account in the calculation, there would be no reason why the window could not be specified as a time frame. Implementing the requested feature would also resolve this unwanted behaviour.
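The difference between a count-based and a time-based window can be sketched as follows. This assumes a pandas release that includes the time-window support for `.rolling` referenced at the end of this thread (an offset string such as `"3D"` passed as the window). The values are illustrative placeholders, not the random data from the session above; only the index layout (ten consecutive days, a gap, then five more days) mirrors ts2.

```python
import numpy as np
import pandas as pd

# Same index layout as ts2 above; the values 0..14 are illustrative only.
index = pd.date_range("2000-01-01", periods=10).append(
    pd.date_range("2000-01-20", periods=5)
)
ts2 = pd.Series(np.arange(15, dtype=float), index=index)

# Count-based window: always the last 3 observations, regardless of the gap.
by_count = ts2.rolling(3).mean()

# Time-based window: only observations within the trailing 3 calendar days.
by_time = ts2.rolling("3D").mean()

first_after_gap = pd.Timestamp("2000-01-20")
print(by_count[first_after_gap], by_time[first_after_gap])
```

On the first day after the gap, the count-based mean still averages the observations from 2000-01-09, 2000-01-10, and 2000-01-20 (giving 9.0 here), while the time-based window contains only the observation from 2000-01-20 (giving 10.0).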

SmartLayer · 2014-01-15T03:46:04Z

"in cookbook...closing" --> In which chapter of the cookbook? I looked but could not find it. (The cookbook is not searchable, and a Google search with rolling_mean as keyword only yields results outside of the cookbook, assuming the cookbook you mean is this one: http://pandas.pydata.org/pandas-docs/stable/cookbook.html )

jreback · 2014-01-15T03:53:34Z

http://pandas.pydata.org/pandas-docs/stable/cookbook.html#expanding-data

SmartLayer · 2014-01-15T04:00:00Z

Oh I see, what was added to the cookbook is a link from the cookbook to Stack Overflow. That's why "a Google search with rolling_mean as keyword only yields results outside of the cookbook".

jreback · 2014-01-15T04:01:49Z

you can search from the docs, the API box

jreback · 2014-01-15T04:02:46Z

the cookbook is really just a collection of interesting links

ghost · 2014-01-15T11:39:42Z

@jreback, the cookbook entry is related, but does it truly close this issue? Can't say.

jorisvandenbossche · 2014-01-15T13:21:13Z

I also think the cookbook entry is not the real solution to this issue, although you can in principle solve it that way (but not that trivially for users, I think).
As I understand the original issue, the idea is that you could do something like this:

pd.rolling_mean(ts, window='30min')

When you have, e.g., a regular time series with a 5-minute frequency, this would be the same as pd.rolling_mean(ts, 6), but 1) more convenient and 2) also applicable to irregular time series.
I think this would be a very valuable addition.
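The claimed equivalence on a regular 5-minute series can be checked with a short sketch. It uses the later `.rolling` method API with an offset-string window rather than the `pd.rolling_mean(ts, window='30min')` spelling proposed above, and the data is illustrative. One assumption to note: the time-based window covers the half-open interval (t - 30min, t], and `min_periods=1` is set on the count-based call so both variants treat the first few rows the same way.

```python
import numpy as np
import pandas as pd

# Regular series sampled every 5 minutes; values are illustrative only.
index = pd.date_range("2000-01-01", periods=24, freq="5min")
ts = pd.Series(np.arange(24, dtype=float), index=index)

# Trailing 30 minutes: on a 5-minute grid this covers t, t-5min, ..., t-25min.
by_time = ts.rolling("30min").mean()

# Trailing 6 observations, allowing shorter windows at the start of the series.
by_count = ts.rolling(6, min_periods=1).mean()

print(np.allclose(by_time.to_numpy(), by_count.to_numpy()))
```

On an irregular series the two would diverge, which is the case where a time-based window is actually needed.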

@zhangweiwu for the example you give, you can also use the freq keyword (which will resample the data to the given frequency) or manually resample the data first.
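The manual-resampling route might look like the sketch below, written against the later `.resample(...)` and `.rolling(...)` method API rather than the `freq` keyword of `rolling_mean`. The values are illustrative and only the index layout follows the ts2 example; the choice of `min_periods=1` is an assumption about how partially empty windows should be treated.

```python
import numpy as np
import pandas as pd

# Ten consecutive days, a gap, then five more days; values are illustrative.
index = pd.date_range("2000-01-01", periods=10).append(
    pd.date_range("2000-01-20", periods=5)
)
ts2 = pd.Series(np.arange(15, dtype=float), index=index)

# Put the data on a regular daily grid; days without observations become NaN.
daily = ts2.resample("D").mean()

# A count-based window of 3 rows now spans exactly 3 calendar days.
# min_periods=1 averages whatever observations fall inside the window.
smoothed = daily.rolling(3, min_periods=1).mean()

print(daily.isna().sum(), smoothed[pd.Timestamp("2000-01-20")])
```

After resampling, the gap is explicit in the data, so the window ending on 2000-01-20 contains only that day's observation, and windows that fall entirely inside the gap yield NaN.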

ajcremona · 2016-01-27T23:39:15Z

Is there any update on this? Very interested in this - very useful for irregular time series that are large data sets

vik748 · 2022-07-14T18:17:54Z

Folks I would like to help add a similar feature for dataframes with a scalar index or column. Looks like all current windows are based on the number of samples around the point of interest. Any tips / thoughts on where I should start and what the feature should look like?

For example, suppose we have a dataframe with a column "total distance travelled" and another "total fuel used". The rows can be irregularly spaced. I'd like to compute the fuel consumed per 1 km travelled and its maximum. You get the gist.
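One possible way to approach the fuel example without any new windowing feature is sketched below. Because both columns are cumulative totals, the fuel used over the trailing 1 km is the current total minus the total at the point 1 km earlier, which can be estimated by interpolation. The column names, the sample numbers, and the use of linear interpolation between rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: cumulative distance and cumulative fuel, irregular rows.
df = pd.DataFrame(
    {
        "total_distance_km": [0.0, 0.4, 1.0, 1.5, 3.0],
        "total_fuel_l": [0.0, 0.04, 0.10, 0.16, 0.40],
    }
)

window_km = 1.0
dist = df["total_distance_km"].to_numpy()
fuel = df["total_fuel_l"].to_numpy()

# Cumulative fuel at the start of each trailing window, linearly interpolated
# between rows (an assumption about how fuel accrues between samples).
fuel_at_window_start = np.interp(dist - window_km, dist, fuel)
df["fuel_last_km_l"] = fuel - fuel_at_window_start

# Rows within the first kilometre have no complete window yet.
df.loc[dist - dist[0] < window_km, "fuel_last_km_l"] = np.nan

max_fuel_per_km = df["fuel_last_km_l"].max()
print(df)
print(max_fuel_per_km)
```

This relies on the distance column being sorted in ascending order, which holds for a cumulative total.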

jreback closed this as completed Sep 21, 2013

ghost mentioned this issue Jan 15, 2014

API for splitting pandas objects #4059

Closed

jreback reopened this Jan 15, 2014

jreback mentioned this issue Feb 14, 2014

Better support for time-weighted averages #1377

Closed

jreback modified the milestones: 0.15.0, 0.14.0 Feb 14, 2014

cpcloud mentioned this issue Jun 7, 2014

Split/Partition Master Issue #7387

Closed

8 tasks

jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015

max-sixty mentioned this issue Jun 26, 2016

ENH: add time-window capability to .rolling #13513

Closed

jreback modified the milestones: 0.18.2, Next Major Release Jun 26, 2016

jreback closed this as completed in 786edc7 Jul 20, 2016
