ENH ohlc resample for DataFrame #4740

hayd · 2013-09-03T23:35:28Z

Can use ohlc from DataFrame.

Example:

In [24]: df = pd.DataFrame({'PRICE': {Timestamp('2011-01-06 10:59:05', tz=None): 24990,
  Timestamp('2011-01-06 12:43:33', tz=None): 25499,
  Timestamp('2011-01-06 12:54:09', tz=None): 25499},
 'VOLUME': {Timestamp('2011-01-06 10:59:05', tz=None): 1500000000,
  Timestamp('2011-01-06 12:43:33', tz=None): 5000000000,
  Timestamp('2011-01-06 12:54:09', tz=None): 100000000}})

In [25]: df.resample('H', how='ohlc')
Out[25]: 
                     PRICE                           VOLUME              \
                      open   high    low  close        open        high   
2011-01-06 10:00:00  24990  24990  24990  24990  1500000000  1500000000   
2011-01-06 11:00:00    NaN    NaN    NaN    NaN         NaN         NaN   
2011-01-06 12:00:00  25499  25499  25499  25499  5000000000  5000000000   


                            low       close  
2011-01-06 10:00:00  1500000000  1500000000  
2011-01-06 11:00:00         NaN         NaN  
2011-01-06 12:00:00   100000000   100000000
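
For readers on later pandas releases, where the `how=` keyword is no longer accepted by `resample`, the same result can be sketched with the method-style call. This is an illustration, not part of the patch: the frequency string `"60min"` is chosen here only because it is accepted across many pandas versions, and the frame is rebuilt from the values in the example above.

```python
import pandas as pd

# Same three ticks as in the example above.
df = pd.DataFrame(
    {
        "PRICE": [24990, 25499, 25499],
        "VOLUME": [1500000000, 5000000000, 100000000],
    },
    index=pd.to_datetime(
        ["2011-01-06 10:59:05", "2011-01-06 12:43:33", "2011-01-06 12:54:09"]
    ),
)

# Method-style equivalent of resample(..., how='ohlc'); "60min" is an
# hourly frequency string picked for this sketch.
bars = df.resample("60min").ohlc()
print(bars)
```

The result has a two-level column index (original column, then open/high/low/close), and the empty 11:00 bin is filled with NaN.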

jreback · 2013-09-03T23:41:45Z

pandas/core/groupby.py

@@ -430,6 +430,13 @@ def ohlc(self):

        For multiple groupings, the result index will be a MultiIndex
        """
+        if isinstance(self.obj, com.ABCDataFrame):


I would make this a separate method, so that if we define more aggregators like this in the future it can easily be reused

here's another one.... df.groupby('A').describe() (not defined, but pretty easy to do!)

groupby is a crazy place (not sure where this should go), but I see your point, it ought to be refactored out of there... Are you suggesting just a method like this:

def _apply_to_column_groupbys(self, func):
    from pandas.tools.merge import concat
    return concat((func(col_groupby)
                   for _, col_groupby in self._iterate_column_groupbys()),
                  keys=self.obj.columns, axis=1)
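
To make the idea concrete outside of the groupby internals, here is a minimal standalone sketch using only public API. The function name `apply_to_column_groupbys`, its `key` argument, and the sample frame are hypothetical and exist only for this illustration; it uses `pd.concat` rather than the internal import path in the snippet above.

```python
import pandas as pd


def apply_to_column_groupbys(df, key, func):
    """Apply func to each column's groupby and glue the results side by side.

    Hypothetical helper for illustration; not the method added in this PR.
    """
    grouped = df.groupby(key)
    value_cols = [col for col in df.columns if col != key]
    # keys= turns each column name into the outer level of a column MultiIndex.
    return pd.concat(
        [func(grouped[col]) for col in value_cols], keys=value_cols, axis=1
    )


# Made-up sample data.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 3, 2], "C": [10, 5, 7]})
result = apply_to_column_groupbys(df, "A", lambda g: g.ohlc())
print(result)
```

Each per-column result keeps its own open/high/low/close columns, so the outer level identifies the source column and the inner level the aggregate.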

df.groupby('A').describe() works (?) but puts the descriptions in the index rather than in the columns:

In [29]: df.groupby('PRICE').describe()  # expected .unstack(1)
Out[29]: 
             PRICE        VOLUME
PRICE                           
24990 count      1  1.000000e+00
      mean   24990  1.500000e+09
      std      NaN           NaN
      min    24990  1.500000e+09
      25%    24990  1.500000e+09
      50%    24990  1.500000e+09
      75%    24990  1.500000e+09
      max    24990  1.500000e+09
25499 count      2  2.000000e+00
      mean   25499  2.550000e+09
      std        0  3.464823e+09
      min    25499  1.000000e+08
      25%    25499  1.325000e+09
      50%    25499  2.550000e+09
      75%    25499  3.775000e+09
      max    25499  5.000000e+09

could also create a new ohlc method in DataFrameGroupBy (I wasn't sure what was preferred)

hmmm.....maybe I'll step through this at some point....it is a bit confusing.....maybe something is off with ohlc.....I thought describe would not work at all.....it might just need a parameter....because the behaviour IS to create a MultiIndex (e.g. it shouldn't need your patch)

@jreback I don't think my patch touches it. I refactored pr, a little nicer now...

I think ohlc behaviour is correct, confused about describe (above behaviour is in 0.12 too)

perhaps override describe (like I have ohlc) to do:

g._apply_to_column_groupbys(lambda x: x.describe().unstack(-1))

seems hacky, it is... must be nicer way
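
The reshaping that `unstack(-1)` is meant to do here can be sketched on its own. This is an illustration under stated assumptions, not the override discussed above: the stacked (group, statistic) Series is built explicitly with `pd.concat`, because the shape `describe()` returns on a groupby differs between pandas versions, and the frame reuses the PRICE/VOLUME values from the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "PRICE": [24990, 25499, 25499],
        "VOLUME": [1500000000, 5000000000, 100000000],
    }
)

# Build a Series indexed by (PRICE group, statistic), i.e. the "statistics in
# the index" layout shown in the describe output above.
stacked = pd.concat(
    {price: volumes.describe() for price, volumes in df.groupby("PRICE")["VOLUME"]}
)

# Move the innermost index level (the statistic names) into the columns.
wide = stacked.unstack(-1)
print(wide)
```

After the `unstack`, each group is a single row and the statistics (count, mean, std, ...) are columns.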

jreback · 2013-09-04T00:35:11Z

no what puzzles me is why ohlc fails and describe almost works
(well ohlc is a cython function and describe is not) so there is a disconnect that allows one path to work (almost) and the other to fail

hayd · 2013-09-09T17:37:15Z

@jreback What did you think about this one? Not sure what we were looking into re describe (is that a separate issue*?)

* describe should have MultiIndex column, rather than index.

jreback · 2013-09-09T17:54:07Z

can you put a test in for doing the same with describe and see what happens?

hayd · 2013-09-09T19:25:03Z

When I did this last time and also in master:

In [29]: df.groupby('PRICE').describe()  # expected .unstack(1)
Out[29]: 
             PRICE        VOLUME
PRICE                           
24990 count      1  1.000000e+00
      mean   24990  1.500000e+09
      std      NaN           NaN
      min    24990  1.500000e+09
      25%    24990  1.500000e+09
      50%    24990  1.500000e+09
      75%    24990  1.500000e+09
      max    24990  1.500000e+09
25499 count      2  2.000000e+00
      mean   25499  2.550000e+09
      std        0  3.464823e+09
      min    25499  1.000000e+08
      25%    25499  1.325000e+09
      50%    25499  2.550000e+09
      75%    25499  3.775000e+09
      max    25499  5.000000e+09

so it appends the statistics to the index, rather than returning them as a MultiIndex column...

jreback · 2013-09-09T19:29:22Z

hmm...must be because ohlc is cythonized and describe is not (so it is a general groupby). I think what you show for ohlc is correct, so then I guess that this is a bug (but a different one)

hayd · 2013-09-09T23:29:19Z

so merge? opened issue re describe

jreback · 2013-09-10T00:47:38Z

ok

ENH ohlc resample for DataFrame

74e4544

jreback reviewed Sep 3, 2013

CLN refactor with _apply_to_column_groupbys

e85ef7d

hayd mentioned this pull request Sep 9, 2013

describe on a groupby #4792

Closed

hayd added a commit that referenced this pull request Sep 10, 2013

Merge pull request #4740 from hayd/ohlc_dataframe

859aada

hayd merged commit 859aada into pandas-dev:master Sep 10, 2013
