
ENH ohlc resample for DataFrame #4740


Merged
merged 2 commits into from
Sep 10, 2013


Contributor

@hayd hayd commented Sep 3, 2013

closes #2320

Allows ohlc to be used with a DataFrame.

cc @jreback

Example:

In [24]: df = pd.DataFrame({'PRICE': {pd.Timestamp('2011-01-06 10:59:05', tz=None): 24990,
  pd.Timestamp('2011-01-06 12:43:33', tz=None): 25499,
  pd.Timestamp('2011-01-06 12:54:09', tz=None): 25499},
 'VOLUME': {pd.Timestamp('2011-01-06 10:59:05', tz=None): 1500000000,
  pd.Timestamp('2011-01-06 12:43:33', tz=None): 5000000000,
  pd.Timestamp('2011-01-06 12:54:09', tz=None): 100000000}})

In [25]: df.resample('H', how='ohlc')
Out[25]: 
                     PRICE                           VOLUME              \
                      open   high    low  close        open        high   
2011-01-06 10:00:00  24990  24990  24990  24990  1500000000  1500000000   
2011-01-06 11:00:00    NaN    NaN    NaN    NaN         NaN         NaN   
2011-01-06 12:00:00  25499  25499  25499  25499  5000000000  5000000000   


                            low       close  
2011-01-06 10:00:00  1500000000  1500000000  
2011-01-06 11:00:00         NaN         NaN  
2011-01-06 12:00:00   100000000   100000000 
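For readers on later pandas releases, where the `how=` keyword was dropped in favour of method chaining, a minimal sketch of the same aggregation might look like the following. The `"60min"` rule string and the variable names `idx` and `bars` are choices made for this illustration, not part of the PR; `"60min"` is used only because the spelling of the hourly alias has varied between pandas versions.

```python
import pandas as pd

# Same three ticks as in the example above.
idx = pd.to_datetime([
    "2011-01-06 10:59:05",
    "2011-01-06 12:43:33",
    "2011-01-06 12:54:09",
])
df = pd.DataFrame(
    {
        "PRICE": [24990, 25499, 25499],
        "VOLUME": [1500000000, 5000000000, 100000000],
    },
    index=idx,
)

# Method-chaining form of resample(..., how='ohlc').
# Each original column gets its own open/high/low/close sub-columns.
bars = df.resample("60min").ohlc()

print(bars["PRICE"])
print(bars["VOLUME"])
```

The result has MultiIndex columns (original column name on the outer level, `open`/`high`/`low`/`close` on the inner level), and the empty 11:00 bin is filled with NaN, as in the output above.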

@@ -430,6 +430,13 @@ def ohlc(self):

For multiple groupings, the result index will be a MultiIndex
"""
if isinstance(self.obj, com.ABCDataFrame):
Contributor

I would make this a separate method so that, if we define multiple aggregators like this in the future, it can be easily reused.

Here's another one: df.groupby('A').describe() (not defined, but pretty easy to do!)

Contributor Author

groupby is a crazy place (not sure where this should go), but I see your point, it ought to be refactored out of there... Are you suggesting just a method like this:

def _apply_to_column_groupbys(self, func):
    from pandas.tools.merge import concat
    return concat((func(col_groupby)
                   for _, col_groupby in self._iterate_column_groupbys()),
                  keys=self.obj.columns,
                  axis=1)
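To illustrate the idea outside the groupby internals, here is a rough standalone sketch using only public pandas calls. The function `apply_to_column_groupbys`, its `key` argument, and the sample frame are hypothetical stand-ins for this illustration; they are not the private helper from the PR, which iterates over column groupbys internally.

```python
import pandas as pd


def apply_to_column_groupbys(df, key, func):
    """Hypothetical stand-in: apply `func` to each column's groupby
    and glue the per-column results side by side."""
    value_cols = [c for c in df.columns if c != key]
    # One SeriesGroupBy per value column.
    pieces = [func(df.groupby(key)[c]) for c in value_cols]
    # keys= labels each piece with its source column, which yields
    # MultiIndex columns such as ('x', 'open'), ('x', 'high'), ...
    return pd.concat(pieces, keys=value_cols, axis=1)


df = pd.DataFrame({
    "A": ["a", "a", "b"],
    "x": [1, 3, 2],
    "y": [10.0, 5.0, 7.0],
})

result = apply_to_column_groupbys(df, "A", lambda g: g.ohlc())
print(result)
```

Because the aggregator is passed in as `func`, the same helper could be reused for any per-column aggregation that returns a DataFrame, which is the point of splitting it out.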

df.groupby('A').describe() works (?) but puts the descriptions in the index rather than in the columns:

In [29]: df.groupby('PRICE').describe()  # expected .unstack(1)
Out[29]: 
             PRICE        VOLUME
PRICE                           
24990 count      1  1.000000e+00
      mean   24990  1.500000e+09
      std      NaN           NaN
      min    24990  1.500000e+09
      25%    24990  1.500000e+09
      50%    24990  1.500000e+09
      75%    24990  1.500000e+09
      max    24990  1.500000e+09
25499 count      2  2.000000e+00
      mean   25499  2.550000e+09
      std        0  3.464823e+09
      min    25499  1.000000e+08
      25%    25499  1.325000e+09
      50%    25499  2.550000e+09
      75%    25499  3.775000e+09
      max    25499  5.000000e+09
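The reshaping hinted at by the `# expected .unstack(1)` comment can be sketched as follows. Since the layout returned by `groupby(...).describe()` may differ between pandas versions, this sketch rebuilds the stacked (group, statistic) index by hand with `pd.concat`; the sample frame and the names `stacked` and `wide` are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "b"],
    "x": [1.0, 3.0, 2.0],
    "y": [10.0, 5.0, 7.0],
})

# Rebuild the layout shown above: one block of describe() rows per
# group, so the index is a (group, statistic) MultiIndex.
stacked = pd.concat(
    {key: group[["x", "y"]].describe() for key, group in df.groupby("A")}
)

# Move the statistic level (level 1 of the index) into the columns,
# giving one row per group and (column, statistic) MultiIndex columns.
wide = stacked.unstack(1)

print(stacked)
print(wide)
```

In `wide`, each group occupies a single row, which matches the shape that ohlc produces.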

Contributor Author

Could also create a new ohlc method in DataFrameGroupBy (I wasn't sure what was preferred).

Contributor

Hmm... maybe I'll step through this at some point... it is a bit confusing... maybe something is off with ohlc... I thought describe would not work at all... it might just need a parameter... because the behaviour IS to create a MultiIndex (e.g. it shouldn't need your patch).

Contributor Author

@jreback I don't think my patch touches it. I refactored the PR; it's a little nicer now...

I think the ohlc behaviour is correct; I'm confused about describe (the above behaviour is in 0.12 too).

Contributor Author

Perhaps override describe (as I have for ohlc) to do:

g._apply_to_column_groupbys(lambda x: x.describe().unstack(-1))

Seems hacky (it is)... there must be a nicer way.

Contributor

jreback commented Sep 4, 2013

No, what puzzles me is why ohlc fails and describe almost works (well, ohlc is a Cython function and describe is not), so there is a disconnect that allows one path to work (almost) and the other to fail.

Contributor Author

hayd commented Sep 9, 2013

@jreback What did you think about this one? Not sure what we were looking into re describe (is that a separate issue*?)

* describe should put the statistics in a MultiIndex on the columns, rather than on the index.

Contributor

jreback commented Sep 9, 2013

Can you put in a test that does the same with describe and see what happens?

Contributor Author

hayd commented Sep 9, 2013

Same result as when I did this last time, and also in master:

In [29]: df.groupby('PRICE').describe()  # expected .unstack(1)
Out[29]: 
             PRICE        VOLUME
PRICE                           
24990 count      1  1.000000e+00
      mean   24990  1.500000e+09
      std      NaN           NaN
      min    24990  1.500000e+09
      25%    24990  1.500000e+09
      50%    24990  1.500000e+09
      75%    24990  1.500000e+09
      max    24990  1.500000e+09
25499 count      2  2.000000e+00
      mean   25499  2.550000e+09
      std        0  3.464823e+09
      min    25499  1.000000e+08
      25%    25499  1.325000e+09
      50%    25499  2.550000e+09
      75%    25499  3.775000e+09
      max    25499  5.000000e+09

So it appends the statistics to the index, rather than adding them as a MultiIndex column level...

Contributor

jreback commented Sep 9, 2013

Hmm... must be because ohlc is cythonized and describe is not (so it is a general groupby). I think what you show for ohlc is correct, so I guess that this is a bug (but a different one).

@hayd hayd mentioned this pull request Sep 9, 2013
Contributor Author

hayd commented Sep 9, 2013

So merge? I've opened an issue re describe.

Contributor

jreback commented Sep 10, 2013

ok

hayd added a commit that referenced this pull request Sep 10, 2013
ENH ohlc resample for DataFrame
@hayd hayd merged commit 859aada into pandas-dev:master Sep 10, 2013
Successfully merging this pull request may close these issues.

Multi-column grouping (e.g. OHLC) for DataFrame