Adds cumulative operators to API #812


Merged
2 commits merged into pydata:master on Oct 3, 2016

Conversation

pwolfram (Contributor) commented Mar 31, 2016

This PR adds cumsum and cumprod as discussed in #791, as well as ensuring cumprod works for the API, resolving issues discussed in #807.

TO DO (dependencies)

This PR extends infrastructure to support cumsum and cumprod (#791).

References:

cc @shoyer, @jhamman

pwolfram (Contributor, Author):

@mrocklin and @shoyer, will we need to modify the definitions for nanprod, nancumsum, and nancumprod in dask for this to work in multi-threaded mode? I took a quick look and they appear to be defined at https://github.com/dask/dask/blob/d82cf2ac3fa3a61912b7934afe7b2fe9e14cc4ff/dask/array/__init__.py#L17-L22, so I'm assuming xarray/dask should just work once the issues on the xarray end are resolved, but I wanted to double-check that this is the case.

shoyer (Member) commented Mar 31, 2016

@pwolfram dask will also definitely need nancumsum/nancumprod functions. These should be quite straightforward to add though given the existing infrastructure.
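For context, the NaN-skipping semantics being requested can be sketched in plain numpy. This is only an illustration of the behavior (NaN treated as 0 for cumulative sums and 1 for cumulative products), not dask's actual implementation; numpy itself later shipped these as np.nancumsum/np.nancumprod:

```python
import numpy as np

def nancumsum(a, axis=None):
    # Treat NaN as the additive identity (0) so it doesn't poison the running sum.
    return np.cumsum(np.where(np.isnan(a), 0.0, a), axis=axis)

def nancumprod(a, axis=None):
    # Treat NaN as the multiplicative identity (1) so it doesn't poison the product.
    return np.cumprod(np.where(np.isnan(a), 1.0, a), axis=axis)
```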

pwolfram (Contributor, Author) commented Apr 1, 2016

@shoyer see dask/dask#1077 for the nancumsum and nancumprod dask PR

@pwolfram pwolfram force-pushed the add_cumsum_cumprod branch from 72b6e94 to 9ca86d4 on April 7, 2016 18:39
pwolfram (Contributor, Author) commented Apr 7, 2016

@shoyer, it looks like I'll need a _reduce_method-like method that doesn't reduce the dimensions, e.g., https://github.com/pydata/xarray/blob/master/xarray/core/common.py#L11. I haven't been able to get the current branch to work properly (it returns a numpy array instead of an xarray datatype), and something seems to be missing. Am I on the right track that I need to add a new abstract method? This seems overly complicated, but I haven't been able to get it to work cleanly otherwise, e.g., trying things like

    f = _func_slash_method_wrapper(method, name)
    setattr(cls, name, cls._binary_op(f))

Some advice or help to get me out of my naivete would be greatly appreciated. Thanks!

jhamman (Member) commented May 11, 2016

@pwolfram - how's this going? Are we still stuck on the above question?

pwolfram (Contributor, Author):

@jhamman, I need to get back to this when I can make the time but will let you know if I have more trouble. Thanks for following up.

pwolfram (Contributor, Author) commented Sep 20, 2016

@shoyer and @jhamman, here is the general use of the cumsum and cumprod operators. Note that I probably need some type of error checking for the dask version (e.g., we require a version of dask with nancumsum, nanprod, and nancumprod). What is the standard way to do this?

For example, dask 0.11.0 works but dask 0.8.1 does not and returns an error.
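A minimal sketch of the kind of version guard being asked about, assuming plain string version parsing; the helper names and the 0.11.0 cutoff (the version reported to work above) are illustrative, not xarray's actual check:

```python
def _version_tuple(version):
    # Turn 'X.Y.Z' into a comparable tuple of ints, e.g. '0.11.0' -> (0, 11, 0).
    return tuple(int(part) for part in version.split('.')[:3])

def dask_has_nancum(dask_version, minimum='0.11.0'):
    # Hypothetical guard: require a dask new enough to ship nancumsum/nancumprod.
    return _version_tuple(dask_version) >= _version_tuple(minimum)
```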

@pwolfram pwolfram force-pushed the add_cumsum_cumprod branch 3 times, most recently from 70ae93b to 13ace00 on September 20, 2016 19:50
pwolfram (Contributor, Author):

@shoyer, @jhamman, and @MaximilianR, this should be ready for a preliminary review, because it works. The key things missing are a check for the dask version and potentially more testing. Thoughts on these issues are greatly appreciated.

@@ -893,16 +893,27 @@ def reduce(self, func, dim=None, axis=None, keep_attrs=False,
if dim is not None and axis is not None:
raise ValueError("cannot supply both 'axis' and 'dim' arguments")

if 'cum' in func.__name__:
shoyer (Member) commented Sep 21, 2016

Can we put these error checks in ops.cumsum instead? It's poor separation of concerns to do these checks over here.

pwolfram (Contributor, Author):

I don't think so, unless there is a clever way to do this that I'm missing. It looks like I'd need something like a new _partial_reduce_method for cumsum, cumprod, prod, etc. However, this would require additional lines of code beyond the simplistic approach I've taken, though it may be necessary to get this to work cleanly in general. For instance, I don't think the existing code works properly with Dataset yet.

pwolfram (Contributor, Author):

@shoyer, this obviously changes the comments below, because we need this to work for both Dataset and DataArray, and I need to verify the existing implementation does just that.

pwolfram (Contributor, Author):

@shoyer, I double-checked. This appears to work for both DataArray and Dataset, but it would be good to have some tests to ensure this functionality keeps working in the future.

pwolfram (Contributor, Author):

I've added some tests and will push them soon.

pwolfram (Contributor, Author):

Pushed.

pwolfram (Contributor, Author):

I didn't individually test DataArray and Dataset functionality, but I assume it is inherited from Variable. Hence, I've tested the Variable methods for cumsum and cumprod for now.

shoyer (Member):

I still don't like this approach. It feels very fragile.

A slightly saner approach would be another function attribute, like the numeric_only attribute we use in ops.py. Then this check could be: keep_dims = getattr(func, 'keep_dims', False).
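The attribute-based dispatch suggested here can be sketched like this (the function and helper names are illustrative, not xarray's real internals):

```python
import numpy as np

def cumsum(values, axis=None):
    return np.cumsum(values, axis=axis)
cumsum.keep_dims = True  # tag: this operation preserves the input's dimensions

def total(values, axis=None):
    return np.sum(values, axis=axis)
# no keep_dims tag: an ordinary reduction that removes the reduced axis

def surviving_axes(func, ndim, axis):
    # Consult the attribute instead of sniffing func.__name__ for 'cum'.
    if getattr(func, 'keep_dims', False):
        return list(range(ndim))
    return [n for n in range(ndim) if n != axis]
```

Tagging the function keeps the knowledge of its shape behavior next to its definition, so reduce() never has to inspect names.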

pwolfram (Contributor, Author):

Removed in favor of the keep_dims approach.

if n not in removed_axes]

if 'cum' in func.__name__:
shoyer (Member):

Same as above -- let's keep Variable.reduce unmodified for now. I think we'll want to actually implement this using xarray.apply, anyways.

shoyer (Member):

Actually, I guess it's OK to keep this -- it is all we need to actually implement cumsum, after all, but let's do the shape check unilaterally, without looking at the function name.

pwolfram (Contributor, Author) commented Sep 21, 2016

@shoyer, I had originally implemented it that way, but this introduced a bug related to the use of prod, if I recall correctly.

pwolfram (Contributor, Author):

Thankfully, it appears to work without issue on re-examination.

pwolfram (Contributor, Author):

Modified as you discussed below

return _prod(values, axis=axis, **kwargs)
prod.numeric_only = True

def cumsum(values, axis=None, skipna=None, **kwargs):
shoyer (Member):

Can you make a wrapper function that generates these functions, instead of repeating the logic three times for prod/cumsum/cumprod?

pwolfram (Contributor, Author):

Better than that -- I extended _create_nan_agg_method to generalize the functionality, which removes quite a few lines of code. See d1c1077.
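The factory pattern described above can be sketched roughly as follows. This is a heavy simplification of xarray's actual _create_nan_agg_method, assuming a numpy new enough to ship nancumsum/nancumprod (1.12+):

```python
import numpy as np

def _create_nan_agg_method(name, keep_dims=False):
    # Look up both the plain and NaN-skipping numpy functions by name.
    nan_func = getattr(np, 'nan' + name)
    plain_func = getattr(np, name)

    def method(values, axis=None, skipna=None):
        # Skip NaNs by default for float data, as xarray's reductions do.
        use_nan = skipna or (skipna is None and values.dtype.kind == 'f')
        func = nan_func if use_nan else plain_func
        return func(values, axis=axis)

    method.__name__ = name
    method.keep_dims = keep_dims
    return method

# One factory call per operation replaces three hand-written copies.
cumsum = _create_nan_agg_method('cumsum', keep_dims=True)
cumprod = _create_nan_agg_method('cumprod', keep_dims=True)
prod = _create_nan_agg_method('prod')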

shoyer (Member):

Nice

@@ -274,7 +274,8 @@ def _ignore_warnings_if(condition):
yield


def _create_nan_agg_method(name, numeric_only=False, coerce_strings=False):
def _create_nan_agg_method(name, numeric_only=False, np_compat=False,
no_bottleneck=False, coerce_strings=False):
shoyer (Member):

keep indentation aligned with ( per PEP8

else:
eager_module = bn
func = _dask_or_eager_func(nanname, eager_module)
using_numpy_nan_func = eager_module is np
using_numpy_nan_func = (eager_module is np) or (eager_module is npcompat)
shoyer (Member):

just a nit: is has higher precedence than or, so you don't need parentheses

pwolfram (Contributor, Author):

Thanks @shoyer!

if n not in removed_axes]

if 'cum' in func.__name__:
    def safe_shape(val):
        return val.shape if type(val) is np.ndarray else ()
shoyer (Member):

this should probably be just getattr(val, 'shape', ()) (dask arrays have shape defined, too)
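The suggested getattr form handles arrays and scalars uniformly, e.g.:

```python
import numpy as np

def safe_shape(val):
    # Works for np.ndarray, dask arrays, or anything else exposing .shape;
    # plain Python scalars fall back to an empty tuple.
    return getattr(val, 'shape', ())
```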

pwolfram (Contributor, Author):

Thanks @shoyer!

@pwolfram pwolfram force-pushed the add_cumsum_cumprod branch 2 times, most recently from 52b737f to 4087bfc on September 21, 2016 20:32
pwolfram (Contributor, Author):

@shoyer, this should be ready for another review. I also tested it with this somewhat hacky code at https://gist.github.com/a329d441fe99ae342a34b1a374650138. It may be good to get some type of test like this into the test suite. However, the correct location for testing these methods in general is not obvious to me. It doesn't look like we broadly check reduction operations with NaNs (e.g., prod) outside the test_variable.py file. I have made additions here, but broader testing may be useful.

def safe_shape(val):
    return getattr(val, 'shape', ())

if safe_shape(data) == safe_shape(self.data):
shoyer (Member):

This should be done as an alternative to calculating dims a few lines above, e.g.,

if getattr(data, 'shape', ()) == self.shape:
    dims = self.dims
else:
    removed_axes = ...
    dims = [adim for n, adim in enumerate(self.dims) ...]

Otherwise we calculate those dimensions just to throw them away.

pwolfram (Contributor, Author):

Agreed

shoyer (Member) commented Sep 21, 2016

We don't need to verify that every value is exactly as expected for Dataset/DataArray, but we should verify the general API (e.g., do at least one of cumsum/cumprod and make sure the result has the right dimensions and errors when it should). See the various test_reduce methods in test_dataset.py for examples.
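In that spirit, a minimal API-level sanity check might look like the following (assuming an xarray with this PR merged; the exact placement in the test suite is left open):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([[1.0, 2.0], [3.0, 4.0]], dims=('x', 'y'))

# Unlike a plain reduction, cumsum keeps every input dimension.
result = da.cumsum('x')
assert result.dims == ('x', 'y')
assert np.allclose(result.values, [[1.0, 2.0], [4.0, 6.0]])
```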

pwolfram (Contributor, Author):

@shoyer, I think this fixes the concerns you raised including the testing. Thanks for all the tips!

shoyer (Member) commented Sep 29, 2016

Can you add a basic sanity check for DataArray.cumsum?

Otherwise, I think this just needs docs (on the What's New and API pages).

@pwolfram pwolfram force-pushed the add_cumsum_cumprod branch from 6040fb7 to dfbc090 on October 3, 2016 15:58
pwolfram (Contributor, Author) commented Oct 3, 2016

@shoyer, is this what you were thinking?

@@ -62,6 +62,9 @@ By `Robin Wilson <https://github.com/robintw>`_.
overlapping (:issue:`835`) coordinates as long as any present data agrees.
By `Johnnie Gray <https://github.com/jcmgray>`_.

- Adds DataArray and Dataset methods :py:meth:`cumsum` and :py:meth:`cumprod`.
shoyer (Member):

To make the links work, use, e.g., :py:meth:`~DataArray.cumsum`

pwolfram (Contributor, Author) commented Oct 3, 2016

Thanks @shoyer, this is fixed now and the link should work following minor refactoring. However, a search for cumsum does not return the DataArray and Dataset results in my local test, which is very strange.

@pwolfram pwolfram force-pushed the add_cumsum_cumprod branch from dfbc090 to 8817af5 on October 3, 2016 19:43
@pwolfram pwolfram force-pushed the add_cumsum_cumprod branch from 8817af5 to 129c807 on October 3, 2016 20:05
@@ -145,6 +145,8 @@ Computation
:py:attr:`~Dataset.round`
:py:attr:`~Dataset.real`
:py:attr:`~Dataset.T`
:py:attr:`~Dataset.cumsum`
shoyer (Member):

just a nit, these probably belong under the "Aggregation" heading above

@shoyer shoyer merged commit 9cf107b into pydata:master Oct 3, 2016
shoyer (Member) commented Oct 3, 2016

Thanks! Let's see how the docs look at http://xarray.pydata.org/en/latest/whats-new.html in a few minutes after the doc build completes

@pwolfram pwolfram deleted the add_cumsum_cumprod branch October 3, 2016 21:11