-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Adds cummulative operators to API #812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mrocklin and @shoyer, will we need to modify the definitions for |
@pwolfram dask will also definitely need |
@shoyer see dask/dask#1077 for the |
72b6e94
to
9ca86d4
Compare
@shoyer, it looks like I'll need to have a f = _func_slash_method_wrapper(method, name)
setattr(cls, name, cls._binary_op(f)) Some advice or help to get me out of my naivete would be greatly appreciated. Thanks! |
@pwolfram - how's this going? Are we still stuck on the above question? |
@jhamman, I need to get back to this when I can make the time but will let you know if I have more trouble. Thanks for following up. |
9ca86d4
to
ce83d37
Compare
f9be915
to
9242854
Compare
@shoyer and @jhamman here is the general use of the cumsum and cumprod operators. Note, I probably need to have some type of error checking for the dask version (e.g., we require a version of dask with For example, dask 0.11.0 works but dask 0.8.1 does not and returns an error. |
70ae93b
to
13ace00
Compare
@shoyer, @jhamman, and @MaximilianR this should be ready for a preliminary review because it works. The key thing missing is a check for dask version and potentially more testing. Thoughts on these issues are greatly appreciated. |
@@ -893,16 +893,27 @@ def reduce(self, func, dim=None, axis=None, keep_attrs=False, | |||
if dim is not None and axis is not None: | |||
raise ValueError("cannot supply both 'axis' and 'dim' arguments") | |||
|
|||
if 'cum' in func.__name__: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put these error checks in ops.cumsum
instead? It's poor separation of concerns to do these checks over here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so unless there is a clever way to do this that I'm missing. It looks like I'd need to have something like a new _partial_reduce_method that is for cumsum, cumprod, prod, etc. However, this will require additional code lines beyond the simplistic approach I've taken. But this may be necessary to get this to work in general in a clean way. For instance, I don't think the existing code works properly with dataset yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer, this obviously changes the comments below because we need this to work for both dataset and dataarray and I need to verify the existing implementation does just that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer, I double-checked. This appears to work for both dataarray and dataset but it would be good to have some tests to ensure this functionality works in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some tests and will push them soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't individual test dataarray and dataset functionality but assume it is inherited by variable. Hence, I've tested variable methods for cumsum and cumprod for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like this approach. It feels very fragile.
A slightly saner approach would another function attribute like numeric_only
that we use in ops.py
. Then this check could be: keep_dims = getattr(func, 'keep_dims', False)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in favor of use of keep_dims
approach
if n not in removed_axes] | ||
|
||
if 'cum' in func.__name__: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above -- let's keep Variable.reduce
unmodified for now. I think we'll want to actually implement this using xarray.apply
, anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I guess it's OK to keep this -- it is all we need to actually implement cumsum
, after all, but let's do the shape check unilaterally, without looking at the function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer, I had originally implemented it that way, but this introduced a bug- related to use of prod if I recall correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thankfully, it appears to work without issue on re-examination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified as you discussed below
return _prod(values, axis=axis, **kwargs) | ||
prod.numeric_only = True | ||
|
||
def cumsum(values, axis=None, skipna=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a wrapper function that generates these functions, instead of repeating the logic three times for prod
/cumsum
/cumprod
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better than that-- I extended functionality to _create_nan_agg_method
to generalize, which removes quite a few lines of code. See d1c1077
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
@@ -274,7 +274,8 @@ def _ignore_warnings_if(condition): | |||
yield | |||
|
|||
|
|||
def _create_nan_agg_method(name, numeric_only=False, coerce_strings=False): | |||
def _create_nan_agg_method(name, numeric_only=False, np_compat=False, | |||
no_bottleneck=False, coerce_strings=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep indentation aligned with (
per PEP8
else: | ||
eager_module = bn | ||
func = _dask_or_eager_func(nanname, eager_module) | ||
using_numpy_nan_func = eager_module is np | ||
using_numpy_nan_func = (eager_module is np) or (eager_module is npcompat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit: is
has higher precedence than or
, so you don't need parentheses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shoyer!
if n not in removed_axes] | ||
|
||
if 'cum' in func.__name__: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I guess it's OK to keep this -- it is all we need to actually implement cumsum
, after all, but let's do the shape check unilaterally, without looking at the function name.
if n not in removed_axes] | ||
|
||
if 'cum' in func.__name__: | ||
def safe_shape(val): | ||
return val.shape if type(val) is np.ndarray else () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably be just getattr(val, 'shape', ())
(dask arrays have shape defined, too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shoyer!
52b737f
to
4087bfc
Compare
@shoyer, this should be ready for another review. I also tested it with this somewhat hacky code at https://gist.github.com/a329d441fe99ae342a34b1a374650138. It may be good to get some type of test like this into the test suite. However, the correct location for testing these methods, in general, is not transparent to me. It doesn't look like we broadly check reduction operations with nans, e.g., |
def safe_shape(val): | ||
return getattr(val, 'shape', ()) | ||
|
||
if safe_shape(data) == safe_shape(self.data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done alternatively to calculating dims
a few lines above, e.g.,
if getattr(data, 'shape', ()) == self.shape:
dims = self.dims
else:
removed_axes = ...
dims = [adim for n, admin in enumerate(self.dims) ...]
Otherwise we calculate those dimensions just to throw them away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
@@ -893,16 +893,27 @@ def reduce(self, func, dim=None, axis=None, keep_attrs=False, | |||
if dim is not None and axis is not None: | |||
raise ValueError("cannot supply both 'axis' and 'dim' arguments") | |||
|
|||
if 'cum' in func.__name__: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like this approach. It feels very fragile.
A slightly saner approach would another function attribute like numeric_only
that we use in ops.py
. Then this check could be: keep_dims = getattr(func, 'keep_dims', False)
.
We don't need to verify that every value is exactly as expected for Dataset/DataArray, but we should verify the general API (e.g., do at least one of |
4087bfc
to
827ab77
Compare
@shoyer, I think this fixes the concerns you raised including the testing. Thanks for all the tips! |
827ab77
to
6040fb7
Compare
Can you add a basic sanity check for Others I think this just needs docs (on the What's New and API pages) |
6040fb7
to
dfbc090
Compare
@shoyer, is this what you were thinking? |
@@ -62,6 +62,9 @@ By `Robin Wilson <https://github.com/robintw>`_. | |||
overlapping (:issue:`835`) coordinates as long as any present data agrees. | |||
By `Johnnie Gray <https://github.com/jcmgray>`_. | |||
|
|||
- Adds DataArray and Dataset methods :py:meth:`cumsum` and :py:meth:`cumprod`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the links work, use, e.g.,
:py:meth:~DataArray.cumsum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shoyer, this is fixed now and the link should now work following minor refractoring. However, a search for cumsum
does not return the DataArray and Dataset results for my local test, which is very strange.
Needed until numpy v1.12, see numpy/numpy#7421
dfbc090
to
8817af5
Compare
8817af5
to
129c807
Compare
@@ -145,6 +145,8 @@ Computation | |||
:py:attr:`~Dataset.round` | |||
:py:attr:`~Dataset.real` | |||
:py:attr:`~Dataset.T` | |||
:py:attr:`~Dataset.cumsum` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit, these probably belong under the "Aggregation" heading above
Thanks! Let's see how the docs look at http://xarray.pydata.org/en/latest/whats-new.html in a few minutes after the doc build completes |
This PR will add cumsum and cumprod as discussed in #791 as well ensuring
cumprod
works for the API, resolving issues discussed at #807.TO DO (dependencies)
nancumprod
andnancumsum
to numpy (ENH: adds np.nancumsum and np.nancumprod numpy/numpy#7421)nancumprod
andnancumsum
to dask (Adds nancumsum and nancumprod dask/dask#1077)This PR extends infrastructure to support
cumsum
andcumprod
(#791).References:
cc @shoyer, @jhamman