@dcherian I feel like you're practically the only person who would have realized that this is expressible as (2) 😅
I like the idea of adding some kind of cumsum syntactic sugar, especially if the underlying implementation can be in terms of groupby so it doesn't add much maintenance burden.
Is your feature request related to a problem?
It is pretty common to want to run `cumsum` and have the sum reset when a boolean flag array is `1`. This is so common it has its own Wikipedia page and is discussed in Blelloch (1993) (Section 1.5).

Here's a real example of someone trying to implement it in a fairly roundabout way.
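To make the requested behaviour concrete, here is a minimal plain-Python sketch of a segmented cumulative sum. The function name `segmented_cumsum` is hypothetical and not part of xarray. It also assumes that a set flag marks the start of a new segment, so the running sum restarts from that element's value.

```python
def segmented_cumsum(values, flags):
    """Running sum of `values` that restarts wherever `flags` is truthy."""
    out = []
    running = 0
    for value, flag in zip(values, flags):
        # Assumption: a set flag starts a new segment at this element,
        # so the running sum restarts from the element's own value.
        running = value if flag else running + value
        out.append(running)
    return out


values = [1, 2, 3, 4, 5, 6]
flags = [0, 0, 1, 0, 1, 0]
print(segmented_cumsum(values, flags))  # [1, 3, 3, 7, 5, 11]
```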
We have a few options to implement it:

1. We could introduce a new method `DataArray.segmented_scan(flags, op="sum")` or a new class `DataArray.segment.cumsum()`? A dask/cubed friendly version that does all of this in a single scan should be fairly straightforward to write (and similar to our `ffill`, `bfill` wrappers).
2. In a way this generalizes `resample`, and it just struck me that the example above could be written as the following, which should be OK once flox adds scans.
   1. Use `Grouper` functionality to expose a "flag" grouper that hides the `group_idx = (cube == 0).cumsum('time')` line.

My concern with (2) and (2.i) is that they are not at all obvious for most of our userbase.
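The two routes discussed above can be compared in a small plain-Python sketch. This is a stand-in for illustration only: it does not use the xarray or flox APIs, and every function name in it is hypothetical. The first function runs a single scan with an associative operator on `(flag, total)` pairs, in the spirit of the segmented scan in Blelloch (1993). The second builds a group index by taking a cumulative sum of the flags, mirroring the `group_idx` line, and then runs an ordinary cumulative sum within each group. Both assume a set flag starts a new segment.

```python
from itertools import accumulate, groupby


def combine(left, right):
    """Associative operator on (flag, total) pairs for a segmented sum."""
    left_flag, left_total = left
    right_flag, right_total = right
    # A flag on the right-hand side discards everything accumulated before it.
    total = right_total if right_flag else left_total + right_total
    return (left_flag or right_flag, total)


def segmented_cumsum_scan(values, flags):
    """Option 1 (sketch): one scan over (flag, value) pairs."""
    pairs = zip((bool(f) for f in flags), values)
    return [total for _, total in accumulate(pairs, combine)]


def segmented_cumsum_groupby(values, flags):
    """Option 2 (sketch): cumsum of the flags gives a group index,
    then an ordinary cumsum runs inside each group."""
    group_idx = list(accumulate(int(bool(f)) for f in flags))
    out = []
    for _, members in groupby(zip(group_idx, values), key=lambda p: p[0]):
        out.extend(accumulate(v for _, v in members))
    return out


values = [1, 2, 3, 4, 5, 6]
flags = [0, 0, 1, 0, 1, 0]
print(segmented_cumsum_scan(values, flags))     # [1, 3, 3, 7, 5, 11]
print(segmented_cumsum_groupby(values, flags))  # [1, 3, 3, 7, 5, 11]
```

Because `combine` is associative, partial results from separate chunks can be merged in any grouping, which is what makes the single-scan formulation a plausible fit for chunked backends.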