BUG: rolling with Int64 #43174

mroeschke · 2021-08-23T03:32:35Z

closes BUG: rolling fails when Series contains pd.NA #43016
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

phofl

lgtm excluding mypy error

jreback · 2021-08-23T13:10:07Z

pandas/core/window/rolling.py

@@ -318,7 +318,11 @@ def _prep_values(self, values: ArrayLike) -> np.ndarray:
            # GH #12373 : rolling functions error on float32 data
            # make sure the data is coerced to float64
            try:
-                values = ensure_float64(values)
+                if hasattr(values, "to_numpy"):


we should change ensure_float64 to just do this directly (eg put your change there) - need to check perf and that this doesn't break anything

can do isinstance instead of hasattr?
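One difference between the two checks can be illustrated with a small sketch. It uses only the public `pandas.api.extensions.ExtensionArray` class; the `candidates` and `checks` names are made up for this example, and it is not a statement of why the reviewer preferred `isinstance`.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# Three kinds of objects, chosen only to compare the two checks.
candidates = {
    "ndarray": np.array([0, 1]),
    "Int64 array": pd.array([0, 1, pd.NA], dtype="Int64"),
    "Series": pd.Series([0, 1]),
}

# For each object: (has a to_numpy attribute, is an ExtensionArray)
checks = {
    name: (hasattr(obj, "to_numpy"), isinstance(obj, ExtensionArray))
    for name, obj in candidates.items()
}

for name, (has_method, is_ea) in checks.items():
    print(f"{name}: hasattr={has_method}, isinstance={is_ea}")
```

The `hasattr` check is duck-typed and also matches objects such as `Series`, while `isinstance` only matches extension arrays.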

jreback · 2021-08-26T23:58:44Z

pandas/tests/window/test_dtypes.py

+    # GH 43016
+    s = Series([0, 1, NA], dtype=any_signed_int_ea_dtype)
+    result = s.rolling(2).mean()
+    expected = Series([np.nan, 0.5, np.nan])


shouldn't we actually replace and use pd.NA as the output? or is this too big of a change?

Ideally, but currently all rolling/expanding/ewm results always return np.float64 (which is documented as well). So that would be a big change to return the same dtype as the caller.

ok, do we have an issue about this? we should make this change in 1.4

hmm no was speaking about supporting pd.NA specifically (i think rolling_apply is totally fine) if you can add one

I modified that issue to generally support same input/output dtypes (which include ExtensionDtypes that support pd.NA)
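The behavior discussed in this thread can be reproduced with the public API. This is a minimal sketch that assumes a pandas version containing this fix and in which rolling results are still returned as `float64`, as described above.

```python
import numpy as np
import pandas as pd

# Nullable integer input containing pd.NA, as in the added test.
s = pd.Series([0, 1, pd.NA], dtype="Int64")

# The window of size 2 needs two valid observations, so the first
# and last positions have no result.
result = s.rolling(2).mean()

print(result.dtype)     # the output dtype is a NumPy float dtype, not Float64
print(result.tolist())  # missing results are np.nan rather than pd.NA
```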

jbrockmendel · 2021-08-27T01:38:57Z

pandas/_libs/algos_common_helper.pxi.in

@@ -66,6 +66,10 @@ def ensure_{{name}}(object arr, copy=True):
            return arr
        else:
            return arr.astype(np.{{dtype}}, copy=copy)
+{{if na_val == "nan"}}


I'm really not sure about this. ensure_foo is fast ATM, plus this introduces a circular dependency

Do you think ensure_foo should not be used directly on ExtensionArrays, i.e. ensure_foo(arr.to_numpy(...))?

i guess. mainly i think that pd.NA is way more trouble than it's worth

hmm i did request this, but agree the performance tradeoff might not be worth it.

i think maybe making an ensure_*_with_na is prob better here (and make it the caller responsible if we have an EA)

I went with @jbrockmendel's original isinstance check instead for now. For the ensure_foo_with_na feature, probably should be another issue to discuss because it would be potentially useful anywhere ensure_foo is called.
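A rough sketch of what a caller-side helper along these lines could look like. The name `ensure_float64_with_na` is hypothetical and does not exist in pandas; `np.asarray` stands in here for the fast `ensure_float64` path, and only the public `to_numpy(dtype=..., na_value=...)` method of masked arrays is used.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def ensure_float64_with_na(values):
    """Hypothetical helper: return a float64 ndarray, mapping pd.NA to np.nan."""
    if isinstance(values, ExtensionArray):
        # Masked arrays cannot hold np.nan in an integer buffer, so the
        # conversion has to name the float value used for missing entries.
        return values.to_numpy(dtype=np.float64, na_value=np.nan)
    # Stand-in for the existing fast path on plain ndarrays.
    return np.asarray(values, dtype=np.float64)


masked = ensure_float64_with_na(pd.array([0, 1, pd.NA], dtype="Int64"))
plain = ensure_float64_with_na(np.array([0, 1, 2], dtype=np.int32))
print(masked, plain)
```

Keeping the extension-array branch in the caller leaves the plain ndarray path untouched, which is the tradeoff raised in the comments above.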

jreback · 2021-08-30T15:15:49Z

pandas/tests/window/test_dtypes.py

+    # GH 43016
+    s = Series([0, 1, NA], dtype=any_signed_int_ea_dtype)
+    result = s.rolling(2).mean()
+    expected = Series([np.nan, 0.5, np.nan])


ok, do we have an issue about this? we should make this change in 1.4

jreback · 2021-08-30T15:17:01Z

pandas/_libs/algos_common_helper.pxi.in

@@ -66,6 +66,10 @@ def ensure_{{name}}(object arr, copy=True):
            return arr
        else:
            return arr.astype(np.{{dtype}}, copy=copy)
+{{if na_val == "nan"}}


hmm i did request this, but agree the performance tradeoff might not be worth it.

jreback · 2021-08-31T19:54:27Z

thanks @mroeschke

jbrockmendel · 2021-08-31T21:24:24Z

FWIW:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: from pandas.core.dtypes.common import *

In [4]: arr = np.arange(5, dtype=np.int64)

In [5]: %timeit np.asarray(arr, dtype=np.int64)
150 ns ± 14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [6]: %timeit arr.astype(np.int64, copy=False)
181 ns ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit ensure_int64(arr)
63.1 ns ± 8.89 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The absolute cost is very small, but it's optimized to the bone.
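Outside IPython, a comparison like the one above can be approximated with the standard `timeit` module. This is a sketch only: the `candidates` and `timings` names are made up, the lambda wrappers add call overhead of their own, and `ensure_int64` is imported from the internal module used in the session above, so it is skipped if that import fails.

```python
import timeit

import numpy as np

arr = np.arange(5, dtype=np.int64)

# Each candidate is a no-op conversion of an array that is already int64.
candidates = {
    "np.asarray": lambda: np.asarray(arr, dtype=np.int64),
    "arr.astype(copy=False)": lambda: arr.astype(np.int64, copy=False),
}

try:
    # Internal pandas helper; not part of the public API.
    from pandas.core.dtypes.common import ensure_int64

    candidates["ensure_int64"] = lambda: ensure_int64(arr)
except ImportError:
    pass

n = 100_000
timings = {}
for name, func in candidates.items():
    # Best of three repeats, converted to nanoseconds per call.
    best = min(timeit.repeat(func, number=n, repeat=3))
    timings[name] = best / n * 1e9
    print(f"{name}: {timings[name]:.0f} ns per call")
```

Absolute numbers will differ by machine and library version; only the relative ordering is of interest.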

janlugt · 2021-09-06T08:55:54Z

I think this also fixed my issue in #43381. Any chance it can be backported to the 1.3.x branch?

phofl · 2021-09-06T10:54:12Z

We backport only regression fixes

BUG: rolling with Int64

f84c520

mroeschke added the Window rolling, ewma, expanding label Aug 23, 2021

mroeschke added this to the 1.4 milestone Aug 23, 2021

phofl approved these changes Aug 23, 2021

jreback reviewed Aug 23, 2021

mroeschke added 2 commits August 26, 2021 14:37

Merge remote-tracking branch 'upstream/master' into bug/series_null_i…

bc4a45e

…nt_rolling

Move logic into ensure_float64

57b0ceb

jreback requested changes Aug 26, 2021

jreback added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Aug 26, 2021

jbrockmendel reviewed Aug 27, 2021

jreback requested changes Aug 30, 2021

mroeschke added 2 commits August 30, 2021 22:52

Use isinstance check instead

77416cc

Merge remote-tracking branch 'upstream/master' into bug/series_null_i…

3594d79

…nt_rolling

jreback approved these changes Aug 31, 2021

jreback merged commit 344c691 into pandas-dev:master Aug 31, 2021

mroeschke deleted the bug/series_null_int_rolling branch September 1, 2021 04:48

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

BUG: rolling with Int64 (pandas-dev#43174)

5c024e2

mroeschke mentioned this pull request Nov 2, 2021

BUG (?): Some rolling window calculations do not work on Int64Dtype Series containing pd.NA #44291

Closed

3 tasks

simonjayhawkins mentioned this pull request Nov 10, 2021

BUG: rolling() function does not work with Float64 columns with missing values #43381

Closed

3 tasks
