-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: rolling with Int64 #43174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: rolling with Int64 #43174
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm excluding mypy error
pandas/core/window/rolling.py
Outdated
@@ -318,7 +318,11 @@ def _prep_values(self, values: ArrayLike) -> np.ndarray: | |||
# GH #12373 : rolling functions error on float32 data | |||
# make sure the data is coerced to float64 | |||
try: | |||
values = ensure_float64(values) | |||
if hasattr(values, "to_numpy"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should change ensure_float64 to just do this directly (eg put your change there) - need to check perf and that this doesn't break anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do isinstance instead of hasattr?
# GH 43016 | ||
s = Series([0, 1, NA], dtype=any_signed_int_ea_dtype) | ||
result = s.rolling(2).mean() | ||
expected = Series([np.nan, 0.5, np.nan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we actually replace and use pd.NA as the output? or is this too big of a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, but currently all rolling/expanding/ewm results always return np.float64
(which is documented as well). So that would be a big change to return the same dtype as the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, do we have an issue about this? we should make this change in 1.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm no was speaking about supporting pd.NA specifically (i think rolling_apply is totally fine) if you can add one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified that issue to generally support same input/output dtypes (which include ExtensionDtypes that support pd.NA)
@@ -66,6 +66,10 @@ def ensure_{{name}}(object arr, copy=True): | |||
return arr | |||
else: | |||
return arr.astype(np.{{dtype}}, copy=copy) | |||
{{if na_val == "nan"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im really not sure about this. ensure_foo is fast ATM, plus this introduces circular dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think ensure_foo
should not be used directly on ExtentionArrays i.e. ensure_foo(arr.to_numpy(...))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess. mainly i think that pd.NA is way more trouble than its worth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i did request this, but agree the performance tradeoff might not be worthit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think maybe making an ensure_*_with_na is prob better here (and make it the caller responsible if we have an EA)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with @jbrockmendel's original isinstance
check instead for now. For the ensure_foo_with_na
feature, probably should be another issue to discuss because it would be potentially useful anywhere ensure_foo
is called.
# GH 43016 | ||
s = Series([0, 1, NA], dtype=any_signed_int_ea_dtype) | ||
result = s.rolling(2).mean() | ||
expected = Series([np.nan, 0.5, np.nan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, do we have an issue about this? we should make this change in 1.4
@@ -66,6 +66,10 @@ def ensure_{{name}}(object arr, copy=True): | |||
return arr | |||
else: | |||
return arr.astype(np.{{dtype}}, copy=copy) | |||
{{if na_val == "nan"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i did request this, but agree the performance tradeoff might not be worthit.
thanks @mroeschke |
FWIW:
The absolute cost is very small, but its optimized to the bone. |
I think this also fixed my issue in #43381. Any chance it can be backported to the 1.3.x branch? |
We backport only regression fixes |
rolling
fails whenSeries
containspd.NA
#43016