REF: dont set ndarray.data in libreduction #34997
Tentatively, it looks like the issue is that this patches …
If it helps, I've noticed that changing this line (pandas/pandas/_libs/reduction.pyx, line 357 in 42fd7e7) to …
Seems to be heading in the right direction, though.
Closing to clear the queue.
@jbrockmendel, I made a few updates to get the failures down to a small handful. This may or may not be the right way of doing things, but hopefully it keeps the conversation going.
Of the four remaining failures, one is a result of groupby.tshift, which we may deprecate in #34452.
Well, the optimistic view is that we are down to a handful of failures :-) I think, though, that this is uncovering deep-seated issues in other parts of the code base. For instance, this causes the test that #33439 added to fail, which so far I've traced back to a potential bug somewhere in our hashing code; I think it's only happenstance that that test passes in the first place. I'm not sure whether the other 4 failures are related to hashing as well, but will hopefully figure that out soon.
I've noticed that, as an apply slides along the various groups, this line of code looks problematic with the current changes: line 231 in 3b1d4f1.
The problem is that the call to … @jbrockmendel, any idea how the caching should be handled here? In Python space, the above call traces back to pandas/pandas/core/indexes/base.py, line 556 in 3b1d4f1.
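As background for the PR title: libreduction's slider works, roughly, by reusing one ndarray and reassigning its data pointer (and length) to move it from group to group, which is the in-place reuse that makes caching hard to reason about above. A minimal NumPy sketch of the slice-based alternative follows. It assumes the values are already sorted so that each group is a contiguous run of equal codes; `iter_group_views` is a hypothetical helper for illustration, not pandas code.

```python
import numpy as np


def iter_group_views(values, sorted_codes):
    """Yield (code, view) for each contiguous run of equal codes.

    Assumes `sorted_codes` is sorted, so every group occupies one
    contiguous block of `values`.
    """
    if len(sorted_codes) == 0:
        return
    # Positions where the group code changes mark group boundaries
    breaks = np.flatnonzero(np.diff(sorted_codes)) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(sorted_codes)]))
    for start, end in zip(starts, ends):
        # Basic slicing returns a fresh view object with its own offset;
        # nothing about the parent array is mutated in place.
        yield sorted_codes[start], values[start:end]
```

Because each group gets its own view object instead of sharing one mutated buffer, anything cached on a per-group object cannot silently carry over to the next group, at the cost of allocating one small view per group.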
Here's the MRE I'm using to tackle the above comment:

```python
import pandas as pd

df = pd.DataFrame({"A": ["S", "W", "W"], "B": [1.0, 1.0, 2.0]})
res = df.groupby("A").agg({"B": lambda x: x.get(x.index[-1])})
```

On master this yields:

```
     B
A
S  1.0
W  2.0
```

But with this PR it yields:

```
     B
A
S  1.0
W  NaN
```

This is because of the aforementioned KeyError when trying to locate elements by index in the second group.
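To make the caching hypothesis concrete, here is a toy, pure-Python model. It is not pandas internals: every class and function name is hypothetical. It assumes a single index object is reused across groups, its labels are overwritten in place, and a label-to-position mapping built on the first lookup is never invalidated. Under those assumptions it reproduces the MRE's pattern: the first group resolves, while the second falls back to NaN.

```python
import math


class ToyIndex:
    """Hypothetical index whose label -> position mapping is built lazily
    on first lookup and then cached."""

    def __init__(self):
        self.labels = []      # mutable buffer, overwritten in place per group
        self._mapping = None  # lazily built lookup cache

    def get_loc(self, key):
        if self._mapping is None:
            self._mapping = {label: pos for pos, label in enumerate(self.labels)}
        return self._mapping[key]  # KeyError for labels missing from the cache


class ToySeries:
    def __init__(self, values, index):
        self.values = values
        self.index = index

    def get(self, key, default=math.nan):
        # Like Series.get: return a default instead of raising KeyError
        try:
            return self.values[self.index.get_loc(key)]
        except KeyError:
            return default


def sliding_apply(values, labels, groups, func, reset_cache=False):
    """Apply func to each group while reusing ONE index object."""
    index = ToyIndex()
    results = {}
    for name, positions in groups.items():
        index.labels[:] = [labels[p] for p in positions]  # in-place overwrite
        if reset_cache:
            index._mapping = None  # invalidate the cache after the overwrite
        results[name] = func(ToySeries([values[p] for p in positions], index))
    return results


def last_by_label(s):
    return s.get(s.index.labels[-1])


values = [1.0, 1.0, 2.0]  # column "B" from the MRE
labels = [0, 1, 2]        # default integer labels
groups = {"S": [0], "W": [1, 2]}

stale = sliding_apply(values, labels, groups, last_by_label)
fresh = sliding_apply(values, labels, groups, last_by_label, reset_cache=True)
# stale -> {"S": 1.0, "W": nan}; fresh -> {"S": 1.0, "W": 2.0}
```

The only difference between the two runs is whether the cache is invalidated after the in-place overwrite, which is the "how should caching be handled" question in toy form.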
I'm not sure I follow. In index.pyx this is a call to …
Where is the call to …?
I expected to see it in Index._engine, but apparently not.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Just pushed after applying a patch from #35417. Locally that gets me to 3 test failures (which, now that I reread the thread, may not actually be an improvement).
Co-authored-by: Matt Roeschke <[email protected]>
Branch updated from 206a997 to 070481c.
Updated to implement NDFrame._can_use_libreduction, de-duplicating a bunch of similar (and inconsistently strict) checks done elsewhere. (Note the inline comment I'll make about one place where this PR retains a different, inconsistent check.) This fixes the previously xfailed test tests.groupby.test_apply.test_apply_with_timezones_aware.
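The idea of funnelling several ad-hoc eligibility checks into one predicate can be sketched as follows. The actual rules inside NDFrame._can_use_libreduction are not shown in this thread, so both rules here are assumptions: the datetime-like index rule mirrors the `isinstance(self._selected_obj.index, (DatetimeIndex, TimedeltaIndex, PeriodIndex))` check quoted in the review below, and the extension-dtype rule is illustrative. `can_use_fast_path` is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype

# Assumed disqualifying index types (mirroring the check quoted in review)
_DATETIMELIKE_INDEXES = (pd.DatetimeIndex, pd.TimedeltaIndex, pd.PeriodIndex)


def can_use_fast_path(obj):
    """Hypothetical single source of truth for fast-path eligibility.

    Centralising the rules means every caller applies the same strictness,
    instead of each call site re-implementing a slightly different check.
    """
    if isinstance(obj.index, _DATETIMELIKE_INDEXES):
        return False
    dtypes = [obj.dtype] if isinstance(obj, pd.Series) else list(obj.dtypes)
    # Assumed rule: extension-array-backed data is not eligible
    return not any(is_extension_array_dtype(dtype) for dtype in dtypes)
```

Call sites then ask `can_use_fast_path(obj)` rather than repeating their own `isinstance`/dtype tests, so tightening or loosening a rule happens in one place.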
```python
# see test_groupby.test_basic
result = self._aggregate_named(func, *args, **kwargs)
if isinstance(
    self._selected_obj.index, (DatetimeIndex, TimedeltaIndex, PeriodIndex)
```
This is the one place where I'm not using self._selected_obj._can_use_libreduction, as doing so would require 2.5 more kludges to get the tests passing:
1) Below, on line 283, after `ret = create_series_with_explicit_dtype`, we would need to do:

```python
# Inference in the Series constructor may not infer
# custom EA dtypes, so try here
ret = maybe_cast_result(ret._values, obj, numeric_only=True)
ret = Series(ret, index=index, name=obj.name)
```

1.5) In create_series_with_explicit_dtype, we would need to change `dtype_if_empty=object` to `dtype_if_empty=obj.dtype` (which I'm not 100% sure about).
2) The kludge here would have to be amended from `and name in output.index` to `and (name in output.index or 0 in output.index)`.
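The first kludge exists because, as its code comment says, the Series constructor may not infer extension-array dtypes from per-group results. A small illustration using only public pandas API follows; `astype(obj.dtype)` stands in for the internal `maybe_cast_result` call purely for illustration (an assumption, not what that helper does internally), and the empty-Series lines show why the `dtype_if_empty` fallback in item 1.5 decides the result dtype on its own.

```python
import pandas as pd

obj = pd.Series([1, 2, 3], dtype="Int64")  # nullable integer extension dtype

# Per-group scalar results, as an aggregation would produce them
group_results = [obj.iloc[:2].sum(), obj.iloc[2:].sum()]

# The Series constructor infers a NumPy dtype from the scalars...
inferred = pd.Series(group_results, index=["g1", "g2"], name=obj.name)

# ...so the original extension dtype has to be restored explicitly
restored = inferred.astype(obj.dtype)

# With an empty result there is nothing to infer from, so the dtype
# comes entirely from the fallback the caller passes in
empty_object = pd.Series([], dtype=object)
empty_original = pd.Series([], dtype=obj.dtype)
```

Whether the empty case should fall back to `object` or to `obj.dtype` is exactly the open question in item 1.5; the sketch only shows that the choice is what determines the outcome.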
Closing. This doesn't do quite what I thought it did; it will need a new approach. xref #36459.
This thing is super thorny... nice effort here in any case. We will figure it out one of these days.
cc @WillAyd I'm still seeing 88 test failures locally and could use a fresh pair of eyes on this. Any thoughts?