DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906
Comments
+1. Just noting that DataFrameGroupBy.transform is not a concern here when it comes to API consistency. Users shouldn't be using groupby at all if the transform acts element-wise, so only vectorized transforms should be used in groupby as well.
I think that what you are proposing is the following:
But what happens with
It will be like that, though if you go through the code, there are some shape checks etc. For axis=1, it does a double transform, see
And to add: the lists/dicts of callables, however, currently operate on the elements, and the proposal is that they will operate on the column Series, like the single-callable case.
@MarcoGorelli, any comments? I think you were positive in #54747 (comment). [Fixed link - rhshadrach]
The idea looks good regarding
I think it should have signature
This could be a bit noisy. Is the code implemented using a fallback, such as the following?

    try:
        transform_series_wise(...)
    except Exception:
        transform_element_wise(...)

If so, could the deprecation warning only go in the
The issue is that those give the same results in some cases and not in others, e.g.:
give the same results, while
would not give the same results. If the element-wise operation works, we could also try the series-wise operation and, if the results compare equal, not emit the warning. That would cut down a lot of the warnings, at some performance cost. Note that the series-wise operation and the comparison will be much faster than the element-wise operation, so the performance cost may not be too bad (and it will be temporary, for the v2.x cycle).
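The suggestion above (warn only when the element-wise and series-wise results differ) could look roughly like the sketch below. The helper name transform_warn_if_changed and the warning text are hypothetical. Series.map stands in for the element-wise path and Series.equals is used as the comparison. pandas' actual implementation may differ.

```python
import warnings
from typing import Callable

import pandas as pd


def transform_warn_if_changed(ser: pd.Series, func: Callable) -> pd.Series:
    """Hypothetical helper (not part of pandas): keep the old element-wise-first
    behaviour, but only emit a FutureWarning when the series-wise result differs."""
    try:
        # Old behaviour: call func once per element.
        element_wise = ser.map(func)
    except Exception:
        # Element-wise failed, so the old code already used the series-wise
        # result; nothing changes for the user, so no warning is needed.
        return func(ser)
    try:
        # Also compute the series-wise result (one vectorized call, cheap).
        series_wise = func(ser)
        same = isinstance(series_wise, pd.Series) and series_wise.equals(element_wise)
    except Exception:
        # Series-wise raised: behaviour would change, so treat it as different.
        same = False
    if not same:
        warnings.warn(
            "element-wise operations in transform are deprecated; "
            "use Series.map for element-wise operations",
            FutureWarning,
            stacklevel=2,
        )
    return element_wise
```

With this approach, lambda x: x + 1 would stay silent because both paths agree, and lambda x: x / len(x) would stay silent because only the series-wise path works. A function like lambda x: x if x > 1 else 0 would warn, because it only works element-wise.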
Ah, so the element-wise one is tried first? (Sorry, I'm really not familiar with this part of the code base.) In that case, +1 to the series_ops_only suggestion.
Yeah, the element-wise one is tried first, unfortunately.
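To make the dispatch order concrete for a single callable, here is a simplified model: try element-wise first, then fall back to calling the function on the whole Series. This is a hypothetical sketch, not the pandas source. The function name transform_elementwise_first is made up, and Series.map stands in for the element-wise step.

```python
from typing import Callable

import pandas as pd


def transform_elementwise_first(ser: pd.Series, func: Callable) -> pd.Series:
    """Hypothetical model of the current single-callable Series.transform
    dispatch: element-wise first, series-wise only as a fallback."""
    try:
        # First attempt: a Python-level loop calling func once per element.
        return ser.map(func)
    except Exception:
        # Fallback: a single call on the whole Series.
        return func(ser)


calls = []


def double(x):
    calls.append(x)  # record every invocation of the UDF
    return x * 2


doubled = transform_elementwise_first(pd.Series([10, 20, 30]), double)
ranked = transform_elementwise_first(pd.Series([3, 1, 2]), lambda x: x.rank())
```

Here double succeeds on scalars, so it is called once per element and never sees the Series. The rank lambda fails on scalars (ints have no rank method) and only then reaches the series-wise path. The loop in the first branch is the slow part the issue refers to.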
For consistency, does it make sense to use
A discussion has been going on in #54747 (PDEP 13) about making Series.transform and DataFrame.transform always operate on Series. See #54747 (comment) and related comments. I am opening a separate issue to keep that discussion apart from PDEP 13/#54747.

Currently, Series.transform first tries to operate on each element of the Series, and if that fails, it tries operating on the Series as a whole. This fallback mechanism makes it difficult to use, and the first choice (element-wise operation) is very slow. DataFrame.transform operates on Series (i.e. columns/rows) when given a callable, but operates on elements when given lists or dicts of callables, which is inconsistent. Examples:

All in all, the above is very inconsistent and difficult for users to reason about, similar to the discussion regarding apply in #54747/PDEP 13.

I propose to deprecate element-wise operations in (Series|DataFrame).transform, so that in pandas v3.0, giving callables (and lists/dicts of callables) to (Series|DataFrame).transform always operates on Series. The benefit is that (Series|DataFrame).transform will become much more predictable and faster. Users who want element-wise operations should be directed to (Series|DataFrame).map. So no functionality is lost, but we get a clearer separation between series-wise and element-wise operations.

The deprecation is proposed to be implemented in pandas v2.2 by adding a new keyword parameter series_ops_only to (Series|DataFrame).transform. When set to True, (Series|DataFrame).transform will always operate on the whole Series. When False, the old behavior will be kept, and a deprecation warning will be emitted. In pandas v3.0, the old behavior will be removed and (Series|DataFrame).transform will only operate on Series.

Related issues: … agg, already implemented)
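A minimal sketch of the proposed series_ops_only semantics follows, written as a stand-alone function so it runs on current pandas. The name transform_sketch is hypothetical, and it assumes the keyword defaults to False during the deprecation period. It does not call a real series_ops_only parameter on pandas' transform. Element-wise work is routed to Series.map, as the proposal recommends.

```python
import warnings
from typing import Callable

import pandas as pd


def transform_sketch(ser: pd.Series, func: Callable, series_ops_only: bool = False) -> pd.Series:
    """Hypothetical stand-alone model of the proposed keyword (not the pandas API).

    series_ops_only=True:  always call func once on the whole Series
                           (the behaviour proposed for pandas v3.0).
    series_ops_only=False: keep the old element-wise-first behaviour,
                           but emit a deprecation warning.
    """
    if series_ops_only:
        return func(ser)
    warnings.warn(
        "the element-wise behaviour of transform is deprecated; pass "
        "series_ops_only=True, or use Series.map for element-wise operations",
        FutureWarning,
        stacklevel=2,
    )
    try:
        # Old behaviour: element-wise first ...
        return ser.map(func)
    except Exception:
        # ... falling back to a single call on the whole Series.
        return func(ser)


s = pd.Series([1, 2, 3])

# Migration path under the proposal:
centered = transform_sketch(s, lambda x: x - x.mean(), series_ops_only=True)  # series-wise
plus_one = s.map(lambda x: x + 1)  # element-wise work goes to Series.map
```

Keeping the old path behind an explicit flag lets users opt in to the v3.0 behaviour early and silence the warning, while code that relies on element-wise calls keeps working until the removal.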