DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906

topper-123 · 2023-08-31T16:22:02Z

A discussion has been going on in #54747 (PDEP 13) about making Series.transform and DataFrame.transform always operate on Series. See #54747 (comment) and related comments. Opening a separate issue to separate that discussion from PDEP 13/#54747.

Currently, Series.transform tries to operates on series element and if that fails it tries operating on the series. So it uses a fallback mechanism, which makes it difficult to use + the first choice (element-wise operations) is very slow. DataFrame.transform operates on series (i.e. columns/rows) when given callables, but operates on elements, when given lists or dicts of callables, which is inconsistent. Examples:

>>> df = pd.DataFrame({"x":range(100_000)})
>>> %timeit df["x"].transform(lambda x: x + 1) # operates on elements, slow
15.5 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['x'].transform(np.sin) # ufunc, fast
784 µs ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit df['x'].transform(lambda x: np.sin(x)). # non-ufunc, operates on elements, slow
86.6 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df.transform(lambda x: x + 1)  # operates on the columns/series, fast
142 µs ± 589 ns per loop
>>> %timeit df.transform([lambda x: x + 1])  # lists/dicts operate on the elements, slow
16.7 ms ± 165 µs per loop

All in all, the above is very inconsistent and difficult to reason about for users, similarly to the discussion regarding apply in #54747/PDEP 13.

I propose to deprecate element-wise operations in (Series|DataFrame).transform, so in Pandas v3.0 giving callables (and lists/dicts of callables) to (Series|DataFrame).transform always operates on series. The benefit of this is that the (Series|DataFrame).transform method will become much more predictable and faster. When users want to do element-wise operations, they should be directed to use (Series|DataFrame).map. So no functionality is lost, but we get clearer separation between series-wise and element-wise operations.

The deprecation is proposed implemented in pandas v2.2, where we add a new keyword parameter series_ops_only to (Series|DataFrame).transform. When set to true, (Series|DataFrame).transform will always operate on the whole series. When False, the old behavior will be kept, and a deprecation warning will be emitted. In pandas v3.0, the old behavior will be removed and (Series|DataFrame).transform will only operate on series.

Related issues:

PDEP-13: Deprecate the apply method on Series and DataFrame and make the agg and transform methods operate on series data #54747
API: make the func in Series.apply always operate on the Series #52140
DEPR: make Series.agg aggregate when possible #53325 (similar issue for agg, already implemented)
API: Signature of UDF methods #40112

The text was updated successfully, but these errors were encountered:

rhshadrach · 2023-08-31T20:32:07Z

+1. Just noting that DataFrameGroupBy.transform is not a concern here when it comes to API consistency. Users shouldn't be using groupby at all if the transform acts element-wise, and so only vectored transforms should be used in groupby as well.

Dr-Irv · 2023-09-04T15:36:52Z

I think that what you are proposing is the following:

When s is a Series, s.transform(func) will be equivalent to func(s)
When df is a DataFrame, df.transform(func) is equivalent to pd.DataFrame({c: func(df[c]) for c in df.columns})

But what happens with df.transform(func, axis=1)? This has never been clear to me, but I think what you're proposing is

When df is a DataFrame, df.transform(func, axis=1) is equivalent to pd.DataFrame.from_rows([func(r) for _, r in df.iterrows()], index=df.index) (Note - I may have the use if iterrows() wrong here)

topper-123 · 2023-09-04T17:19:43Z

It will be like that, though if you go through the code there are some shape checks etc.

For axis= 1, it does a double transform, see Apply.transform. This proposal does not change that behavior.

topper-123 · 2023-09-04T17:25:14Z

and to add, The df.transform(func) case (single callable) already does ops on column series, and that will not change from this. Likewise, df.transform(func, axis=1) will not be affected from this.

list/dicts of callables however currently operate on the elements and the proposal is that they will operate on the column series, like the single callable case.

topper-123 · 2023-09-20T08:10:14Z

@MarcoGorelli any comments? I think you were positive in #54747 (comment).

[Fixed link - rhshadrach]

MarcoGorelli · 2023-09-20T14:34:45Z

the idea looks good

regarding series_ops_only - what will the default be?
say I'm currently using transform to operate on the whole series - will I get any warning, or will I need to pass series_ops_only=True?

topper-123 · 2023-09-20T17:30:51Z

I think it should have signature series_ops_only : bool | NoDefault = no_default, and work like normal for a deprecation? I.e. in the 2.x cycle everything works unchanged but gets a FutureWarning by default and if the user sets series_ops_only = True we guarantee series-wise operations. If the user sets series_ops_only = False they get the current behavior without DeprecationWarning.

MarcoGorelli · 2023-10-01T13:01:22Z

this could be a bit noisy - is the code implemented using a fallback, such as

try:
    transform_series_wise(...)
except:
    transform_element_wise(...)

?

If so, could the deprecation warning only go in the except block, and there'd be no need for a series_ops_only arg?

topper-123 · 2023-10-01T20:44:12Z

The is that those give the same results in some cases and not in others, e.g.:

elementwise: ser.map(lambda x: x + 1) (like ser.transform now)
series-wise ser.pipe(lambda x: x + 1). (like ser.transform proposed for v.30)

give the same results, while

elementwise: ser.map(lambda x: type(x)) (like ser.transform now)
series-wise ser.pipe(lambda x: type(x)). (like ser.transform proposed for v.30)

would not give the same results.

We could, if the element-wise operations works, try the series-wise operation also and if the results compare equal, not emit the error message. That would cut down a lot of the error messages, but have some cost performance-wise. Note that the series-wise operation and the comparisons will be much faster than the element-wise operation, so perf. cost may not be too bad (and will be temporary for the v2.x cycle).

MarcoGorelli · 2023-10-01T20:56:44Z

ah so the element-wise one is tried first? (sorry, really not familiar with this part of the code base)

in that case, +1 to the series_ops_only suggestion

topper-123 · 2023-10-01T21:03:02Z

Yeah, the element-wise one is tried first, unfortunately.

rhshadrach · 2023-10-04T01:56:03Z

For consistency, does it make sense to use by_rows here too? We've already added that to apply.

rhshadrach added Apply Apply, Aggregate, Transform, Map Transformations e.g. cumsum, diff, rank Deprecate Functionality to remove in pandas labels Aug 31, 2023

This was referenced Aug 31, 2023

DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54915

Closed

PDEP-13: Deprecate the apply method on Series and DataFrame and make the agg and transform methods operate on series data #54747

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906

DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906

topper-123 commented Aug 31, 2023 •

edited

Loading

rhshadrach commented Aug 31, 2023 •

edited

Loading

Dr-Irv commented Sep 4, 2023

topper-123 commented Sep 4, 2023

topper-123 commented Sep 4, 2023

topper-123 commented Sep 20, 2023 •

edited by rhshadrach

Loading

MarcoGorelli commented Sep 20, 2023

topper-123 commented Sep 20, 2023

MarcoGorelli commented Oct 1, 2023

topper-123 commented Oct 1, 2023 •

edited

Loading

MarcoGorelli commented Oct 1, 2023

topper-123 commented Oct 1, 2023

rhshadrach commented Oct 4, 2023

DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906

DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906

Comments

topper-123 commented Aug 31, 2023 • edited Loading

rhshadrach commented Aug 31, 2023 • edited Loading

Dr-Irv commented Sep 4, 2023

topper-123 commented Sep 4, 2023

topper-123 commented Sep 4, 2023

topper-123 commented Sep 20, 2023 • edited by rhshadrach Loading

MarcoGorelli commented Sep 20, 2023

topper-123 commented Sep 20, 2023

MarcoGorelli commented Oct 1, 2023

topper-123 commented Oct 1, 2023 • edited Loading

MarcoGorelli commented Oct 1, 2023

topper-123 commented Oct 1, 2023

rhshadrach commented Oct 4, 2023

topper-123 commented Aug 31, 2023 •

edited

Loading

rhshadrach commented Aug 31, 2023 •

edited

Loading

topper-123 commented Sep 20, 2023 •

edited by rhshadrach

Loading

topper-123 commented Oct 1, 2023 •

edited

Loading