-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
POC: New UDF methods option #43678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC: New UDF methods option #43678
Conversation
{"A": [12, 12, 12], "B": [27, 27, 27]}, index=["sum", "<lambda>", "<lambda>"] | ||
) | ||
|
||
result = pdf.apply([np.square, lambda x: x, lambda x: x]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed from sum to square because lambda x: x.sum()
would fail for apply; this is now consistent with
pdf.apply(lambda x: x.sum())
which currently fails master.
} | ||
) | ||
if get_option("new_udf_methods"): | ||
result = string_series.agg( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed from apply to agg as now apply with a dictionary of ops will act the same way as apply with a single op - acting on each row of the Series individually.
if how == "apply": | ||
expected = DataFrame({name: string_series for name, op in zip(names, ops)}) | ||
else: | ||
expected = Series( | ||
{name: op(string_series) for name, op in zip(names, ops)}, name="series" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Series.apply with a list of ops now acts in the same way as Series.apply with a single op, applying the UDF to each row of the Series. This is a no-op for sum/mean.
if get_option("new_udf_methods"): | ||
expected = Series([1, 1, 1, 3], index=index) | ||
expected.index.name = None | ||
tm.assert_series_equal(table, expected) | ||
else: | ||
expected = DataFrame(index=index) | ||
tm.assert_frame_equal(table, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the bottom of the PDF attached in the OP for some more details on this.
6bd8488
to
4acdb55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to split this :-> i see your comment
@@ -98,6 +102,7 @@ def frame_apply( | |||
result_type=result_type, | |||
args=args, | |||
kwargs=kwargs, | |||
renamer=renamer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would move before args
for a in arg: | ||
name = None | ||
try: | ||
if isinstance(a, (tuple, list)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob worth having a function to compute a single arg (and then just call in the list)
@@ -408,6 +421,70 @@ def agg_list_like(self) -> DataFrame | Series: | |||
) | |||
return concatenated.reindex(full_ordered_index, copy=False) | |||
|
|||
def new_list_like(self, method: str) -> DataFrame | Series: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be something other than new
? i.e. a name that will still be helpful/accurate in 12 months
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - splitting this up, some discussion in the first part: #43736. "future" has been suggested as a replacement for "new".
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Start of #41112
The goal is to provide an implementation for apply on lists. See #41112 (comment) for why this necessitates a change to agg behavior for lists. It is possible to break this up into several PRs, but wanted to give a view as to what it would appear in totality.
Attached is a notebook demonstrating the differences in behavior (github doesn't allow attaching the notebook directly).
udfs.pdf