Add Scalar.parent_dataframe #333
Given there's no concern about ordering like there is with DataFrames / Columns, why would binary operations across scalars yielded from different tables not work? For example, getting the max of a column from one table and the max of a column from a different table and comparing them feels pretty reasonable to me.
In polars they'd be backed by expressions (where the output is a one-row column), and those need to refer to the same parent dataframe (so, same story as with cross-dataframe column comparisons being implementation-specific).
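To make the point above concrete, here is a minimal pure-Python sketch of a scalar that is only a deferred expression bound to its parent dataframe. `Frame`, `LazyScalar`, `col_max` and `collect` are hypothetical names invented for illustration; this is not the polars API or the API proposed in this PR, just one assumed way such a design might behave.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class Frame:
    """Hypothetical stand-in for a dataframe; not any real library's class."""

    def __init__(self, data: dict[str, list[float]]) -> None:
        self.data = data

    def col_max(self, name: str) -> LazyScalar:
        # Nothing is computed here: the scalar is only a recipe tied to this frame.
        return LazyScalar(self, lambda frame: max(frame.data[name]))


@dataclass(eq=False)
class LazyScalar:
    """Hypothetical expression-backed scalar that remembers its parent frame."""

    parent_dataframe: Frame
    compute: Callable[[Frame], object]

    def __gt__(self, other: LazyScalar) -> LazyScalar:
        # An expression can only be evaluated against one frame, so operands
        # with different parents are rejected in this sketch.
        if other.parent_dataframe is not self.parent_dataframe:
            raise ValueError("cannot compare scalars from different dataframes")
        return LazyScalar(
            self.parent_dataframe,
            lambda frame: self.compute(frame) > other.compute(frame),
        )

    def collect(self) -> object:
        # Evaluate the deferred expression against the parent frame.
        return self.compute(self.parent_dataframe)
```

In this sketch, comparing two scalars from the same `Frame` builds a new deferred scalar, while mixing parents raises. An eager implementation could simply skip the parent check.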
I don't really have any choice but to make this raise in
I guess we can add this and other libraries can always have the parent DataFrame set as It kinda goes against the spirit of the API being universal across implementations, but I would be somewhat frustrated as both a user and a DataFrame library maintainer to not be able to compare scalars from different DataFrames.
You were OK with specifying that cross-dataframe column comparisons are implementation-specific, what makes the scalar case different?
Just to clarify a couple of things:
- Would you not be more frustrated to find that a library which you thought was dataframe-agnostic ends up raising for the most popular dataframe libraries?
- This shouldn't change anything for you as a maintainer. You don't need to change the behaviour of any dataframe library you maintain, as I'm just suggesting we note that this behaviour is implementation-specific. It's not disallowed or anything. You're more than free to keep allowing this.
There's a logical reason why cross-dataframe column comparisons may be invalid. Ordering may be "undefined" within a dataframe, but the columns within a dataframe are guaranteed to share the same "undefined" order, whereas columns across dataframes are not. Some libraries guarantee ordering of operations and can always cross-compare columns from different dataframes in a logically consistent way; others cannot, and when they cannot they often raise rather than give undefined results.

There's no ordering when it comes to scalars, so as far as I can tell there's no logical reason to disallow comparison between arbitrary scalars. The reason for doing so here is that some implementations, like Polars and Ibis, disallow it. Regardless, implementations not supporting it is a more than good enough reason to move forward with this here.
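The ordering argument can be illustrated with plain Python lists standing in for columns. This is only a sketch of the reasoning, with made-up helper names and data, not code from any dataframe library.

```python
def elementwise_gt(left: list[int], right: list[int]) -> list[bool]:
    """Row-by-row comparison: the result depends on how the rows line up."""
    return [a > b for a, b in zip(left, right)]


def scalar_gt(left: list[int], right: list[int]) -> bool:
    """Compare reductions: max() ignores row order, so alignment is irrelevant."""
    return max(left) > max(right)


# The same values, as two frames might happen to store them in different orders.
col_a = [3, 1, 2]
col_a_reordered = sorted(col_a)
col_b = [1, 2, 3]

# Column comparison changes when one side's row order changes.
order_matters = elementwise_gt(col_a, col_b) != elementwise_gt(col_a_reordered, col_b)

# Scalar comparison gives the same answer whatever the row order.
order_irrelevant = scalar_gt(col_a, col_b) == scalar_gt(col_a_reordered, col_b)
```

Here `order_matters` and `order_irrelevant` both come out true. That is the asymmetry described above: the column result is only well defined when both sides share a row order, while the scalar result never depends on it.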
similar to Column.parent_dataframe