
Add Scalar.parent_dataframe #333


Merged
merged 2 commits into from
Dec 19, 2023


MarcoGorelli
Contributor

Similar to Column.parent_dataframe.
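Here is a minimal sketch of the idea. It uses hypothetical toy classes, not any real implementation. A reduction such as max() passes the column's parent dataframe on to the resulting scalar, in the same way Column.parent_dataframe records where a column came from.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(eq=False)  # identity semantics: "same dataframe" means the same object
class DataFrame:
    # Hypothetical toy dataframe: a mapping of column name -> list of values.
    data: Dict[str, List[Any]]

    def col(self, name: str) -> "Column":
        # Every column remembers which dataframe it came from.
        return Column(self.data[name], parent_dataframe=self)


@dataclass
class Column:
    values: List[Any]
    parent_dataframe: Optional[DataFrame]

    def max(self) -> "Scalar":
        # The reduction hands its parent on to the resulting scalar,
        # mirroring Column.parent_dataframe.
        return Scalar(max(self.values), parent_dataframe=self.parent_dataframe)


@dataclass
class Scalar:
    value: Any
    parent_dataframe: Optional[DataFrame]


df = DataFrame({"a": [1, 3, 2]})
s = df.col("a").max()
print(s.value, s.parent_dataframe is df)  # 3 True
```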

@kkraus14
Collaborator

kkraus14 commented Dec 7, 2023

Given there's no concern about ordering, as there is with DataFrames / Columns, why wouldn't binary operations across scalars yielded from different tables work? I.e., getting the max of a column from one table and the max of a column from a different table and comparing them feels pretty reasonable to me.

@MarcoGorelli
Contributor Author

In Polars they'd be backed by expressions (where the output is a one-row column), and those need to refer to the same parent dataframe (so it's the same story as with cross-dataframe column comparisons being implementation-specific).

@MarcoGorelli
Contributor Author

I don't really have any choice but to make this raise in dataframe-api-compat. I think it'd be good for other packages if we noted this in the standard.
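Here is a minimal sketch of what "make this raise" could look like. It uses hypothetical DataFrame and Scalar classes and is not dataframe-api-compat's actual code. ValueError is an assumed exception type, and returning a plain bool from the comparison is a simplification.

```python
from typing import Any, Optional


class DataFrame:
    """Hypothetical stand-in: only its identity matters in this sketch."""


class Scalar:
    def __init__(self, value: Any, parent_dataframe: Optional[DataFrame]) -> None:
        self.value = value
        self.parent_dataframe = parent_dataframe

    def _check_same_parent(self, other: "Scalar") -> None:
        # Expression-backed scalars can only be combined within one dataframe,
        # so mismatched parents are rejected up front.
        if self.parent_dataframe is not other.parent_dataframe:
            raise ValueError("cannot compare scalars from different dataframes")

    def __gt__(self, other: "Scalar") -> bool:
        self._check_same_parent(other)
        # Simplification: a plain bool is returned here, not another Scalar.
        return self.value > other.value


df1, df2 = DataFrame(), DataFrame()
print(Scalar(3, df1) > Scalar(2, df1))  # True
try:
    Scalar(3, df1) > Scalar(2, df2)
except ValueError as exc:
    print(exc)  # cannot compare scalars from different dataframes
```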

@kkraus14
Collaborator

I guess we can add this, and other libraries can always set the parent DataFrame to None to allow cross-comparisons?
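The suggestion can be sketched as follows. This assumes a hypothetical Scalar class in which a parent_dataframe of None means "not tracked". The check only fires when both parents are known and differ, so a library that always reports None allows cross-dataframe comparisons. The ValueError is again an assumed exception type.

```python
from typing import Any, Optional


class Scalar:
    def __init__(self, value: Any, parent_dataframe: Optional[object] = None) -> None:
        self.value = value
        # None means "no parent tracked", e.g. an eagerly computed value.
        self.parent_dataframe = parent_dataframe

    def __gt__(self, other: "Scalar") -> bool:
        mine, theirs = self.parent_dataframe, other.parent_dataframe
        # Only reject when both parents are known and differ; a library that
        # reports None opts in to cross-dataframe comparisons.
        if mine is not None and theirs is not None and mine is not theirs:
            raise ValueError("cannot compare scalars from different dataframes")
        return self.value > other.value


df1, df2 = object(), object()  # stand-ins for two distinct dataframes
print(Scalar(3) > Scalar(2, df2))  # True: one side has no parent
print(Scalar(3, df1) > Scalar(2, df1))  # True: same parent
```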

It kinda goes against the spirit of the API being universal across implementations, but I would be somewhat frustrated as both a user and a DataFrame library maintainer to not be able to compare scalars from different DataFrames.

@MarcoGorelli
Contributor Author

You were OK with specifying that cross-dataframe column comparisons are implementation-specific, what makes the scalar case different?

@MarcoGorelli
Contributor Author

Just to clarify a couple of things:

I would be somewhat frustrated as both a user

Would you not be more frustrated to find that a library you thought was dataframe-agnostic ends up raising for the most popular dataframe libraries?

and a DataFrame library maintainer

This shouldn't change anything for you as a maintainer. You don't need to change the behaviour of any dataframe library you maintain, as I'm only suggesting that we note this behaviour as implementation-specific. It's not disallowed or anything; you're more than free to keep allowing this.

@kkraus14
Collaborator

You were OK with specifying that cross-dataframe column comparisons are implementation-specific, what makes the scalar case different?

There's a logical reason for why cross-dataframe column comparisons may be invalid. Ordering may be "undefined" within a dataframe but the columns within a dataframe are guaranteed to have the same "undefined" order whereas across dataframes they are not. Some libraries will guarantee ordering of operations and can always cross-compare columns from different dataframes in a logically consistent way, others cannot and often when they cannot they raise rather than giving undefined results.
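The following toy illustration makes the ordering argument concrete. It uses plain Python, with dicts of lists standing in for dataframes rather than any library's API. Column binary operations pair values by position. As a result, two dataframes that hold identical rows in different orders compare as unequal until both are put into a shared order.

```python
def elementwise_eq(a, b):
    """Positional, element-wise comparison, the way column binary ops pair values."""
    return [u == v for u, v in zip(a, b)]


def sort_by(df, key):
    """Reorder every column of a toy dataframe by the values of one key column."""
    order = sorted(range(len(df[key])), key=lambda i: df[key][i])
    return {name: [col[i] for i in order] for name, col in df.items()}


left = {"id": [1, 2, 3], "x": [10, 20, 30]}
right = {"id": [3, 1, 2], "x": [30, 10, 20]}  # same rows, different order

# Position i in one dataframe need not be the same row as position i in the other.
naive = elementwise_eq(left["x"], right["x"])
# Only after establishing a shared order do the comparisons line up.
aligned = elementwise_eq(sort_by(left, "id")["x"], sort_by(right, "id")["x"])
print(naive)    # [False, False, False]
print(aligned)  # [True, True, True]
```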

There's no ordering when it comes to scalars, so as far as I can tell there's no logical reason to disallow comparison between arbitrary scalars. The reason we're doing so here is that some implementations, like Polars and Ibis, disallow it.
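By contrast, a reduction such as max doesn't depend on row order at all. This is a plain-Python illustration in which lists stand in for columns; no library API is assumed.

```python
from itertools import permutations

values = [10, 20, 30]

# However a library happens to order the rows, the reduction gives the same scalar.
maxes = {max(p) for p in permutations(values)}
print(maxes)  # {30}

# So comparing reductions taken from two different tables doesn't depend on row order.
other_table = [5, 25]
print(max(values) > max(other_table))  # True
```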

Regardless, implementations not supporting it is a more than good enough reason to move forward with this here.

@MarcoGorelli merged commit c5f0835 into data-apis:main on Dec 19, 2023