PX can accept non-pandas dataframes that can .to_pandas() #3901

nicolaskruchten · 2022-09-25T11:34:43Z

No description provided.

exactlyallan · 2022-10-12T23:37:40Z

Regarding accepting RAPIDS cuDF dataframes: this is great for reducing friction, but we lose the transparency that an explicit .to_pandas() gives about data moving off the GPU into system memory. For single charts (most of PX's usage?) that really isn't an issue, but I'm wondering how much slower a Dash + PX dashboard would be without tighter cuDF integration.

nicolaskruchten · 2022-10-13T00:57:32Z

There's not really any tighter integration to be had atm... PX doesn't do any aggregation anyway, it just splits and dumps dataframe columns into JSON basically so it's coming off the GPU no matter what.

The major inefficiency in my PR is that it takes the entire df to pandas, not just the columns it needs, which would be a better approach but potentially trickier/further down into the code.

kevinheavey · 2023-01-05T10:35:24Z

The other drawback of relying on to_pandas is that it cements pandas as a dependency.

nicolaskruchten · 2023-01-05T13:46:45Z

Ah well right now the dependency is extremely cemented internally, which is part of what this change would try to hide from the user.

LiamConnors

💃

nicolaskruchten · 2023-06-08T15:16:38Z

@LiamConnors it'll be important to mention in the docs my comment above re "copying the whole data frame"!

MarcoGorelli · 2023-06-14T14:24:25Z

> Ah well right now the dependency is extremely cemented internally, which is part of what this change would try to hide from the user.

Hey @nicolaskruchten - just FYI, there is an initiative which could help with this: https://data-apis.org/blog/dataframe_standard_rfc/

Just to gauge interest, is this something you might consider leveraging in the future once it's ready?

nicolaskruchten · 2023-06-14T16:37:51Z

definitely! PX does all sorts of manipulation under the hood which is pretty tightly-coupled to The Way Pandas Does Things™ but only out of necessity. If it turns out all of these can be faithfully ported to The Standard Way™ then I'm all for it. To be clear though: none of this is doing any data processing that can be sped up by non-Pandas backends, it's just a bunch of slicing and column rearranging work. PX does essentially no aggregation/math of any kind with Pandas, so the idea of accepting non-Pandas input is just an ergonomic one at this point to avoid the user having to do the .to_pandas() themselves.

There's another PR in flight to leverage the interchange approach on top of the to_pandas() approach: #4244
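As a rough illustration of the ergonomic convenience described above, here is a minimal sketch of duck-typed conversion. It is not the code from this PR: `FakeFrame` and `ensure_pandas` are hypothetical names made up for the example, and `FakeFrame` only stands in for a polars- or cuDF-style object that exposes `to_pandas()`.

```python
import pandas as pd


class FakeFrame:
    """Hypothetical stand-in for a non-pandas frame that exposes to_pandas()."""

    def __init__(self, data):
        self._data = data

    def to_pandas(self):
        return pd.DataFrame(self._data)


def ensure_pandas(data_frame):
    """Hypothetical helper: return a pandas DataFrame for any supported input."""
    if isinstance(data_frame, pd.DataFrame):
        return data_frame
    if hasattr(data_frame, "to_pandas"):
        # Note: this copies the *whole* frame, not just the columns needed.
        return data_frame.to_pandas()
    # Fall back to the pandas constructor for dicts, lists of records, etc.
    return pd.DataFrame(data_frame)


df = ensure_pandas(FakeFrame({"x": [1, 2, 3], "y": [4, 5, 6]}))
print(type(df).__name__, list(df.columns))
```

The user-facing effect is only that the explicit `.to_pandas()` call moves inside the library; no computation moves to the other backend.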

MarcoGorelli · 2023-06-14T16:48:13Z

thanks @nicolaskruchten - the interchange protocol isn't necessarily zero-copy (it just tries to be when possible), so it may still be more efficient to write library-agnostic code

I'll try it out in plotly and see how far it goes

nicolaskruchten · 2023-06-14T17:18:03Z

👍

probably the biggest win right now would be to call .to_pandas(), or use the interchange, on only the columns that are actually needed rather than on the entire set of columns, which is what's happening right now. That can be super-inefficient in the case of e.g. a very wide cuDF dataframe, where we haul the entire thing off the GPU just for two columns!
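The column-subsetting idea can be sketched as follows. This is only an illustration under stated assumptions: `WideFrame` and `to_pandas_subset` are hypothetical names, and the sketch assumes the source frame supports selecting a list of column names with `frame[[...]]` before `to_pandas()` is called.

```python
import pandas as pd


class WideFrame:
    """Hypothetical stand-in for a very wide, non-pandas frame."""

    def __init__(self, data):
        self._data = data
        self.columns = list(data)

    def __getitem__(self, names):
        # Selecting a list of column names returns another WideFrame.
        return WideFrame({name: self._data[name] for name in names})

    def to_pandas(self):
        return pd.DataFrame(self._data)


def to_pandas_subset(data_frame, needed):
    """Hypothetical helper: convert only the columns a chart actually uses."""
    present = [name for name in needed if name in data_frame.columns]
    return data_frame[present].to_pandas()


wide = WideFrame({f"col{i}": [i, i + 1] for i in range(1000)})
small = to_pandas_subset(wide, ["col3", "col7"])
print(small.shape)
```

Here only two of the thousand columns are ever materialized as pandas data, which is the saving the comment describes.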

anmyachev · 2023-06-26T15:08:22Z

CHANGELOG.md

@@ -17,6 +16,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
   this feature was anonymously sponsored: thank you to our sponsor!
    - Add `legend.xref` and `legend.yref` to enable container-referenced positioning of legends [[#6589](https://github.com/plotly/plotly.js/pull/6589)], with thanks to [Gamma Technologies](https://www.gtisoft.com/) for sponsoring the related development.
    - Add `colorbar.xref` and `colorbar.yref` to enable container-referenced positioning of colorbars [[#6593](https://github.com/plotly/plotly.js/pull/6593)], with thanks to [Gamma Technologies](https://www.gtisoft.com/) for sponsoring the related development.
+  - `px` methods now accept data-frame-like objects that support a `to_pandas()` method, such as polars, cudf, vaex etc


@nicolaskruchten Looks like vaex doesn't have a to_pandas method, only to_pandas_df.

Therefore, tests with vaex do not pass in #4244.
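One possible way to handle the naming difference is to probe for both method names. This is a sketch only, not what either PR does: `VaexLike` and `convert_to_pandas` are hypothetical names, and `VaexLike` merely imitates an object that offers `to_pandas_df()` but not `to_pandas()`.

```python
import pandas as pd


class VaexLike:
    """Hypothetical stand-in exposing to_pandas_df() but not to_pandas()."""

    def __init__(self, data):
        self._data = data

    def to_pandas_df(self):
        return pd.DataFrame(self._data)


def convert_to_pandas(data_frame):
    """Hypothetical helper: try each known conversion method name in turn."""
    for method_name in ("to_pandas", "to_pandas_df"):
        method = getattr(data_frame, method_name, None)
        if callable(method):
            return method()
    raise TypeError(
        f"{type(data_frame).__name__} has no to_pandas() or to_pandas_df() method"
    )


converted = convert_to_pandas(VaexLike({"x": [1, 2], "y": [3, 4]}))
print(list(converted.columns))
```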

PX can accept non-pandas dataframes that can .to_pandas()

8b58c7f

nicolaskruchten mentioned this pull request Sep 25, 2022

support dataframe protocol (tested with Vaex) #3387

Closed

Merge branch 'master' into px_to_pandas

0e0eaa8

LiamConnors approved these changes Jun 8, 2023

alexcjohnson merged commit 69b4d3e into master Jun 8, 2023

alexcjohnson deleted the px_to_pandas branch June 8, 2023 14:33

anmyachev reviewed Jun 26, 2023

LiamConnors mentioned this pull request Jul 7, 2023

Docs updates for dataframe support in Plotly Express #4272

Merged

23 tasks

MarcoGorelli mentioned this pull request Jul 12, 2023

Add DataFrame.unique_indices data-apis/dataframe-api#194

Merged

MarcoGorelli mentioned this pull request Jul 19, 2023

only interchange necessary columns #4286

Merged
