-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
PX can accept non-pandas dataframes that can .to_pandas() #3901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Regarding accepting RAPIDS cuDF dataframes, this is great at reducing friction, but we lose transparency about moving off GPU to system memory that an explicit |
There's not really any tighter integration to be had atm... PX doesn't do any aggregation anyway, it just splits and dumps dataframe columns into JSON basically so it's coming off the GPU no matter what. The major inefficiency in my PR is that it takes the entire df to pandas, not just the columns it needs, which would be a better approach but potentially trickier/further down into the code. |
The other drawback of relying on |
Ah well right now the dependency is extremely cemented internally, which is part of what this change would try to hide from the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💃
@LiamConnors itll be important to mention in the docs my comment above re "copying the whole data frame"! |
Hey @nicolaskruchten - just FYI, there is an initiative which could help with this: https://data-apis.org/blog/dataframe_standard_rfc/ Just to gauge interest, is this something you might consider leveraging in the future once it's ready? |
definitely! PX does all sorts of manipulation under the hood which is pretty tightly-coupled to The Way Pandas Does Things™ but only out of necessity. If it turns out all of these can be faithfully ported to The Standard Way™ then I'm all for it. To be clear though: none of this is doing any data processing that can be sped up by non-Pandas backends, it's just a bunch of slicing and column rearranging work. PX does essentially no aggregation/math of any kind with Pandas, so the idea of accepting non-Pandas input is just an ergonomic one at this point to avoid the user having to do the There's another PR in flight to leverage the interchange approach on top of the |
thanks @nicolaskruchten - the interchange protocol isn't necessarily zero-copy (it just tries to be when possible), so it may still be more efficient to write library-agnostic code I'll try it out in plotly and see how far it goes |
👍 probably the biggest win right now would be to |
@@ -17,6 +16,7 @@ This project adheres to [Semantic Versioning](http://semver.org/). | |||
this feature was anonymously sponsored: thank you to our sponsor! | |||
- Add `legend.xref` and `legend.yref` to enable container-referenced positioning of legends [[#6589](https://github.com/plotly/plotly.js/pull/6589)], with thanks to [Gamma Technologies](https://www.gtisoft.com/) for sponsoring the related development. | |||
- Add `colorbar.xref` and `colorbar.yref` to enable container-referenced positioning of colorbars [[#6593](https://github.com/plotly/plotly.js/pull/6593)], with thanks to [Gamma Technologies](https://www.gtisoft.com/) for sponsoring the related development. | |||
- `px` methods now accept data-frame-like objects that support a `to_pandas()` method, such as polars, cudf, vaex etc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicolaskruchten Looks like vaex
doesn't have to_pandas
method, only to_pandas_df
.
Therefore, tests with vaex do not pass in #4244.
No description provided.