
Plotly Express column input formats #1767


Closed
nicolaskruchten opened this issue Sep 11, 2019 · 10 comments · Fixed by #1768

@nicolaskruchten
Contributor

Right now for all px methods, data_frame is a required argument and the values of many kwargs are column names as strings.

We should be much more flexible in what we accept, for example the following should work:

px.scatter(x=[1,2,3], y=[1,2,3])

In which case we would internally build a data frame with column names "x" and "y".
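A minimal sketch of what "internally build a data frame" could look like, assuming the array-like arguments are simply collected under their keyword names. `build_frame` is a hypothetical helper for illustration, not part of Plotly Express:

```python
import pandas as pd

def build_frame(**columns):
    """Hypothetical helper: build a data frame from keyword arguments.

    Each supplied keyword (e.g. x, y) becomes a column named after it.
    """
    # Skip arguments that were not supplied.
    supplied = {
        name: list(values)
        for name, values in columns.items()
        if values is not None
    }
    return pd.DataFrame(supplied)

frame = build_frame(x=[1, 2, 3], y=[1, 2, 3], color=None)
print(list(frame.columns))
```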

In addition, the following should work:

px.scatter(df, x=df.index, y=df.col)

In which case we can read the column names (and optionally index name?) from the data frame.

Finally, we should allow for mixing and matching:

px.scatter(df, x=df.index, y=df.col, color=[1,2,3])
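One way the mixing-and-matching case might be handled is to resolve each argument on its own: strings are looked up in the data frame, and anything else is treated as data. The sketch below assumes that a named object keeps its own name only when a data frame was passed, and falls back to the argument name otherwise. `resolve_columns` and its return shape are hypothetical:

```python
import pandas as pd

def resolve_columns(data_frame=None, **args):
    """Hypothetical sketch: merge column names and array-likes into one frame.

    Returns the new frame and a mapping of argument name -> column name.
    """
    columns = {}
    mapping = {}
    for arg_name, value in args.items():
        if value is None:
            continue
        if isinstance(value, str):
            # A string is treated as the name of a column in data_frame.
            if data_frame is None:
                raise ValueError(
                    f"{arg_name}={value!r} names a column but no data_frame was given"
                )
            col_name = value
            values = data_frame[value].to_numpy()
        else:
            # Array-like input: keep its own name only if a data frame was passed.
            own_name = getattr(value, "name", None)
            use_own = data_frame is not None and own_name is not None
            col_name = str(own_name) if use_own else arg_name
            # to_numpy() drops any index so that no alignment takes place.
            values = pd.Series(value).to_numpy()
        columns[col_name] = values
        mapping[arg_name] = col_name
    return pd.DataFrame(columns), mapping

df = pd.DataFrame({"col": [10, 20, 30]}, index=[7, 8, 9])
frame, mapping = resolve_columns(df, x=df.index, y=df.col, color=[1, 2, 3])
print(mapping)
```

This sketch ignores name collisions (two arguments resolving to the same column name would overwrite each other), which a real implementation would need to handle.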

@nicolaskruchten
Contributor Author

A couple of additional thoughts... The following should work, and result in a data frame internally with column names "x" and "y" because the parent data frame was not provided to px:

px.scatter(x=df.x_col, y=df.y_col)

Finally, just to be explicit, we should accept the same formats wherever we accept lists of column names, like in dimensions, hover_data and custom_data.
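For the list-valued arguments, the same per-entry rule could be applied to each element. The sketch below is only an illustration: `resolve_list_argument` is a hypothetical helper, and the `"<argument>_<position>"` naming for unnamed data is an assumption, not something decided in this thread:

```python
import pandas as pd

def resolve_list_argument(data_frame, arg_name, entries):
    """Hypothetical sketch: resolve a list such as hover_data entry by entry."""
    columns = {}
    for position, entry in enumerate(entries):
        if isinstance(entry, str):
            # A string refers to an existing column of data_frame.
            columns[entry] = data_frame[entry].to_numpy()
        else:
            # Assumed naming for unnamed data: "<argument>_<position>".
            own_name = getattr(entry, "name", None)
            col_name = (
                str(own_name) if own_name is not None else f"{arg_name}_{position}"
            )
            columns[col_name] = pd.Series(entry).to_numpy()
    return pd.DataFrame(columns)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
hover = resolve_list_argument(df, "hover_data", ["a", df.b, [7, 8, 9]])
print(list(hover.columns))
```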

@nicolaskruchten
Contributor Author

An earlier PR implemented part of these requirements this summer, but it was held up by the then-impending repo shuffle: plotly/plotly_express#87

@emmanuelle
Contributor

Also take a look at xarray

@nicolaskruchten
Contributor Author

Re xarray I seem to recall reading that 1-dimensional arrays could have an "onboard" name, unlike pandas.Series objects, whose names are only available in a containing pandas.DataFrame... if this name is easy to read, it might be nice to use it instead of "x" in the case of px.scatter(x=<xarray vector with onboard name>) :)
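Reading an "onboard" name could be as simple as a `getattr` lookup with a fallback to the argument name. `xarray.DataArray` exposes a `.name` attribute, and `pandas.Series` objects carry one as well, so the same lookup would cover both. The example uses a pandas Series to stay dependency-light; `column_name_for` is a hypothetical helper, and whether a Series name should win over "x" is a design choice this sketch does not settle:

```python
import pandas as pd

def column_name_for(arg_name, value):
    """Hypothetical helper: prefer the object's own name, else the argument name."""
    own_name = getattr(value, "name", None)
    return arg_name if own_name is None else str(own_name)

unnamed = column_name_for("x", [1, 2, 3])
named = column_name_for("x", pd.Series([1, 2, 3], name="time"))
print(unnamed, named)  # x time
```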

@emmanuelle
Contributor

We could also accept px.scatter(x={'time':[1,2,3]}, y={'revenue':[1,2,3]}). What do you think?

@emmanuelle
Contributor

It would be a way to give names to columns easily. We can also use the labels argument for this.

@nicolaskruchten
Contributor Author

Yes, I think that px.scatter(x=[1,2,3], y=[1,2,3], labels={'x': 'time', 'y': 'money'}) would be the way here.

@nicolaskruchten
Contributor Author

Depending on how it's implemented, the above should "just work" actually :)
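A rough illustration of why this could "just work": if the internally built columns are named "x" and "y", and `labels` is keyed by column name, then the usual label lookup applies unchanged. The `axis_titles` function and the `mapping` dict are hypothetical stand-ins for whatever the implementation tracks:

```python
def axis_titles(mapping, labels=None):
    """Hypothetical sketch: pick a display title for each argument.

    mapping: argument name -> internal column name
    labels:  column name -> display label (falls back to the column name)
    """
    labels = labels or {}
    return {arg: labels.get(column, column) for arg, column in mapping.items()}

# Columns built from px.scatter(x=[1,2,3], y=[1,2,3]) would be named "x" and "y".
mapping = {"x": "x", "y": "y"}
titles = axis_titles(mapping, labels={"x": "time", "y": "money"})
print(titles)  # {'x': 'time', 'y': 'money'}
```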

@emmanuelle
Contributor

And what do you think about px.scatter(x={'time':[1,2,3]}, y={'money':[1,2,3]})? Should we allow it as well?

@nicolaskruchten
Contributor Author

I'm less of a fan of this, partly because it's not a previously-existing convention, as far as I know, and because it shuts the door to a future type-based dispatching on x, y etc (i.e. we might find some other future use for this pattern other than names...). Also, by the time you're creating dicts, you may as well create a data frame with pd.DataFrame({'time':[1,2,3], 'money':[1,2,3]})...

Maybe we should accept {'time':[1,2,3], 'money':[1,2,3]} as the data_frame argument, by just passing whatever we get in there into the pd.DataFrame constructor, so that you could do px.scatter({'time':[1,2,3], 'money': [1,2,3]}, x='time', y='money') ?
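A sketch of that idea, assuming the `data_frame` argument is passed to the `pd.DataFrame` constructor unless it already is a data frame. `as_data_frame` is a hypothetical name:

```python
import pandas as pd

def as_data_frame(data):
    """Hypothetical helper: hand whatever was passed as data_frame to pandas."""
    if isinstance(data, pd.DataFrame):
        # Already a data frame: use it as-is.
        return data
    # Dicts of lists, lists of records, arrays, etc. go to the constructor.
    return pd.DataFrame(data)

frame = as_data_frame({"time": [1, 2, 3], "money": [1, 2, 3]})
print(list(frame.columns))  # ['time', 'money']
```

With this in place, `x='time'` and `y='money'` would resolve as ordinary column names.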
