ENH: allow writing series to parquet file #54638

Open
1 of 3 tasks
lcrmorin opened this issue Aug 19, 2023 · 7 comments
Labels: Enhancement; IO Parquet (parquet, feather); Needs Discussion (requires discussion from core team before further action); Series (Series data structure)

Comments

@lcrmorin

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently the .to_parquet() method only works for DataFrames. It would be nicer if the method could work on Series too. At the moment we either have to save the Series to another format or wrap it in pd.DataFrame(series), which seems a bit clunky.

Feature Description

For a given pandas Series, being able to call Series.to_parquet().

Alternative Solutions

Currently the two alternatives are:

  • save to another format, which is a bit convoluted as we then have to deal with multiple formats.
  • convert the Series to a DataFrame to use the DataFrame method.

Additional Context

No response

@lcrmorin added the Enhancement and Needs Triage labels Aug 19, 2023
@rhshadrach
Member

rhshadrach commented Aug 19, 2023

If we added a Series.to_parquet, I think users would expect to be able to round trip back to Series. I'm not sure but I don't think that's possible.

I personally use ser.to_frame(name).to_parquet(...).
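For reference, a minimal sketch of that workaround together with the read side. The function name `series_round_trip` and the in-memory buffer are illustrative assumptions, not part of pandas, and a parquet engine such as pyarrow is assumed to be installed.

```python
import io

import pandas as pd


def series_round_trip(ser: pd.Series, name: str, engine: str = "auto") -> pd.Series:
    """Write `ser` as a one-column frame to parquet and read it back.

    Hypothetical helper for illustration only; it is not a pandas API.
    """
    buf = io.BytesIO()
    # The workaround from the comment above: go through a DataFrame.
    ser.to_frame(name).to_parquet(buf, engine=engine)
    buf.seek(0)
    # read_parquet always returns a DataFrame ...
    df = pd.read_parquet(buf, engine=engine)
    # ... so the Series has to be recovered by selecting its column.
    return df[name]
```

Calling `series_round_trip(pd.Series([1, 2, 3]), "x")` would return a Series named `x`; the extra column selection on the read side is the step a `Series.to_parquet` method could not hide by itself.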

cc @jorisvandenbossche

@rhshadrach added the IO Parquet and Series labels Aug 19, 2023
@jorisvandenbossche
Member

If we added a Series.to_parquet, I think users would expect to be able to round trip back to Series.

We have other IO methods on Series that don't necessarily give you that guarantee. For example, when reading the result of Series.to_csv with pd.read_csv, you will also get a DataFrame, I think.

So from that point of view, I would personally be fine with such a non-perfect roundtripping behaviour for Series.to_parquet as well.
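A small sketch of the CSV behaviour described above. The sample data and variable names are made up for illustration.

```python
import io

import pandas as pd

ser = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"], name="values")

# With no path argument, to_csv returns the CSV text as a string.
csv_text = ser.to_csv()

# Reading it back with default arguments yields a two-column DataFrame:
# the former index comes back as an ordinary column.
df = pd.read_csv(io.StringIO(csv_text))

# Getting a Series back takes explicit steps: restore the index, then
# squeeze the single remaining column.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0).squeeze("columns")
```

Here `df` is a DataFrame while `restored` is a Series, so the round trip only returns a Series when the reader asks for one.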

The question is if we want to add all of our IO methods to Series as well in general, or not (given that the workaround is quite easy). It seems we are now a bit inconsistent.

@sammcbeth

take

@sammcbeth

Assigning this to myself as it seems like a good first issue for me, given I use pandas with parquet files regularly. It seems there's still some ongoing discussion around the appropriateness of this, so I'll keep an eye out in case people decide this is no longer needed.

@sammcbeth

@jorisvandenbossche this will need much more testing but I got it working locally and I wanted to get some initial validation on the idea https://github.com/pandas-dev/pandas/pull/54675/files

Alternatively, we could do what Series.to_markdown() does here and simply cast the Series to a frame and use the frame's methods. I figured this wasn't as clean or as easy to write unit tests for. Let me know if I have the right idea above whenever you have a chance. Thanks!
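The cast-to-frame alternative could look roughly like the sketch below. Both function names and the fallback column name `"values"` are hypothetical and are not taken from the linked PR or from pandas. The column name is converted to a string on the assumption that some pandas versions or parquet engines reject non-string column names.

```python
import pandas as pd


def as_single_column_frame(ser: pd.Series, name=None) -> pd.DataFrame:
    """Cast a Series to a one-column DataFrame with a string column name."""
    column = name if name is not None else ser.name
    if column is None:
        column = "values"  # assumed default for unnamed Series
    return ser.to_frame(name=str(column))


def series_to_parquet(ser: pd.Series, path=None, name=None, **kwargs):
    """Delegate to DataFrame.to_parquet; hypothetical, not a pandas API.

    With path=None, DataFrame.to_parquet returns the file content as bytes.
    """
    return as_single_column_frame(ser, name).to_parquet(path, **kwargs)
```

Keeping the cast in its own function means the naming rules can be unit tested without a parquet engine installed.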

@rhshadrach
Member

We have other IO methods on Series that doesn't necessarily give you that guarantee. For example, when reading the result of Series.to_csv with pd.read_csv, you will also get a DataFrame, I think.

So from that point of view, I would personally be fine with such a non-perfect roundtripping behaviour for Series.to_parquet as well.

I expect a lot more out of parquet than I do CSV/JSON/Excel, in particular round tripping with dtypes. I'm not so convinced that a comparison to CSV is warranted.
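To make the dtype point concrete, here is a sketch comparing the two round trips for an int32 Series. The helper names are illustrative, the Series is assumed to have a string name, and the parquet half assumes an engine such as pyarrow is installed.

```python
import io

import pandas as pd


def csv_round_trip(ser: pd.Series) -> pd.Series:
    """Series -> CSV text -> Series; dtypes are re-inferred on read."""
    text = ser.to_csv()
    return pd.read_csv(io.StringIO(text), index_col=0).squeeze("columns")


def parquet_round_trip(ser: pd.Series, engine: str = "auto") -> pd.Series:
    """Series -> one-column parquet -> Series; dtypes are stored in the file."""
    buf = io.BytesIO()
    ser.to_frame().to_parquet(buf, engine=engine)
    buf.seek(0)
    return pd.read_parquet(buf, engine=engine).squeeze("columns")


counts = pd.Series([1, 2, 3], name="count", dtype="int32")
```

With this data, `csv_round_trip(counts)` comes back as int64 because CSV carries no type information, whereas `parquet_round_trip(counts)` is expected to keep int32.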

Do all IO methods round trip back as a DataFrame? If that's the case, then I don't think it's worth the maintenance burden to have these methods on Series when they are just a .to_frame() call away. But if there is good reason to keep some of them, then I can see the value that having them all on Series brings for a consistent API.

@mroeschke added the Needs Discussion label and removed the Needs Triage label Jul 17, 2024
@jpkemp

jpkemp commented Apr 7, 2025

I just found this issue as a result of hitting the inconsistent API problem, and my take is that a consistent API is better, because if I'm trying to save data then I want the data to save. It's easier to change types with data that exists...

Not a huge problem in this instance, but I wasn't expecting it, especially as some options are consistent - either none or all would be good, but I'd prefer all.

6 participants