ENH: allow writing series to parquet file #54638
Comments
If we added a Series.to_parquet, I think users would expect to be able to round trip back to Series. I'm not sure but I don't think that's possible. I personally use |
We have other IO methods on Series that don't necessarily give you that guarantee. For example, when reading the result of So from that point of view, I would personally be fine with such non-perfect roundtripping behaviour for The question is whether we want to add all of our IO methods to Series in general or not (given that the workaround is quite easy). It seems we are now a bit inconsistent. |
take |
Assigning this to myself, as it seems like a good first issue for me given that I use pandas with parquet files regularly. There still seems to be some ongoing discussion about whether this is appropriate, so I'll keep an eye out in case people decide it is no longer needed. |
@jorisvandenbossche this will need much more testing, but I got it working locally and wanted some initial validation of the idea: https://github.com/pandas-dev/pandas/pull/54675/files Alternatively, we could do what Series.to_markdown() does and simply cast the Series to a frame and use the frame's methods. I figured that wasn't as clean or as easy to write unit tests for. Let me know whenever you have a chance whether I have the right idea. Thanks! |
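For readers following along, the "cast to a frame and use the frame's methods" alternative could look roughly like the sketch below. The names `series_as_frame` and `series_to_parquet` and the `"values"` fallback column name are hypothetical, chosen only for illustration; this is not the code from the linked PR and not an existing pandas API. Writing the file also assumes a parquet engine such as pyarrow is installed.

```python
import pandas as pd


def series_as_frame(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: wrap a Series in a single-column DataFrame."""
    # Fall back to a placeholder column name for unnamed Series; the name is
    # stringified as a precaution, since parquet column names are strings.
    name = series.name if series.name is not None else "values"
    return series.to_frame(name=str(name))


def series_to_parquet(series: pd.Series, path, **kwargs) -> None:
    """Hypothetical helper: delegate the actual write to DataFrame.to_parquet."""
    # Extra keyword arguments (engine, compression, ...) are passed through.
    series_as_frame(series).to_parquet(path, **kwargs)
```

The trade-off raised in the thread still applies: a file written this way reads back as a one-column DataFrame, not as a Series.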
I expect a lot more out of parquet than I do of CSV/JSON/Excel, in particular round tripping with dtypes. I'm not so convinced that a comparison to CSV is warranted. Do all IO methods roundtrip back as a DataFrame? If that's the case, then I don't think it's worth the maintenance burden to have these methods on Series when they are just a |
I just found this issue as a result of hitting the inconsistent API problem, and my take is that consistent API is better because if I'm trying to save data then I want the data to save. It's easier to change types with data that exists... Not a huge problem in this instance, but I wasn't expecting it, especially as some options are consistent - either none or all would be good, but I'd prefer all. |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
Currently the .to_parquet() method only works for DataFrames. It would be nicer if the method could work on Series too. Currently we either have to save the Series to another format or wrap it in pd.DataFrame(series), which seems a bit clunky.
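A minimal sketch of the DataFrame workaround, including reading the data back as a Series, might look like this. The helper names `write_series` and `read_series` are hypothetical, the Series is assumed to have a string name, and a parquet engine such as pyarrow is assumed to be installed.

```python
import pandas as pd


def write_series(series: pd.Series, path) -> None:
    # Assumes series.name is a string; it becomes the parquet column name.
    series.to_frame().to_parquet(path)


def read_series(path) -> pd.Series:
    # The file holds a one-column DataFrame; select that column
    # to get a Series back, with the column name as the Series name.
    frame = pd.read_parquet(path)
    return frame.iloc[:, 0]
```

Calling `read_series` on a file produced by `write_series` returns a Series with the original name, which is the manual equivalent of the round trip discussed in the comments.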
Feature Description
For a given pandas Series, being able to write Series.to_parquet()
Alternative Solutions
Currently the two alternatives are:
Additional Context
No response