Support for pandas Extension Arrays #5287
Comments
I think I remember reading somewhere that we want to keep being compatible with … Edit: in any case, I think something like that would be really useful.
If there were sufficient demand and development effort for pandas extension arrays, I think there'd be interest in adding support without waiting for NumPy, similar to how we handle dask / sparse arrays. But I imagine it would be a decently sized project, and AFAIK no one from the existing core dev team has expressed interest in taking it on, so it would have to come from others. And it's probably a convex project that's only useful once it's completed, rather than one that's marginally helpful with marginal improvements.
If they added NEP-18 support, many things would work automatically, wouldn't they? Unfortunately, pandas-dev/pandas#35032 was closed.
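For context, NEP-18 lets a non-NumPy object intercept NumPy functions by defining `__array_function__(self, func, types, args, kwargs)`. A minimal sketch of what that could look like for an extension array, assuming a hypothetical wrapper class (`NEP18EAWrapper`, not part of pandas) that only overrides `np.concatenate` and delegates to the extension interface's `_concat_same_type`:

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# Registry of NumPy functions the hypothetical wrapper knows how to handle.
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register a NEP-18 override for `numpy_function`."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class NEP18EAWrapper:
    """Hypothetical wrapper; pandas ExtensionArrays don't implement NEP-18 themselves."""

    def __init__(self, array: ExtensionArray):
        self.array = array

    def __array_function__(self, func, types, args, kwargs):
        # Defer if we don't know the function or a foreign array type is involved.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        if not all(issubclass(t, NEP18EAWrapper) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    if axis != 0:
        raise NotImplementedError("only 1-D concatenation is sketched here")
    eas = [a.array for a in arrays]
    # _concat_same_type is part of the ExtensionArray interface.
    return NEP18EAWrapper(type(eas[0])._concat_same_type(eas))


a = NEP18EAWrapper(pd.array([1, 2], dtype="Int64"))
b = NEP18EAWrapper(pd.array([3], dtype="Int64"))
result = np.concatenate([a, b])
print(type(result).__name__, result.array.dtype, list(result.array))
```

Any other NumPy function that goes through NEP-18 dispatch returns `NotImplemented` here, so NumPy raises `TypeError` instead of silently densifying; covering the rest of the API is the bulk of the work.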
In my opinion, NEP-18 support is probably out of scope for pandas. But this would totally make sense as a separate mini-project: a NumPy-compatible wrapper of pandas extension arrays. I see two possible levels of support here:
I'm hoping to reopen it at some point. The trouble I ran into is that a) there isn't any way to implement … Keep in mind that PR implemented …
At the moment, `NDArrayBackedExtensionArray` explicitly supports 2D, and because it is a thin wrapper around `np.ndarray`, higher dimensions should either work already or be within spitting distance of working. I'm trying to get support for 2D more generally (xref pandas-dev/pandas#38992), but at best it will be a while before that becomes a reality.
Sorry for the necrobump (let me know if I should comment elsewhere), but should the target here now be "some level of support for the array-api"?
Yes!
ExtensionArrays are orthogonal to the array-api.
Is your feature request related to a problem? Please describe.
I started writing an ExtensionArray which is basically a `Tuple[Array[str], Array[int], Array[int], Array[str], Array[str]]`. Its scalar type is a `Tuple[str, int, int, str, str]`. This is working great in pandas: I can read and write Parquet as well as CSV with it.
However, as soon as I use any `.to_xarray()` method, it gets converted to a NumPy array of objects. Also, converting back to pandas gives a Series of objects instead of my extension type.
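The custom extension type isn't included in the report, so here is a sketch of the same round trip using pandas' built-in `Categorical` as a stand-in. It assumes (not confirmed from xarray's source here) that the conversion densifies the data with `np.asarray`, which is where the extension dtype is lost:

```python
import numpy as np
import pandas as pd

# Built-in extension type standing in for the reporter's custom one.
original = pd.Series(pd.Categorical(["x", "y", "x"]), name="label")

# Densify to plain NumPy, as a container without extension-array support would.
dense = np.asarray(original.array)
print(dense.dtype)  # object

# Rebuilding a Series from the NumPy data does not bring the extension dtype back.
roundtrip = pd.Series(dense, name="label")
print(isinstance(roundtrip.dtype, pd.CategoricalDtype))  # False

# The dtype has to be carried separately and re-applied explicitly.
restored = roundtrip.astype(original.dtype)
print(restored.equals(original))  # True
```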
Describe the solution you'd like
Would it be possible to support pandas extension types on coordinates?
It's not necessary to compute anything on them, I'd just like to use them for dimensions.
Describe alternatives you've considered
I was thinking about implementing a NumPy duck array, but I have never tried this and it looks quite complicated compared to pandas extension types.
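For comparison, the NumPy-like surface of a 1-D duck array (`shape`, `ndim`, `dtype`, indexing, and an `__array__` fallback) is small. A sketch, assuming a hypothetical `EADuckArray` wrapper that is not part of pandas or xarray; as far as I know, xarray's duck-array detection also looks for `__array_function__` / `__array_ufunc__` (or the array API), which is where most of the complexity lives:

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


class EADuckArray:
    """Hypothetical 1-D wrapper exposing a pandas ExtensionArray with NumPy-like attributes."""

    def __init__(self, array: ExtensionArray):
        if not isinstance(array, ExtensionArray):
            raise TypeError("expected a pandas ExtensionArray")
        self.array = array

    @property
    def shape(self):
        return (len(self.array),)

    @property
    def ndim(self):
        return 1

    @property
    def dtype(self):
        # The pandas ExtensionDtype, not a NumPy dtype.
        return self.array.dtype

    def __len__(self):
        return len(self.array)

    def __getitem__(self, key):
        result = self.array[key]
        # Slices and integer arrays come back as ExtensionArrays; re-wrap to keep the type.
        if isinstance(result, ExtensionArray):
            return type(self)(result)
        return result  # a scalar

    def __array__(self, dtype=None, copy=None):
        # Fallback for code that insists on a real ndarray; this is where the type is lost.
        return np.asarray(self.array, dtype=dtype)

    def __repr__(self):
        return f"{type(self).__name__}({self.array!r})"


wrapped = EADuckArray(pd.array(["a", "b", "c"], dtype="string"))
print(wrapped.shape, wrapped.ndim, wrapped.dtype)
print(wrapped[1:])
print(np.asarray(wrapped).dtype)
```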