`open_dataset` with `chunks="auto"` fails when a netCDF4 variable/coordinate is encoded as `NC_STRING` #7868
Comments
Thanks @kmuehlbauer! #7869 solves the issue! Summarizing:
Thanks again @kmuehlbauer for resolving the problem in less than 2 hours 🥇
@ghiggi Glad it works, but we still have to check whether that is the correct location for the fix, as it's not CF specific.
My main question here is: why is dask not trying to retrieve the object types from `dtype.metadata`? Or does it, and fail for some reason?
With your PR, the dask array's dtype is no longer `object`.
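The `dtype.metadata` question above can be illustrated with a short sketch. NumPy dtypes can carry arbitrary metadata, and xarray's variable-length string handling attaches the element type this way (the `"element_type"` key mirrors what xarray's `create_vlen_dtype` uses, but treat the details here as illustrative, not as dask or xarray internals):

```python
import numpy as np

# An object dtype tagged with the element type it actually holds.
# Metadata rides along on the dtype but does not affect equality,
# so ordinary dtype checks still see a plain object dtype.
dt = np.dtype(object, metadata={"element_type": str})

print(dt == np.dtype(object))           # True: metadata is ignored in comparison
print(dt.metadata["element_type"])      # <class 'str'>: recoverable by inspection
```

In principle a consumer like dask could inspect `dtype.metadata` to learn that the objects are strings before falling back to "unknown object size".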
UPDATE NOTE:
What is your issue?
I noticed that `open_dataset` with `chunks="auto"` fails when netCDF4 variables/coordinates are encoded as `NC_STRING`. The reason is that xarray reads netCDF4 `NC_STRING` as `object` dtype, and dask cannot estimate the size of an `object` dtype. As a workaround, the user must currently rewrite the netCDF4 file and specify the string DataArray(s)' `encoding` as a fixed-length string type (e.g. `"S2"` if the max string length is 2), so that the data are written as `NC_CHAR` and xarray reads them back as a byte-encoded fixed-length string type. Here below I provide a reproducible example.
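The core of the failure can be sketched without netCDF at all, since the issue is that an `object` dtype has no known per-element size while a fixed-length bytes dtype does (a minimal illustration, not the full `open_dataset` round trip):

```python
import numpy as np

# How xarray exposes NC_STRING: each element is a Python str, so the
# per-element size is opaque to dask's "auto" chunking heuristic.
obj_arr = np.array(["a", "bb", "ccc"], dtype=object)
print(obj_arr.dtype.itemsize)   # size of a pointer, not of the strings

# The workaround's dtype, as NC_CHAR comes back: fixed-length bytes,
# so every element has a known size and "auto" chunking can proceed.
fixed = obj_arr.astype("S3")    # "S3" because the longest string is 3 chars
print(fixed.dtype.itemsize)     # 3
```

With the fixed-length dtype, dask can compute bytes-per-element and pick chunk sizes; with `object` it has nothing to go on.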
Questions:

- Shouldn't `open_dataset` take care of automatically deserializing the `NC_CHAR` fixed-length byte-string representation into a Unicode string?
- Shouldn't `open_dataset` take care of automatically reading `NC_STRING` as a Unicode string (converting `object` to `str`)?

Related issues are: