Combine_by_coords not working on named DataArrays where the data is a Dask Array. #5833
Comments
Thanks for raising this @anlavandier! And thanks especially for the clear reproducible example. You've brought up 3 specific issues, so in turn:
This one is actually not a bug in combine, it's merely a slightly unclear (but valid) error message in If you change your example script to create the result before attempting to add it to the Dataset, i.e.
result = xr.combine_by_coords(dataarray_list)
print(result)
combined = xr.Dataset()
combined["test"] = result
you can see that
However what is happening is that combining unnamed dataarrays returns a DataArray (rather than a Dataset). The fact that When you try to assign this object to your empty Dataset, it will happily assign a DataArray but will fail when trying to assign a Dataset as a variable of a Dataset (as it should). That's why your script behaves differently for named vs unnamed dataarrays. As an aside the error message for
This is a real bug, introduced in #4696 . It happens because the
This does happen with Thanks again for raising this! Let me know if you think #5834 hasn't fully fixed your problem :)
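To make the first point above concrete, here is a minimal sketch of the named versus unnamed behaviour. It is not the original reproducer: the helper make_pieces, the dimension name "x" and the variable names are assumptions made up for illustration, and it uses NumPy-backed arrays because this part of the behaviour does not depend on Dask. The exact exception type raised by the failing assignment varies between xarray versions, so the sketch catches both TypeError and ValueError.

```python
import numpy as np
import xarray as xr


def make_pieces(name):
    """Hypothetical helper: two adjacent 1-D DataArrays along "x".

    `name` may be None (unnamed arrays) or a string (named arrays).
    """
    return [
        xr.DataArray(
            np.arange(3.0) + 3 * i,
            dims="x",
            coords={"x": np.arange(3) + 3 * i},
            name=name,
        )
        for i in range(2)
    ]


# Unnamed inputs are combined into a single DataArray.
unnamed = xr.combine_by_coords(make_pieces(None))
# Named inputs are promoted to Datasets first, so the result is a Dataset.
named = xr.combine_by_coords(make_pieces("test"))

print(type(unnamed).__name__)
print(type(named).__name__)

combined = xr.Dataset()
# A DataArray can be stored under a single key.
combined["test"] = unnamed

try:
    # A Dataset cannot be stored as one variable of another Dataset.
    combined["other"] = named
    assignment_failed = False
except (TypeError, ValueError) as err:
    assignment_failed = True
    print(f"{type(err).__name__}: {err}")
```

If the goal is a single Dataset, the named result can be used directly instead of being assigned into an empty Dataset.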
Thank you for your response. I don't know whether we should wait for #5834 to be merged before closing the issue or close it now, but as far as I'm concerned it's fixed.
What happened:
xr.combine_by_coords failed (only when the arrays are named).

What you expected to happen:
xr.combine_by_coords to work as intended.

Minimal Complete Verifiable Example:

When n == 1:

When n >= 2:

Anything else we need to know?:
data_i.name = None fixes everything.
xr.combine_by_coords when n == 2 before it fails on its own, we can see that it actually computes the dask arrays which is also a problem. Here's an example to show that.

Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.6.2
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 7.27.0
sphinx: 4.2.0