
preserve chunked data when creating DataArray from itself #5983

Closed
FabianHofmann opened this issue Nov 13, 2021 · 4 comments · Fixed by #5984

Comments

@FabianHofmann
Contributor

What happened:

When creating a new DataArray from a DataArray with chunked data, the underlying dask array is converted to a numpy array.

What you expected to happen:

I expected the underlying dask array to be preserved when creating a new DataArray instance.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np
from dask import array

d = np.ones((10, 10))
x = array.from_array(d, chunks=5)

da = xr.DataArray(x) # this is chunked
xr.DataArray(da) # this is not chunked anymore
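One way to observe the difference described above is to inspect `.chunks` and the type of `.data` before and after re-wrapping. This is only a sketch: the variable name `rewrapped` and the `dsa` import alias are illustrative, and what the second check reports depends on the installed xarray version (the thread notes the fix was released in 0.21.0).

```python
import numpy as np
import xarray as xr
import dask.array as dsa

# Same setup as the example above: a 10x10 array split into 5x5 blocks
x = dsa.from_array(np.ones((10, 10)), chunks=5)
da = xr.DataArray(x)

# .chunks holds the block sizes per dimension for dask-backed data,
# and is None when the data is a plain numpy array
print(da.chunks)

# Re-wrap the DataArray and inspect what it now holds.
# The outcome depends on the xarray version in use.
rewrapped = xr.DataArray(da)
print(type(rewrapped.data))
print(rewrapped.chunks)
```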

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.3
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.10.1
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.09.1
distributed: 2021.09.1
matplotlib: 3.4.3
cartopy: 0.19.0.post1
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: 4.10.3
pytest: 6.2.5
IPython: 7.27.0
sphinx: 4.2.0

@dcherian
Contributor

Can you give us a little more context about why this might be useful? IIRC we disallowed creating DataArrays from DataArrays in some other place because it leads to ambiguous situations like the following:

xr.DataArray(da, attrs={"a": 1})  # does the result have da.attrs or the provided attrs?

@FabianHofmann
Contributor Author

Ah yes, this is indeed ambiguous. On the other hand, as long as creating DataArrays from DataArrays is still supported, the result should at least preserve the data format. I need this because I am creating a subclass of xarray.DataArray (see https://github.com/PyPSA/linopy/blob/8ac34d9fdbddc1fec0c7b4781f3d49e9c5ae064e/linopy/constraints.py#L18). When I convert a lazy DataArray to my custom class, the chunked data is computed immediately, which seems a bit weird...

@dcherian
Contributor

IMO we should raise an error asking the user to pass da.data instead
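For reference, passing `da.data` keeps the dask array intact because the wrapped array is handed over directly. The following is a minimal sketch of that workaround, not a proposed xarray change; the dimension names, the `units` attribute, and the name `rebuilt` are made up for illustration.

```python
import numpy as np
import xarray as xr
import dask.array as dsa

x = dsa.from_array(np.ones((10, 10)), chunks=5)
# Illustrative dims and attrs, chosen only for this sketch
da = xr.DataArray(x, dims=("a", "b"), attrs={"units": "1"})

# Hand over the wrapped array itself and carry the metadata across explicitly,
# so there is no ambiguity about which coords or attrs end up on the result
rebuilt = xr.DataArray(
    da.data,
    coords=da.coords,
    dims=da.dims,
    attrs=da.attrs,
    name=da.name,
)

print(type(rebuilt.data))
print(rebuilt.chunks)
```

Spelling out `coords`, `dims`, `attrs` and `name` is more verbose than `xr.DataArray(da)`, but it makes explicit which metadata the new object receives.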

@FabianHofmann
Contributor Author

Not sure, but I'd argue for keeping DataArray-from-self construction, as I imagine many convenience cases where arrays may be DataArrays, numpy arrays or dask arrays, and one wants to ensure a DataArray type. Many other packages like pandas/numpy have that. Also, xarray.Dataset supports from-self construction.

Perhaps it is better to raise an error when ambiguities occur? That is, don't allow passing attrs or coords when data is a DataArray...
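The behaviour suggested in this comment could be sketched in user code roughly as follows. `ensure_dataarray` is a hypothetical helper, not part of xarray, and the error message is invented for illustration: an existing DataArray is returned unchanged, other array-likes are wrapped, and an error is raised for the ambiguous combination of a DataArray with extra `coords` or `attrs`.

```python
import numpy as np
import xarray as xr


def ensure_dataarray(obj, coords=None, attrs=None):
    """Hypothetical helper (not an xarray API): return a DataArray for obj."""
    if isinstance(obj, xr.DataArray):
        # Ambiguous case: obj already carries its own coords and attrs
        if coords is not None or attrs is not None:
            raise ValueError(
                "coords/attrs cannot be combined with a DataArray input; "
                "pass obj.data instead"
            )
        # Return the object untouched, so its underlying data is not copied
        return obj
    # numpy arrays, dask arrays and other array-likes are wrapped as usual
    return xr.DataArray(obj, coords=coords, attrs=attrs)


arr = xr.DataArray(np.ones(3), dims="x")
print(ensure_dataarray(arr) is arr)
print(type(ensure_dataarray(np.ones(3))))
```

Returning the input unchanged means the caller shares the same object; a caller who needs an independent object would have to copy it explicitly.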

jjpr-mit added a commit to brain-score/brainio that referenced this issue Oct 24, 2022
…0. 0.21.0 drops support for python 3.7. 0.20.2 has the bug. This commit tests if 0.20.1 does, too.

issue:  [preserve chunked data when creating DataArray from itself #5983](pydata/xarray#5983)
corresponding pull request:  [preserve chunked data when creating DataArray from DataArray #5984](pydata/xarray#5984)
released in 0.21.0:  [https://docs.xarray.dev/en/stable/whats-new.html#id81]

2 participants