Regression on DataArray.unstack on v2022.06.0 : "ValueError: IndexVariable objects must be 1-dimensional" #6969

Closed
4 tasks done
bboutanquoi opened this issue Aug 30, 2022 · 1 comment · Fixed by #6992

bboutanquoi commented Aug 30, 2022

What happened?

With xarray 2022.06.0, DataArray.unstack raises a ValueError (see the example below):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 24>()
     21 y = y.assign_coords(day=y.j + y.last_j)
     22 y = y.set_index(multi=['sub_id', 'last_j'])
---> 24 y = y.unstack()

File /opt/conda/lib/python3.9/site-packages/xarray/core/dataarray.py:2402, in DataArray.unstack(self, dim, fill_value, sparse)
   2342 def unstack(
   2343     self,
   2344     dim: Hashable | Sequence[Hashable] | None = None,
   2345     fill_value: Any = dtypes.NA,
   2346     sparse: bool = False,
   2347 ) -> DataArray:
   2348     """
   2349     Unstack existing dimensions corresponding to MultiIndexes into
   2350     multiple new dimensions.
   (...)
   2400     DataArray.stack
   2401     """
-> 2402     ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
   2403     return self._from_temp_dataset(ds)

File /opt/conda/lib/python3.9/site-packages/xarray/core/dataset.py:4656, in Dataset.unstack(self, dim, fill_value, sparse)
   4652         result = result._unstack_full_reindex(
   4653             dim, stacked_indexes[dim], fill_value, sparse
   4654         )
   4655     else:
-> 4656         result = result._unstack_once(
   4657             dim, stacked_indexes[dim], fill_value, sparse
   4658         )
   4659 return result

File /opt/conda/lib/python3.9/site-packages/xarray/core/dataset.py:4492, in Dataset._unstack_once(self, dim, index_and_vars, fill_value, sparse)
   4489     else:
   4490         fill_value_ = fill_value
-> 4492     variables[name] = var._unstack_once(
   4493         index=clean_index,
   4494         dim=dim,
   4495         fill_value=fill_value_,
   4496         sparse=sparse,
   4497     )
   4498 else:
   4499     variables[name] = var

File /opt/conda/lib/python3.9/site-packages/xarray/core/variable.py:1732, in Variable._unstack_once(self, index, dim, fill_value, sparse)
   1727     # Indexer is a list of lists of locations. Each list is the locations
   1728     # on the new dimension. This is robust to the data being sparse; in that
   1729     # case the destinations will be NaN / zero.
   1730     data[(..., *indexer)] = reordered
-> 1732 return self._replace(dims=new_dims, data=data)

File /opt/conda/lib/python3.9/site-packages/xarray/core/variable.py:985, in Variable._replace(self, dims, data, attrs, encoding)
    983 if encoding is _default:
    984     encoding = copy.copy(self._encoding)
--> 985 return type(self)(dims, data, attrs, encoding, fastpath=True)

File /opt/conda/lib/python3.9/site-packages/xarray/core/variable.py:2720, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2718 super().__init__(dims, data, attrs, encoding, fastpath)
   2719 if self.ndim != 1:
-> 2720     raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")
   2722 # Unlike in Variable, always eagerly load values into memory
   2723 if not isinstance(self._data, PandasIndexingAdapter):

ValueError: IndexVariable objects must be 1-dimensional

What did you expect to happen?

With xarray 2022.03.0, the same code runs without error (see the example below).

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

x = np.concatenate((np.repeat(np.nan,4), np.repeat(1,2))).reshape(3, 2).transpose()
x = xr.DataArray(
    x,
    coords = {
        'composite_id': ['s00', 's10'],
        'sub_id': ('composite_id', ['0', '1']),
        'last_j': ('composite_id', [100, 111]),
        'j': [-2,-1,0]
    },
    dims= ['composite_id', 'j']
)

y = x
y = y.stack({'multi': ['composite_id', 'j']})
y = y.dropna('multi')
y = y.assign_coords(day=y.j + y.last_j)
y = y.set_index(multi=['sub_id', 'last_j'])

y = y.unstack()
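
When comparing the two xarray versions, it can help to capture the failure instead of letting it abort the session. The following is only a sketch: `try_unstack` is a hypothetical helper name, not part of xarray, and the small `demo` array is an assumption used to show the call on a plain stack/unstack round trip rather than on the failing case.

```python
import numpy as np
import xarray as xr


def try_unstack(obj):
    """Call obj.unstack() and return (result, error); one of the two is None."""
    try:
        return obj.unstack(), None
    except ValueError as err:
        return None, err


# Plain round trip on a small array (hypothetical data, not the failing case).
demo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=["a", "b"],
    coords={"a": [0, 1], "b": [10, 20, 30]},
)
stacked = demo.stack(ab=["a", "b"])
result, error = try_unstack(stacked)

# Report the installed version next to the outcome.
outcome = "ok" if error is None else f"ValueError: {error}"
print("xarray", xr.__version__, "->", outcome)
```

Passing the `y` built above (just before the final `y.unstack()` line) to `try_unstack` would return the ValueError on an affected version and the unstacked array otherwise.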

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

Failing environment with xarray 2022.06.0

INSTALLED VERSIONS

commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:51:20)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.104-linuxkit
machine: aarch64
processor: aarch64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 62.1.0
pip: 22.0.4
conda: 4.12.0
pytest: None
IPython: 8.3.0
sphinx: None
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

Working environment with xarray 2022.03.0

INSTALLED VERSIONS

commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:51:20)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.104-linuxkit
machine: aarch64
processor: aarch64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.23.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: 4.12.0
pytest: None
IPython: 8.3.0
sphinx: None
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

bboutanquoi added the bug and needs triage labels Aug 30, 2022

benbovy (Member) commented Aug 31, 2022

Thanks for the report @bboutanquoi.

The issue seems to be related to set_index(), which in your example does not convert the old multi-index coordinates back to regular variables:

y = y.set_index(multi=['sub_id', 'last_j'])
type(y.composite_id.variable)
# IndexVariable (should be Variable)

I guess it's similar to #6946 which is related to reset_index.
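
The check above can be generalised to every coordinate at once. This is a minimal sketch, not an xarray API: `stale_index_variables` is a hypothetical helper, and it assumes that a coordinate stored as an `IndexVariable` but missing from `obj.indexes` is the kind of leftover described in this comment.

```python
import xarray as xr


def stale_index_variables(obj):
    """List coordinates stored as IndexVariable that have no entry in obj.indexes."""
    indexes = obj.indexes
    return [
        name
        for name, coord in obj.coords.items()
        if isinstance(coord.variable, xr.IndexVariable) and name not in indexes
    ]


# Consistent object: "x" is an indexed dimension coordinate and
# "label" is a regular (non-index) coordinate, so nothing is reported.
clean = xr.DataArray(
    [1.0, 2.0],
    dims="x",
    coords={"x": [0, 1], "label": ("x", ["a", "b"])},
)
print(stale_index_variables(clean))  # []
```

Going by the snippet above, calling this on `y` right after `set_index` would be expected to list `composite_id` on 2022.06.0. That outcome has not been verified here.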

benbovy self-assigned this Aug 31, 2022
benbovy added the topic-indexing label and removed the needs triage label Aug 31, 2022
benbovy mentioned this issue Sep 5, 2022