groupby(multi-index level) not working correctly on a multi-indexed DataArray or DataSet #6836
Comments
@benbovy I tracked this down to
>>> mda.one.to_index()
# v2022.06.0
MultiIndex([('a', 0),
('a', 1),
('b', 0),
('b', 1),
('c', 0),
('c', 1)],
names=['one', 'two'])
# v2022.03.0
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object', name='x')
We call Lines 115 to 140 in f8fee90
Not sure if the fix should be only in the GroupBy specifically or more generally in
The GroupBy context is Line 434 in f8fee90
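To make the difference between the two outputs above concrete, here is a minimal pandas-only sketch. It does not reproduce xarray's internals; it only assumes that `MultiIndex.get_level_values` is a fair stand-in for "the flat index of one level", and it shows why grouping on the full MultiIndex and grouping on a single level produce different sets of keys.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))

# Flat index holding only the labels of level "one", one entry per row.
level_one = midx.get_level_values("one")

# Unique keys when the whole MultiIndex is used: six (one, two) tuples.
full_keys = midx.unique().tolist()

# Unique keys when only level "one" is used: three labels.
level_keys = level_one.unique().tolist()

print(full_keys)
print(level_keys)
```

Grouping by "one" is only meaningful with the second set of keys, so whichever index object is handed to the grouping code determines the result.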
After trying to dig further into the code, I saw that grouping over levels seems to be broken generally (up-to-date main branch at time of writing), i.e.
import pandas as pd
import numpy as np
import xarray as xr
midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
mda = xr.DataArray(np.random.rand(6, 3), [("x", midx), ("y", range(3))])
mda.groupby("one").sum()
raises:
File ".../xarray/xarray/core/_reductions.py", line 5055, in sum
return self.reduce(
File ".../xarray/xarray/core/groupby.py", line 1191, in reduce
return self.map(reduce_array, shortcut=shortcut)
File ".../xarray/xarray/core/groupby.py", line 1095, in map
return self._combine(applied, shortcut=shortcut)
File ".../xarray/xarray/core/groupby.py", line 1127, in _combine
index, index_vars = create_default_index_implicit(coord)
File ".../xarray/xarray/core/indexes.py", line 974, in create_default_index_implicit
index = PandasMultiIndex(array, name)
File ".../xarray/xarray/core/indexes.py", line 552, in __init__
raise ValueError(
ValueError: conflicting multi-index level name 'one' with dimension 'one'
in the function
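For reference, the result that a sum grouped over level "one" should produce can be checked with pandas alone. This is a sketch under assumptions: it uses deterministic values instead of `np.random.rand` so the numbers can be verified by hand, and it goes through `DataFrame.groupby(level=...)` rather than xarray.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))

# Deterministic stand-in for the random (6, 3) array in the example above.
values = np.arange(18, dtype=float).reshape(6, 3)
df = pd.DataFrame(values, index=midx, columns=range(3))

# Sum the two rows that share each label of level "one".
expected = df.groupby(level="one").sum()
print(expected)
```

A working `mda.groupby("one").sum()` on the same data would be expected to return one row per label of "one", matching this table.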
Thanks @emmaai for the issue report and thanks @dcherian and @FabianHofmann for tracking it down. There is a lot of complexity related to
We should probably avoid using
* unpin xarray, numpy, pandas, netcdf4
* Fix deprecation "TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property." See pydata/xarray#6508. There are other errors, however.
* See Unidata/netcdf4-python#1175 "Regression in createVariable between 1.5.8 and 1.6.0".
* Testing on Python 3.7 only covers through xarray 0.20.2. This is an experiment.
* Fix get_metadata to account for the internal restructuring of indexes in xarray 2022.06. Update tests for same.
* Fixed bug where we were stripping off attrs inadvertently when writing netCDF.
* TestPlainGroupby.test_on_data_array fails with Python = 3.8.13, numpy = 1.23.2, xarray = 2022.06.0. That means it has nothing to do with BrainIO.
* test_on_data_array should not involve any BrainIO classes. This is to test for bugs in xarray. With xarray==2022.06.0, this test fails.
* xarray 2022.06.0 has a bug which breaks BrainIO: pydata/xarray#6836.
* Adapt get_metadata to the change in the index API between 2022.03.0 and 2022.06.0. Now test_get_metadata passes under 2022.03.0 and 2022.06.0.
* Getting an error from tests on Travis (but not locally): RuntimeError: NetCDF: Filter error: bad id or parameters or duplicate filter. This might fix it?
* Compression test failed: assert 614732 > 615186. This might fix it.
* Travis doesn't offer python 3.10 yet. Make sample assembly bigger so compression has an effect.
* Bump minor version.
Authored-by: Jonathan Prescott-Roy and Martin Schrimpf
it still has the bug introduced in 2022.6.0 pydata/xarray#6836
...we cannot fix that in
Is there hope for groupby working on multi-indexed DataArrays again in the future? We -- and from the issue history it looks like others too -- are currently pinning
I think we could special-case extracting a multiindex level here: Line 469 in d4db166
@mschrimpf Can you try that and send in a PR?
A special case sounds reasonable to me as well, as a temporary fix before looking into if/how we can refactor groupby so that it works with multiple kinds of built-in and/or custom indexes.
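The special-casing idea discussed above can be sketched roughly as follows. This is not xarray's code: `group_index` is a hypothetical helper written against plain pandas, and it only illustrates the branch "if the group name is a level of a MultiIndex, use that level's values; otherwise use the index as-is".

```python
import pandas as pd


def group_index(index: pd.Index, name: str) -> pd.Index:
    """Hypothetical helper: return the flat index to group by."""
    # Special case: `name` refers to one level of a MultiIndex.
    if isinstance(index, pd.MultiIndex) and name in index.names:
        return index.get_level_values(name)
    # Default: a plain index is already flat and can be used directly.
    return index


midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
plain = pd.Index([10, 20, 30], name="y")

print(group_index(midx, "one"))
print(group_index(plain, "y"))
```

Keeping the special case in one small function leaves the rest of the grouping logic unaware of MultiIndex details, which fits the "temporary fix" framing in the comment above.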
I solved it temporarily by
* Fix .groupby(multi index level) Closes #6836
* Update xarray/tests/test_groupby.py
* mypy: Add _DummyGroup.to_index
Co-authored-by: Illviljan
This is happening again :(
@carynbear please could you post a reproducible example? A new bug report would be ideal...
What happened?
Run the code block below with 2022.6.0
output:
What did you expect to happen?
The same output as with 2022.3.0
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
N/A
Environment
INSTALLED VERSIONS
commit: None
python: 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-1025-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.5.1
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 45.2.0
pip: 22.2
conda: None
pytest: 7.1.2
IPython: 7.31.0
sphinx: None