Fixes dimension order in `xarray.Dataset.to_stacked_array` #10205

aFarchi · 2025-04-07T07:32:28Z

Tests added
User visible changes are documented in whats-new.rst

`xarray.Dataset.to_stacked_array` now stacks dimensions in their order of appearance in the data variables.
This fixes the issue where calling `xarray.Dataset.transpose` before `xarray.Dataset.to_stacked_array` had no effect.
(Mentioned in #9921)
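
To illustrate what "order of appearance" means here, the following is a minimal pure-Python sketch, not the code from this PR. The helper name `stacking_dims_by_appearance` and the dictionary layout (variable name mapped to its dimension names) are assumptions made for illustration; the sketch only mirrors the idea of walking the data variables and recording each non-sample dimension the first time it is seen.

```python
from collections.abc import Hashable, Mapping, Sequence


def stacking_dims_by_appearance(
    var_dims: Mapping[Hashable, Sequence[Hashable]],
    sample_dims: Sequence[Hashable],
) -> tuple[Hashable, ...]:
    """Collect non-sample dims in the order the variables first use them.

    Hypothetical helper: `var_dims` stands in for the dimension names of
    each data variable in a dataset.
    """
    ordered: list[Hashable] = []
    for dims in var_dims.values():
        for dim in dims:
            # Skip sample dims and dims that were already recorded.
            if dim not in sample_dims and dim not in ordered:
                ordered.append(dim)
    return tuple(ordered)


# Hypothetical layouts: the second one mimics transposing variable "a".
original = {"a": ("x", "y", "z"), "b": ("x", "y")}
transposed = {"a": ("x", "z", "y"), "b": ("x", "y")}

print(stacking_dims_by_appearance(original, sample_dims=["x"]))    # ('y', 'z')
print(stacking_dims_by_appearance(transposed, sample_dims=["x"]))  # ('z', 'y')
```

With this ordering rule, transposing a variable changes the resulting stacking order, which is the behaviour the description above asks for.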

welcome · 2025-04-07T07:32:30Z

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

max-sixty · 2025-04-08T15:05:05Z

This looks great, @aFarchi, thank you!

I don't have that much context, but I'm merging given that it looks like a nice improvement.

welcome · 2025-04-08T15:05:13Z

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

dcherian · 2025-04-08T15:31:18Z

xarray/core/dataset.py

-        stacking_dims = tuple(dim for dim in self.dims if dim not in sample_dims)
+        # add stacking dims by order of appearance
+        stacking_dims_list: list[Hashable] = []
+        for da in self.data_vars.values():


Sorry for the late review, @aFarchi. We will need to iterate over `self.coords` too: coordinate variables can have dimensions that are not present on any data variable.
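
The case raised in this comment can be sketched in plain Python, under stated assumptions. The helper `coord_only_dims` and the two dictionaries are hypothetical stand-ins for a dataset's data variables and coordinates; this is not xarray code. It lists the dimensions that a loop over the data variables alone would never encounter.

```python
from collections.abc import Hashable, Mapping, Sequence

DimsByName = Mapping[Hashable, Sequence[Hashable]]


def coord_only_dims(data_var_dims: DimsByName, coord_dims: DimsByName) -> list[Hashable]:
    """Return dims used by coordinates but by no data variable, in order.

    Hypothetical helper for illustration only.
    """
    seen = {dim for dims in data_var_dims.values() for dim in dims}
    missing: list[Hashable] = []
    for dims in coord_dims.values():
        for dim in dims:
            if dim not in seen and dim not in missing:
                missing.append(dim)
    return missing


# One data variable on "x"; one coordinate on ("x", "nv").
data_var_dims = {"v": ("x",)}
coord_dims = {"bounds": ("x", "nv")}

print(coord_only_dims(data_var_dims, coord_dims))  # ['nv']
```

Any dimension reported by such a check would be dropped from the stacking dimensions if only the data variables were visited.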

dcherian · 2025-04-08T15:31:44Z

xarray/core/dataset.py

@@ -5246,7 +5246,13 @@ def to_stacked_array(
        """
        from xarray.structure.concat import concat

-        stacking_dims = tuple(dim for dim in self.dims if dim not in sample_dims)
+        # add stacking dims by order of appearance
+        stacking_dims_list: list[Hashable] = []


Minor comment: this could also be an `xarray.core.utils.OrderedSet`.
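
As a rough illustration of the ordered-set idea, the sketch below uses a plain `dict` (which preserves insertion order) as a stand-in; it deliberately does not use xarray's own `OrderedSet` class, and the variable layout is a made-up example.

```python
from itertools import chain

# Hypothetical layout: variable name -> dimension names.
var_dims = {"a": ("x", "z", "y"), "b": ("x", "y")}
sample_dims = ("x",)

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so it deduplicates without the explicit "already seen" check a list needs.
ordered = dict.fromkeys(
    dim
    for dim in chain.from_iterable(var_dims.values())
    if dim not in sample_dims
)
stacking_dims = tuple(ordered)

print(stacking_dims)  # ('z', 'y')
```

Either container gives the same ordering; the set-like version simply moves the duplicate handling into the data structure.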

max-sixty · 2025-04-08T15:34:04Z

sorry for merging too early!

@aFarchi let us know if you can take a look at those; otherwise I can revert or look at them myself

aFarchi · 2025-04-09T06:38:36Z

Thanks for the feedback. Good catch! I hadn't thought about the case where coordinates have dimensions that are not on any data variable.
I will update my branch later this week to take that into account.

aFarchi · 2025-04-11T12:58:16Z

Coming back to this issue, I had a quick look at the "old" behaviour of `to_stacked_array` when a coordinate has dimensions that are not in any data variable. For example:

import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars=dict(
        v1=(['d1'], np.arange(4)),
    ),
    coords=dict(
        c1=(['d1', 'd2'], np.arange(8).reshape((4, 2))),
    ),
)
print(ds.to_stacked_array(new_dim='new_dim', sample_dims=[], variable_dim='var_dim'))

which currently returns:

<xarray.DataArray 'v1' (new_dim: 4)> Size: 32B
array([0, 1, 2, 3])
Coordinates:
  * new_dim  (new_dim) object 32B MultiIndex
  * var_dim  (new_dim) <U2 32B 'v1' 'v1' 'v1' 'v1'
  * d1       (new_dim) int64 32B 0 1 2 3
  * d2       (new_dim) object 32B nan nan nan nan

As far as I understand, if a dimension appears in a coordinate but in no data variable, it will necessarily be filled with NaNs in the stacked array. I don't think it is necessary to include an array of NaNs, is it? Or am I missing something here? @dcherian, do you see any case where it would be useful to keep that extra dimension?

Fixes dimension order in xarray.Dataset.to_stacked_array

0233892

aFarchi added 4 commits April 7, 2025 09:34

Merge branch 'main' into fix-transpose-stack

5cc4c22

corrected dummy variable name to satisfy mypy

556fb8b

added type annotation to satisfy mypy

9d1d388

corrected type annotation to satisfy mypy

8f4ada4

max-sixty merged commit eb2ff69 into pydata:main Apr 8, 2025
31 checks passed

dcherian reviewed Apr 8, 2025
