Skip to content

Fixes dimension order in xarray.Dataset.to_stacked_array #10205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 8, 2025

Conversation

aFarchi
Copy link
Contributor

@aFarchi aFarchi commented Apr 7, 2025

  • Tests added
  • User visible changes are documented in whats-new.rst

xarray.Dataset.to_stacked_array now uses dimensions in order of appearance.
This fixes the issue where using xarray.Dataset.transpose before xarray.Dataset.to_stacked_array had no effect.
(Mentioned in #9921)

Copy link

welcome bot commented Apr 7, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@max-sixty
Copy link
Collaborator

this looks great @aFarchi , thank you!

I don't have that much context, but merging given it looks like a nice improvement

@max-sixty max-sixty merged commit eb2ff69 into pydata:main Apr 8, 2025
31 checks passed
Copy link

welcome bot commented Apr 8, 2025

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again! celebration gif

stacking_dims = tuple(dim for dim in self.dims if dim not in sample_dims)
# add stacking dims by order of appearance
stacking_dims_list: list[Hashable] = []
for da in self.data_vars.values():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review @aFarchi . We will need to iterate over self.coords too. We can have coords variables with dimensions not present on any data_var

@@ -5246,7 +5246,13 @@ def to_stacked_array(
"""
from xarray.structure.concat import concat

stacking_dims = tuple(dim for dim in self.dims if dim not in sample_dims)
# add stacking dims by order of appearance
stacking_dims_list: list[Hashable] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment: This could also be a xarray.core.utils.OrderedSet

@max-sixty
Copy link
Collaborator

sorry for merging too early!

@aFarchi let us know if you can take a look at those; otherwise I can revert or look at them myself

@aFarchi
Copy link
Contributor Author

aFarchi commented Apr 9, 2025

thanks for the feedback. Good catch! I hadn't thought about the case where coordinates have dimensions that are not on any variables.
I will update my branch later this week to take that into account.

@aFarchi
Copy link
Contributor Author

aFarchi commented Apr 11, 2025

Coming back to this issue, I had a quick look at the "old" behaviour of to_stacked_array in the case where we have coordinates with dimensions that are not in any variables. For example:

ds = xr.Dataset(
    data_vars=dict(
        v1=(['d1'], np.arange(4)),
    ),
    coords=dict(
        c1=(['d1', 'd2'], np.arange(8).reshape((4, 2))),
    ),
)
print(ds.to_stacked_array(new_dim='new_dim', sample_dims=[], variable_dim='var_dim'))

which currently returns:

<xarray.DataArray 'v1' (new_dim: 4)> Size: 32B
array([0, 1, 2, 3])
Coordinates:
  * new_dim  (new_dim) object 32B MultiIndex
  * var_dim  (new_dim) <U2 32B 'v1' 'v1' 'v1' 'v1'
  * d1       (new_dim) int64 32B 0 1 2 3
  * d2       (new_dim) object 32B nan nan nan nan

As far as I understand, if we have a dimension which is in a coordinate but in no variables, this dimension will necessarily be filled with nans in the stacked array. But I don't think that it is necessary to include an array of nans, is it? Or am I missing something here? @dcherian do you see any cases where it would be useful to keep that extra dimension?

Mikejmnez pushed a commit to Mikejmnez/xarray that referenced this pull request Apr 12, 2025
)

* Fixes dimension order in xarray.Dataset.to_stacked_array

* corrected dummy variable name to satisfy mypy

* added type annotation to satisfy mypy

* corrected type annotation to satisfy mypy
dcherian added a commit that referenced this pull request Apr 27, 2025
* main: (76 commits)
  Update how-to-add-new-backend.rst (#10240)
  Support extension array indexes (#9671)
  Switch documentation to pydata-sphinx-theme (#8708)
  Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (#10239)
  Fix mypy, min-versions CI, xfail Zarr tests (#10255)
  Remove `test_dask_layers_and_dependencies` (#10242)
  Fix: Docs generation create temporary files that are not cleaned up. (#10238)
  opendap / dap4 support for pydap backend (#10182)
  Add RangeIndex (#10076)
  Fix mypy (#10232)
  Fix doctests (#10230)
  Fix broken Sphinx Roles (#10225)
  `DatasetView.map` fix `keep_attrs` (#10219)
  Add datatree repr asv (#10214)
  CI: Automatic PR labelling is back (#10201)
  Fixes dimension order in `xarray.Dataset.to_stacked_array` (#10205)
  Fix references to core classes in docs (#10207)
  Update pre-commit hooks (#10208)
  add `scipy-stubs` as extra `[types]` dependency (#10202)
  Fix sparse dask repr test (#10200)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants