Combine by coords dataarray bugfix #5834
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -673,7 +673,7 @@ def combine_by_coords( | |
Attempt to auto-magically combine the given datasets (or data arrays) | ||
into one by using dimension coordinates. | ||
|
||
This method attempts to combine a group of datasets along any number of | ||
This function attempts to combine a group of datasets along any number of | ||
dimensions into a single entity by inspecting coords and metadata and using | ||
a combination of concat and merge. | ||
|
||
|
@@ -765,6 +765,8 @@ def combine_by_coords( | |
Returns | ||
------- | ||
combined : xarray.Dataset or xarray.DataArray | ||
Will return a Dataset unless all the inputs are unnamed DataArrays, in which case a | ||
DataArray will be returned. | ||
|
||
See also | ||
-------- | ||
|
@@ -870,6 +872,50 @@ def combine_by_coords( | |
Data variables: | ||
temperature (y, x) float64 10.98 14.3 12.06 nan ... 18.89 10.44 8.293 | ||
precipitation (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176 | ||
|
||
You can also combine DataArray objects, but the behaviour will differ depending on | ||
whether or not the DataArrays are named. If all DataArrays are named then they will | ||
be promoted to Datasets before combining, and then the resultant Dataset will be | ||
returned, e.g. | ||
|
||
>>> named_da1 = xr.DataArray( | ||
... name="a", data=[1.0, 2.0], coords={"x": [0, 1]}, dims="x" | ||
... ) | ||
>>> named_da1 | ||
<xarray.DataArray 'a' (x: 2)> | ||
array([1., 2.]) | ||
Coordinates: | ||
* x (x) int64 0 1 | ||
|
||
>>> named_da2 = xr.DataArray( | ||
... name="a", data=[3.0, 4.0], coords={"x": [2, 3]}, dims="x" | ||
... ) | ||
>>> named_da2 | ||
<xarray.DataArray 'a' (x: 2)> | ||
array([3., 4.]) | ||
Coordinates: | ||
* x (x) int64 2 3 | ||
|
||
>>> xr.combine_by_coords([named_da1, named_da2]) | ||
<xarray.Dataset> | ||
Dimensions: (x: 4) | ||
Coordinates: | ||
* x (x) int64 0 1 2 3 | ||
Data variables: | ||
a (x) float64 1.0 2.0 3.0 4.0 | ||
|
||
If all the DataArrays are unnamed, a single DataArray will be returned, e.g. | ||
|
||
>>> unnamed_da1 = xr.DataArray(data=[1.0, 2.0], coords={"x": [0, 1]}, dims="x") | ||
>>> unnamed_da2 = xr.DataArray(data=[3.0, 4.0], coords={"x": [2, 3]}, dims="x") | ||
>>> xr.combine_by_coords([unnamed_da1, unnamed_da2]) | ||
<xarray.DataArray (x: 4)> | ||
array([1., 2., 3., 4.]) | ||
Coordinates: | ||
* x (x) int64 0 1 2 3 | ||
|
||
Finally, if you attempt to combine a mix of unnamed DataArrays with either named | ||
DataArrays or Datasets, a ValueError will be raised (as this is an ambiguous operation). | ||
""" | ||
|
||
# TODO remove after version 0.21, see PR4696 | ||
|
@@ -883,33 +929,41 @@ def combine_by_coords( | |
if not data_objects: | ||
return Dataset() | ||
|
||
mixed_arrays_and_datasets = any( | ||
objs_are_unnamed_dataarrays = [ | ||
isinstance(data_object, DataArray) and data_object.name is None | ||
for data_object in data_objects | ||
) and any(isinstance(data_object, Dataset) for data_object in data_objects) | ||
if mixed_arrays_and_datasets: | ||
raise ValueError("Can't automatically combine datasets with unnamed arrays.") | ||
|
||
all_unnamed_data_arrays = all( | ||
isinstance(data_object, DataArray) and data_object.name is None | ||
for data_object in data_objects | ||
) | ||
if all_unnamed_data_arrays: | ||
unnamed_arrays = data_objects | ||
temp_datasets = [data_array._to_temp_dataset() for data_array in unnamed_arrays] | ||
|
||
combined_temp_dataset = _combine_single_variable_hypercube( | ||
temp_datasets, | ||
fill_value=fill_value, | ||
data_vars=data_vars, | ||
coords=coords, | ||
compat=compat, | ||
join=join, | ||
combine_attrs=combine_attrs, | ||
) | ||
return DataArray()._from_temp_dataset(combined_temp_dataset) | ||
|
||
] | ||
if any(objs_are_unnamed_dataarrays): | ||
if all(objs_are_unnamed_dataarrays): | ||
# Combine into a single larger DataArray | ||
temp_datasets = [ | ||
unnamed_dataarray._to_temp_dataset() | ||
for unnamed_dataarray in data_objects | ||
] | ||
|
||
combined_temp_dataset = _combine_single_variable_hypercube( | ||
temp_datasets, | ||
fill_value=fill_value, | ||
data_vars=data_vars, | ||
coords=coords, | ||
compat=compat, | ||
join=join, | ||
combine_attrs=combine_attrs, | ||
) | ||
return DataArray()._from_temp_dataset(combined_temp_dataset) | ||
else: | ||
# Must be a mix of unnamed dataarrays with either named dataarrays or with datasets | ||
# Can't combine these as we wouldn't know whether to merge or concatenate the arrays | ||
raise ValueError( | ||
"Can't automatically combine unnamed DataArrays with either named DataArrays or Datasets." | ||
) | ||
else: | ||
# Promote any named DataArrays to single-variable Datasets to simplify combining | ||
data_objects = [ | ||
So if the input is all named DataArrays with the same name, the output is a Dataset? Or is that handled someplace else?
I didn't think of that case, but yes the output is a Dataset. I think that makes sense, because it still matches what the docstring says about the return type, and it also matches the results of Docstring says:
Can we add some examples in the docstring to illustrate this? |
||
obj.to_dataset() if isinstance(obj, DataArray) else obj | ||
for obj in data_objects | ||
] | ||
|
||
# Group by data vars | ||
sorted_datasets = sorted(data_objects, key=vars_as_keys) | ||
grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys) | ||
|
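The branching in the hunk above can be summarised with a small stand-alone sketch. It is not xarray code: `FakeDataArray`, `FakeDataset`, `classify` and `group_by_vars` are hypothetical stand-ins invented here, and the sorted-tuple grouping key is only an assumption about what `vars_as_keys` does. The sketch mirrors the `any(...)`/`all(...)` checks on unnamed DataArrays, the promotion of named DataArrays, and the sort-then-`groupby` step, so the return type for each input mix is easy to see.

```python
import itertools
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FakeDataArray:
    """Hypothetical stand-in for a DataArray; only the name matters here."""
    name: Optional[str] = None


@dataclass(frozen=True)
class FakeDataset:
    """Hypothetical stand-in for a Dataset; only variable names matter here."""
    var_names: Tuple[str, ...] = ()


def classify(data_objects):
    """Return a label for the kind of object the diffed logic would return."""
    if not data_objects:
        return "empty Dataset"
    unnamed = [
        isinstance(obj, FakeDataArray) and obj.name is None
        for obj in data_objects
    ]
    if any(unnamed):
        if all(unnamed):
            # Every input is an unnamed array: combine into one larger array.
            return "DataArray"
        # Mixed case: merge or concatenate would be ambiguous.
        raise ValueError(
            "Can't automatically combine unnamed DataArrays with either "
            "named DataArrays or Datasets."
        )
    # Only named arrays and/or datasets remain.
    return "Dataset"


def group_by_vars(data_objects):
    """Promote named arrays to one-variable datasets, then group by variables."""
    promoted = [
        FakeDataset((obj.name,)) if isinstance(obj, FakeDataArray) else obj
        for obj in data_objects
    ]

    def key(ds):
        # Assumed grouping key: the sorted tuple of variable names.
        return tuple(sorted(ds.var_names))

    # itertools.groupby only groups adjacent items, so sort by the same key first.
    ordered = sorted(promoted, key=key)
    return {k: list(g) for k, g in itertools.groupby(ordered, key=key)}
```

With these stand-ins, two named arrays that share the name `"a"` are classified as `"Dataset"` and land in the same group, which is the case raised in the review comment above.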
Maybe it's complaining about the missing newline?
That fixed it!