Commit f261693

Handle scale_factor and add_offset as scalar
The h5netcdf engine exposes single-valued attributes as arrays of shape (1,), which is correct according to the NetCDF standard, but may cause a problem when reading a value of shape () before scale_factor and add_offset have been applied. This PR adds a check on the dimensionality of add_offset and scale_factor and ensures they are scalar before further processing; adds a unit test verifying that this works correctly; and adds a note to the documentation warning users of this difference between the h5netcdf and netcdf4 engines. Fixes pydata#4471.
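The shape problem described above can be reproduced with plain NumPy (a minimal sketch; the attribute values here are illustrative, not taken from the issue):

```python
import numpy as np

# h5netcdf exposes single-valued attributes as length-1 arrays:
scale_factor = np.array([0.1])   # shape (1,) instead of a scalar
add_offset = np.array([2.0])

raw = np.float64(5.0)            # a single packed value, shape ()

# Broadcasting against the (1,)-shaped attributes promotes the
# result to shape (1,), so the decoded value is no longer a scalar.
decoded = raw * scale_factor + add_offset
print(decoded.shape)             # (1,)

# Collapsing the attributes to scalars first keeps the shape:
decoded = raw * scale_factor.item() + add_offset.item()
print(np.ndim(decoded))          # 0
```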
1 parent 45aab42 commit f261693

File tree

3 files changed: +29 −0 lines changed


doc/io.rst (+6)

@@ -105,6 +105,12 @@ Dataset and DataArray objects, and no array values are loaded into memory until
 you try to perform some sort of actual computation. For an example of how these
 lazy arrays work, see the OPeNDAP section below.
 
+There may be minor differences in the :py:class:`Dataset` object returned
+when reading a NetCDF file with different engines. For example,
+single-valued attributes are returned as scalars by the default
+``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
+``engine=h5netcdf``.
+
 It is important to note that when you modify values of a Dataset, even one
 linked to files on disk, only the in-memory copy you are manipulating in xarray
 is modified: the original file on disk is never touched.

xarray/coding/variables.py (+4)

@@ -269,6 +269,10 @@ def decode(self, variable, name=None):
         scale_factor = pop_to(attrs, encoding, "scale_factor", name=name)
         add_offset = pop_to(attrs, encoding, "add_offset", name=name)
         dtype = _choose_float_dtype(data.dtype, "add_offset" in attrs)
+        if np.ndim(scale_factor) > 0:
+            scale_factor = scale_factor.item()
+        if np.ndim(add_offset) > 0:
+            add_offset = add_offset.item()
         transform = partial(
             _scale_offset_decoding,
             scale_factor=scale_factor,
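The guard added in the hunk above can be exercised in isolation (a minimal sketch; `ensure_scalar` is a hypothetical stand-in for the inlined checks, not an xarray function):

```python
import numpy as np

def ensure_scalar(value):
    """Mirror the added guard: collapse a length-1 array to a scalar."""
    if np.ndim(value) > 0:
        value = value.item()
    return value

# Scalar attributes (netcdf4 engine) pass through unchanged;
# length-1 arrays (h5netcdf engine) are collapsed with .item().
print(ensure_scalar(np.float64(0.01)))                  # 0.01
print(ensure_scalar(np.array([0.01])))                  # 0.01
print(type(ensure_scalar(np.array([0.01]))).__name__)   # float
```

`np.ndim` works on plain Python numbers, NumPy scalars, and arrays alike, which is why the patch can apply the same check regardless of which engine produced the attribute.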

xarray/tests/test_backends.py (+19)

@@ -4668,3 +4668,22 @@ def test_extract_zarr_variable_encoding():
     actual = backends.zarr.extract_zarr_variable_encoding(
         var, raise_on_invalid=True
     )
+
+
+@requires_h5netcdf
+def test_load_single_value_h5netcdf(tmp_path):
+    """Test that numeric single-element vector attributes are handled fine.
+
+    At present (h5netcdf v0.8.1), the h5netcdf engine exposes single-valued
+    numeric variable attributes as arrays of length 1, as opposed to scalars
+    for the NetCDF4 backend. This was leading to a ValueError upon loading a
+    single value from a file, see #4471. Test that loading causes no failure.
+    """
+    ds = xr.Dataset(
+        {"test": xr.DataArray(
+            np.array([0]),
+            dims=("x",),
+            attrs={"scale_factor": 1, "add_offset": 0})})
+    ds.to_netcdf(tmp_path / "test.nc")
+    with xr.open_dataset(tmp_path / "test.nc", engine="h5netcdf") as ds2:
+        ds2["test"][0].load()
