Add HDF5 Section to read/write docs page #8012

Merged
merged 7 commits into from
Jul 24, 2023
61 changes: 61 additions & 0 deletions doc/user-guide/io.rst
@@ -559,6 +559,67 @@ and currently raises a warning unless ``invalid_netcdf=True`` is set:
Note that this produces a file that is likely not readable by other netCDF
libraries!

.. _io.hdf5:

HDF5
----
`HDF5`_ is both a file format and a data model for storing information. HDF5 stores
data hierarchically, using groups to create a nested structure. HDF5 is a more
general version of the netCDF4 data model, so the nested structure is one of many
similarities between the two data formats.

Reading HDF5 files in xarray requires the ``h5netcdf`` engine, which can be installed
with ``conda install h5netcdf``. Once installed we can use xarray to open HDF5 files:

.. code:: python

    xr.open_dataset("/path/to/my/file.h5")

The similarities between HDF5 and netCDF4 mean that HDF5 data can be written with the
same :py:meth:`Dataset.to_netcdf` method as used for netCDF4 data:

.. ipython:: python

    ds = xr.Dataset(
        {"foo": (("x", "y"), np.random.rand(4, 5))},
        coords={
            "x": [10, 20, 30, 40],
            "y": pd.date_range("2000-01-01", periods=5),
            "z": ("x", list("abcd")),
        },
    )

    ds.to_netcdf("saved_on_disk.h5")
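
A round trip makes the write path concrete. The following is a minimal sketch, assuming ``h5netcdf`` is installed; the file name ``roundtrip_example.h5`` and the explicit ``engine="h5netcdf"`` argument are choices made for this illustration rather than requirements of the text above.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# A small dataset with deterministic values so the round trip is easy to check.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(20.0).reshape(4, 5))},
    coords={"x": [10, 20, 30, 40]},
)

# Hypothetical file name inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "roundtrip_example.h5")

# Naming the engine explicitly avoids relying on backend auto-detection.
ds.to_netcdf(path, engine="h5netcdf")

# Load the data into memory so the file can be closed straight away.
with xr.open_dataset(path, engine="h5netcdf") as reopened:
    roundtrip = reopened.load()
```

Loading inside the ``with`` block releases the file handle once the data is in memory, which helps if the same file is written to again later.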

Groups
~~~~~~

If you have multiple or highly nested groups, xarray by default may not read the group
that you want. A particular group of an HDF5 file can be specified using the ``group``
argument:

.. code:: python

    xr.open_dataset("/path/to/my/file.h5", group="/my/group")

While xarray cannot interrogate an HDF5 file to determine which groups are available,
the HDF5 Python reader `h5py`_ can be used instead.
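
One way to list the available groups with ``h5py`` is to walk the file with ``visititems``. The sketch below first builds a small throwaway file so that it is self-contained; the file name and the group and dataset names are hypothetical.

```python
import os
import tempfile

import h5py

# Build a small throwaway file; all names here are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "groups_example.h5")
with h5py.File(path, "w") as f:
    f.create_group("my/group")  # intermediate group "my" is created too
    f.create_dataset("other/values", data=[1, 2, 3])

group_names = []


def collect_groups(name, obj):
    # visititems passes each member's path relative to the root,
    # without a leading slash; keep only groups, not datasets.
    if isinstance(obj, h5py.Group):
        group_names.append("/" + name)


with h5py.File(path, "r") as f:
    f.visititems(collect_groups)

print(sorted(group_names))
```

Each collected name can then be passed as the ``group`` argument of ``xr.open_dataset``.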

Natively the xarray data structures can only handle one level of nesting, organized as
DataArrays inside of Datasets. If your HDF5 file has additional levels of hierarchy you
can only access one group at a time and will need to specify group names.
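
Opening one group at a time can be sketched as a loop over known group names. This example assumes ``h5netcdf`` is installed; the file name, the group names ``/first`` and ``/second/inner``, and the use of ``mode="a"`` to append a second group are all illustrative choices.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical file with two groups at different depths.
path = os.path.join(tempfile.mkdtemp(), "nested_example.h5")

xr.Dataset({"a": ("x", np.arange(3))}).to_netcdf(
    path, group="/first", engine="h5netcdf"
)
# mode="a" appends the second group instead of overwriting the file.
xr.Dataset({"b": ("x", np.arange(4))}).to_netcdf(
    path, group="/second/inner", mode="a", engine="h5netcdf"
)

# Each group is opened separately and becomes its own Dataset.
group_names = ["/first", "/second/inner"]
datasets = {}
for name in group_names:
    with xr.open_dataset(path, group=name, engine="h5netcdf") as group_ds:
        datasets[name] = group_ds.load()
```

The result is a plain dictionary mapping group names to independent ``Dataset`` objects; no relationship between the groups is preserved.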

.. note::

    For native handling of multiple HDF5 groups with xarray, including I/O, you might be
    interested in the experimental
    `xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package.


.. _HDF5: https://hdfgroup.github.io/hdf5/index.html
.. _h5py: https://www.h5py.org/


.. _io.zarr:

Zarr