This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Added docs page on io #158

Merged
merged 9 commits into from
Nov 8, 2022
Changes from 2 commits
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -12,6 +12,7 @@ Datatree
Quick Overview <quick-overview>
Tutorial <tutorial>
Data Model <data-structures>
Reading and Writing Files <io>
API Reference <api>
How do I ... <howdoi>
Contributing Guide <contributing>
50 changes: 50 additions & 0 deletions docs/source/io.rst
@@ -0,0 +1,50 @@
.. _io:

Reading and Writing Files
=========================

.. note::

    This page builds on the information given in xarray's main page on
    `reading and writing files <https://docs.xarray.dev/en/stable/user-guide/io.html>`_,
    so it is suggested that you familiarize yourself with that page first.


netCDF
======

Groups
------

Whilst netCDF groups can only be loaded individually as ``Dataset`` objects, a whole file of many nested groups can be loaded
as a single :py:class:`DataTree` object.
To open a whole netCDF file as a tree of groups, use the :py:func:`open_datatree` function.
To save a ``DataTree`` object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf` method.


.. _netcdf datatree group warning:

.. warning::
    ``DataTree`` objects do not follow exactly the same data model as netCDF files, which means that perfect round-tripping
    is not always possible.

    In particular, in the netCDF data model, dimensions are entities that can exist regardless of whether any variable possesses them.
    This is in contrast to `xarray's data model <https://docs.xarray.dev/en/stable/user-guide/data-structures.html>`_
    (and hence :doc:`datatree's data model <data-structures>`), in which the dimensions of a (Dataset/Tree)
    object are simply the set of dimensions present across all variables in that dataset.
    This means that if a netCDF file contains dimensions but no variables which possess those dimensions,
    these dimensions will not be present when that file is opened as a ``DataTree`` object.
    Saving this ``DataTree`` object to file will therefore not preserve these "unused" dimensions.
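
The loss of "unused" dimensions can be illustrated without any file I/O. The following is a toy model in plain Python, not the actual implementation in netCDF, xarray, or datatree; the names ``declared_dims``, ``variables``, and ``dims_seen_by_xarray`` are hypothetical.

```python
# Toy model of one netCDF group: dimensions are declared on their own,
# independently of any variable. "unused" is declared but no variable has it.
declared_dims = {"x": 3, "y": 2, "unused": 5}

# Each variable lists the dimensions it possesses.
variables = {
    "a": ("x",),
    "b": ("x", "y"),
}


def dims_seen_by_xarray(declared, variables):
    """Keep only the declared dimensions that at least one variable uses."""
    used = {dim for var_dims in variables.values() for dim in var_dims}
    return {name: size for name, size in declared.items() if name in used}


# What survives once the group is represented in the xarray-style data model.
after_open = dims_seen_by_xarray(declared_dims, variables)

# Dimensions that would not be written back on a subsequent save.
lost = sorted(set(declared_dims) - set(after_open))

print(after_open)
print(lost)
```

In this model the dimension set is derived from the variables rather than stored separately, which is why a dimension with no variable has nowhere to live after opening.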

Zarr
====

Groups
------

Nested groups in zarr stores can be represented by loading the store as a :py:class:`DataTree` object, similarly to netCDF.
To open a whole zarr store as a tree of groups, use the :py:func:`open_datatree` function.
To save a ``DataTree`` object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr` method.

.. note::
    Perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files <netcdf datatree group warning>`),
    as zarr does not support "unused" dimensions.

@jhamman can you confirm this statement about zarr is true? I couldn't really tell from looking at the v2 spec quickly.


This is correct.