Datatree alignment docs #9501
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
.. _hierarchical-data: | ||
|
||
Hierarchical data | ||
============================== | ||
================= | ||
|
||
.. ipython:: python | ||
:suppress: | ||
|
@@ -15,6 +15,8 @@ Hierarchical data | |
|
||
%xmode minimal | ||
|
||
.. _why: | ||
|
||
Why Hierarchical Data? | ||
---------------------- | ||
|
||
|
@@ -644,3 +646,165 @@ We could use this feature to quickly calculate the electrical power in our signa | |
|
||
power = currents * voltages | ||
power | ||
|
||
.. _alignment-and-coordinate-inheritance: | ||
|
||
Alignment and Coordinate Inheritance | ||
------------------------------------ | ||
|
||
.. _data-alignment: | ||
|
||
Data Alignment | ||
~~~~~~~~~~~~~~ | ||
Comment on lines +657 to +658
TODO: add comment about
this is the only note I have on open_groups, it probably deserves more. https://github.com/pydata/xarray/blob/main/doc/getting-started-guide/quick-overview.rst?plain=1#L284
gonna prioritize merging this and improving documentation for |
||
|
||
The data in different datatree nodes are not totally independent. In particular, dimensions (and indexes) in child nodes must be aligned (LINK HERE) with those in their parent nodes. | |
TomNicholas marked this conversation as resolved.
|
||
|
||
.. note:: | ||
If you were a previous user of the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package, this is different from what you're used to! | ||
In that package the data model was that nodes were completely unrelated. The data model is now slightly stricter. | |
Possible nit (feel free to ignore): Would it be clearer to say the information (or specifically
Yeah that's a great point, it would definitely be both more clear and more accurate to say that instead.
clarified in 9b8fc9b |
||
This allows us to provide features like :ref:`coordinate-inheritance`. See the migration guide for more details on the differences (LINK). | ||
|
||
To demonstrate, let's first generate some example datasets which are not aligned with one another: | ||
|
||
.. ipython:: python | ||
|
||
# (drop the attributes just to make the printed representation shorter) | ||
ds = xr.tutorial.open_dataset("air_temperature").drop_attrs() | ||
|
||
ds_daily = ds.resample(time="D").mean("time") | ||
ds_weekly = ds.resample(time="W").mean("time") | ||
ds_monthly = ds.resample(time="ME").mean("time") | ||
|
||
These datasets have different lengths along the ``time`` dimension, and are therefore not aligned along that dimension. | ||
|
||
.. ipython:: python | ||
|
||
ds_daily.sizes | ||
ds_weekly.sizes | ||
ds_monthly.sizes | ||
|
||
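The differing lengths can be reproduced with plain date arithmetic. The sketch below is not part of the PR and does not use xarray; it assumes the tutorial dataset spans 2013-01-01 to 2014-12-31 and that weekly bins end on Sundays (the pandas ``W`` default), and simply counts the distinct days, weeks and months in that span.

```python
from datetime import date, timedelta

# Assumption: the tutorial dataset covers these two calendar years.
start, end = date(2013, 1, 1), date(2014, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]


def week_ending_sunday(d: date) -> date:
    """Return the Sunday that closes the week containing d (Monday=0 ... Sunday=6)."""
    return d + timedelta(days=6 - d.weekday())


# Number of distinct bins at each sampling frequency.
sizes = {
    "daily": len(days),
    "weekly": len({week_ending_sunday(d) for d in days}),
    "monthly": len({(d.year, d.month) for d in days}),
}
print(sizes)
```

Three different bin counts along the same dimension name are exactly what prevents an exact alignment.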
We cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object, because they do not exactly align: | ||
I guess it would be more correct to say that we cannot store them unchanged. |
||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
xr.align(ds_daily, ds_weekly, ds_monthly, join="exact") | ||
|
||
But we :ref:`previously said <why>` that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? | ||
If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) | ||
|
||
(TODO: Looks like this error message could be improved by including information about which sizes are not equal.) | ||
|
||
This is because DataTree checks that data in child nodes align exactly with their parents. | ||
|
||
.. note:: | ||
This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions <https://www.unidata.ucar.edu/software/netcdf/workshops/2007/groups-types/Introduction.html>`_, as in netCDF-4 files dimensions are `visible to all child groups <https://docs.unidata.ucar.edu/netcdf-c/current/groups.html>`_. | ||
|
||
This alignment check is performed up through the tree, all the way to the root, and is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: | |
Before getting to this statement, I had added a comment saying we should make it clear that the alignment check ensures alignment with all ancestors, not just the immediate parent. But this covers it nicely! |
||
|
||
.. code:: python | ||
|
||
xr.align(child.dataset, *(parent.dataset for parent in child.parents), join="exact") | |
TomNicholas marked this conversation as resolved.
|
||
|
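To make the "checked against every ancestor" rule concrete, here is a toy model in plain Python. It is only a sketch and not xarray's implementation: the ``Node`` class and ``check_alignment`` function are hypothetical, the sizes are illustrative, and the check compares dimension sizes only, whereas real alignment also compares indexes.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree node that only tracks dimension sizes."""

    name: str
    sizes: Dict[str, int]
    parent: Optional["Node"] = None

    @property
    def parents(self) -> Iterator["Node"]:
        """Yield every ancestor, from the immediate parent up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def check_alignment(child: Node) -> None:
    """Raise ValueError if a shared dimension differs in size from any ancestor."""
    for ancestor in child.parents:
        for dim, size in child.sizes.items():
            if dim in ancestor.sizes and ancestor.sizes[dim] != size:
                raise ValueError(
                    f"{child.name!r} has {dim}={size}, but ancestor "
                    f"{ancestor.name!r} has {dim}={ancestor.sizes[dim]}"
                )


root = Node("/", {})
daily = Node("daily", {"time": 730}, parent=root)
nested_weekly = Node("weekly", {"time": 105}, parent=daily)  # child of daily
sibling_weekly = Node("weekly", {"time": 105}, parent=root)  # sibling of daily

check_alignment(sibling_weekly)  # passes: no ancestor defines "time"

try:
    check_alignment(nested_weekly)  # fails: "time" clashes with the parent
except ValueError as err:
    print(err)
```

In this model the nested layout fails while the sibling layout passes, which is the reason for organizing unalignable data as siblings.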
||
To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendants of one another, e.g. organize them as siblings. | |
|
||
.. ipython:: python | ||
|
||
dt = xr.DataTree.from_dict( | ||
{"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly} | ||
) | ||
dt | ||
|
||
Now we have a valid :py:class:`~xarray.DataTree` structure which contains all the data at each different time frequency, stored in a separate group. | ||
|
||
This is a useful way to organise our data because we can still operate on all the groups at once. | ||
For example, we can extract all three timeseries at a specific lat-lon location: | |
|
||
.. ipython:: python | ||
|
||
dt.sel(lat=75, lon=300) | ||
|
||
or compute the standard deviation of each timeseries to find out how it varies with sampling frequency: | ||
|
||
.. ipython:: python | ||
|
||
dt.std(dim="time") | ||
|
||
.. _coordinate-inheritance: | ||
|
||
Coordinate Inheritance | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. | ||
I'm tempted to say display it again after this paragraph.
done in d2918bb |
||
|
||
We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. | ||
|
||
.. note:: | ||
This is also a new feature relative to the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package. | ||
|
||
Let's instead place only the time-dependent variables in the child groups, and put the non-time-dependent ``lat`` and ``lon`` variables in the parent (root) group: | ||
|
||
.. ipython:: python | ||
|
||
dt = xr.DataTree.from_dict( | ||
{ | ||
"/": ds.drop_dims("time"), | ||
"daily": ds_daily.drop_vars(["lat", "lon"]), | ||
"weekly": ds_weekly.drop_vars(["lat", "lon"]), | ||
"monthly": ds_monthly.drop_vars(["lat", "lon"]), | ||
} | ||
) | ||
dt | ||
|
||
This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. | ||
Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. | ||
|
||
We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups: | ||
|
||
.. ipython:: python | ||
|
||
dt.daily.coords | ||
dt["daily/lat"] | ||
|
||
(TODO: the repr of ``dt.coords`` should display which coordinates are inherited) | ||
|
||
As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group. | ||
|
||
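One way to picture inheritance is as a child-first lookup that falls back to the parent. The sketch below models this with ``collections.ChainMap`` from the standard library; it is an analogy under assumed toy values, not how xarray stores or resolves coordinates.

```python
from collections import ChainMap

# Toy coordinate values, chosen only for illustration.
root_coords = {"lat": [75.0, 72.5], "lon": [200.0, 202.5]}
daily_coords = {"time": ["2013-01-01", "2013-01-02"]}

# Lookups try the child's own coordinates first, then the parent's.
daily_view = ChainMap(daily_coords, root_coords)

# Names visible from the child but not defined on it are the "inherited" ones.
inherited = sorted(set(daily_view) - set(daily_coords))

print(daily_view["lat"])  # found on the parent
print(inherited)
```

The child sees ``lat`` and ``lon`` without holding its own copy, so there is only one definition to keep in sync.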
If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: | ||
|
||
.. ipython:: python | ||
|
||
print(dt["/daily"]) | ||
|
||
This helps to differentiate which variables are defined on the datatree node that you are currently looking at, and which were defined somewhere above it. | ||
|
||
We can also still perform all the same operations on the whole tree: | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
dt.sel(lat=75, lon=300) | ||
|
||
dt.std(dim="time") | ||
|
||
(TODO: The first one repeats coordinates in the result due to https://github.com/pydata/xarray/issues/9475) | ||
|
||
(TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) | ||
|
||
.. _overriding-inherited-coordinates: | ||
|
||
Overriding Inherited Coordinates | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
We can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes. | ||
|
||
EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS? | ||
|
||
EXAMPLE OF INHERITING FROM A GRANDPARENT? | ||
|
||
EXPLAIN DEDUPLICATION? | ||
Is the plan to include these points in this PR, or merge what is here (maybe with this commented out) and then add more content later?
I was going to do it in this PR, but given that everyone seems to be happy with what's here already, and this is a natural break point, perhaps I will just merge this for now.
A follow-up issue could be to "document the subtleties of coordinate inheritance"
(I had hoped to add these bits before you reviewed it @owenlittlejohns )
Removed that content for use in a future PR in 6cab6f8 |