This repository was archived by the owner on Oct 24, 2024. It is now read-only.
In #76 I refactored the tree structure to use a path-like syntax. This includes referring to the root of a tree as "/", same as in cd / in a unix-like filesystem.
This makes accessing nodes and variables of nodes quite neat, because you can reference nodes via absolute or relative paths:
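To illustrate what absolute and relative path lookup means here, below is a minimal sketch using a toy tree. The `Node` class and its `add`, `path`, and `get` members are hypothetical stand-ins written for this example; they are not the datatree API, and the real implementation may resolve paths differently.

```python
from typing import Dict, Optional


class Node:
    """Toy tree node; a hypothetical stand-in for DataTree, not its real API."""

    def __init__(self, name: Optional[str] = None):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: Dict[str, "Node"] = {}

    def add(self, key: str, child: "Node") -> "Node":
        # A child's name is simply the key it is stored under.
        child.parent = self
        child.name = key
        self.children[key] = child
        return child

    @property
    def root(self) -> "Node":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    @property
    def path(self) -> str:
        # The root is always "/", whatever its own name is.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))

    def get(self, path: str) -> "Node":
        # Absolute paths start from the root, relative paths from this node.
        node = self.root if path.startswith("/") else self
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                if node.parent is None:
                    raise KeyError("cannot go above the root")
                node = node.parent
            else:
                node = node.children[part]
        return node


root = Node()  # unnamed root
set1 = root.add("set1", Node())
inner = set1.add("set2", Node())
set3 = root.add("set3", Node())

absolute = root.get("/set1/set2")    # absolute path from the root
relative = set1.get("set2")          # relative path from set1
sibling = inner.get("../../set3")    # walk up two levels, then down
```

Here `absolute` and `relative` both resolve to the same node, and the root's path is `"/"` even though its name is `None`.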
This refactor also made the name of a DataTree object optional, whereas before every node was required to have one. (Nodes still have a .name attribute; it can now just be None.)
In [28]: dt.name
Normally this doesn't matter, because when assigned a .parent a node's .name property will just point to the key under which it is stored as a child. This echoes the way an unnamed DataArray can be stored in a Dataset.
In [29]: import xarray as xr
In [30]: ds = xr.Dataset()
In [31]: da = xr.DataArray(0)
In [32]: ds['foo'] = da
In [33]: ds['foo'].name
Out[33]: 'foo'
However, this means that the root node of a tree is no longer required to have a name at all.
This is good because
As a user you normally don't care about the name of the root when manipulating the tree, only the names of the nodes,
It makes the __init__ signature simpler as name is no longer a required arg,
It most closely echoes how filepaths work (the filesystem root "/" doesn't have another name),
Roundtripping from Zarr/netCDF files still seems to work (see test_io.py),
Roundtripping from dictionaries still works if the root node is unnamed
In [35]: d = {node.path: node.ds for node in dt.subtree}
In [36]: roundtrip = DataTree.from_dict(d)
In [37]: roundtrip
Out[37]:
DataTree('None', parent=None)
│   Dimensions:  (y: 3, x: 2)
│   Dimensions without coordinates: y, x
│   Data variables:
│       a        (y) int64 6 7 8
│       set0     (x) int64 9 10
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')
In [38]: dt.equals(roundtrip)
Out[38]: True
But it's bad because
Roundtripping from dictionaries doesn't work anymore if the root node is named
In [39]: dt2 = dt
In [40]: dt2.name = "root"
In [41]: d2 = {node.path: node.ds for node in dt2.subtree}
In [42]: roundtrip2 = DataTree.from_dict(d2)
In [43]: roundtrip2
Out[43]:
DataTree('None', parent=None)
│   Dimensions:  (y: 3, x: 2)
│   Dimensions without coordinates: y, x
│   Data variables:
│       a        (y) int64 6 7 8
│       set0     (x) int64 9 10
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')
In [44]: dt2.equals(roundtrip2)
Out[44]: False
The signature of the DataTree.from_dict becomes a bit weird because if you want to name the root node the only way to do it is to pass a separate name argument, i.e.
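A minimal sketch of why the root name gets lost and how a separate name argument recovers it, using a toy tree rather than datatree itself. Everything below (`Node`, `to_dict`, `from_dict`, `equals`) is a hypothetical stand-in written for this example; it only assumes, as described above, that the dictionary is keyed by path, that the root's key is always "/", and that equality compares names.

```python
class Node:
    """Toy tree node; a hypothetical stand-in for DataTree, not its real API."""

    def __init__(self, name=None, data=None):
        self.name = name
        self.data = data
        self.children = {}


def to_dict(node, path="/"):
    # Keys are paths, so the root is always stored under "/" and its
    # own name never appears anywhere in the dictionary.
    out = {path: node.data}
    for key, child in node.children.items():
        out.update(to_dict(child, path.rstrip("/") + "/" + key))
    return out


def from_dict(d, name=None):
    # The root name cannot be recovered from the keys, so it has to be
    # passed separately (this mirrors the extra `name` argument above).
    root = Node(name=name, data=d.get("/"))
    for path, data in d.items():
        if path == "/":
            continue
        node = root
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, Node(name=part))
        node.data = data
    return root


def equals(a, b):
    # Assumption for this sketch: equality includes the node names.
    return (
        a.name == b.name
        and a.data == b.data
        and a.children.keys() == b.children.keys()
        and all(equals(a.children[k], b.children[k]) for k in a.children)
    )


tree = Node(name="root", data=0)
tree.children["set1"] = Node(name="set1", data=1)
tree.children["set1"].children["set2"] = Node(name="set2", data=2)

d = to_dict(tree)
lossy = from_dict(d)                      # root name is dropped
restored = from_dict(d, name=tree.name)   # root name passed separately
```

In this sketch `equals(tree, lossy)` is false and `equals(tree, restored)` is true, which is the asymmetry being questioned: child names live in the dictionary keys, but the root name has to travel out of band.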
I believe my comment was referring to supplying the root group when writing a datatree, such that the child paths are prepended with the group id (i.e. dt.to_netcdf('foo.nc', group='/foo/bar/')). I don't think anything kept me from implementing that feature apart from my goal of an MVP at the time. I also think the changes in #76 (and your description above) will work with this feature if or when someone implements it. (tl;dr: I don't think there is a problem here.)
What do we think about this behaviour? Does this seem like a good design, or annoyingly finicky?
@jhamman I notice that in the code you wrote for the io you put a note about not being able to specify a root group for the tree. Is that related to this question? Do you have any other thoughts on this?