Improving the string repr #184

TomNicholas · 2023-01-05T20:59:21Z

We've discussed improving the HTML repr, but the string repr could also be improved.

It currently looks like this:

DataTree('None', parent=None)
│   Dimensions:  ()
│   Data variables:
│       foo      <U6 'orange'
└── DataTree('a')
    │   Dimensions:  (y: 3)
    │   Coordinates:
    │     * y        (y) int64 0 1 2
    │   Data variables:
    │       bar      int64 0
    ├── DataTree('b')
    │       Dimensions:  ()
    │       Data variables:
    │           zed      float64 nan
    └── DataTree('c')
        └── DataTree('d')

Some things that could be better:

We don't need to say DataTree over and over

This is a product of the recursive way we currently write the reprs, but it's redundant: xarray's Dataset repr doesn't say Variable over and over again either. Instead, the repr should say <datatree.DataTree> once at the top. The tree structure below it is then enough to imply that every node is also a DataTree object, so only the node names need to be listed.

Automatically truncate long dataset outputs

We should automatically apply xarray's options for truncating dataset reprs, and apply them aggressively, to keep the size of the string repr down.
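xarray already exposes display settings through `xarray.set_options` (for example `display_max_rows`), and the idea would be to lean on those. To illustrate the kind of head/tail elision that keeps a long listing short, here is a self-contained sketch. `truncate_rows` is a hypothetical helper, not part of xarray, and it is not how xarray itself implements truncation.

```python
from typing import List


def truncate_rows(rows: List[str], max_rows: int) -> List[str]:
    """Keep at most max_rows entries, eliding the middle with a '...' line.

    Hypothetical helper for illustration only; not an xarray API.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if len(rows) <= max_rows:
        return list(rows)
    head = (max_rows + 1) // 2  # favour the head when max_rows is odd
    tail = max_rows - head
    # rows[-0:] would return the whole list, so guard the tail explicitly
    return rows[:head] + ["..."] + (rows[-tail:] if tail else [])


variables = [f"var{i}" for i in range(10)]
print(truncate_rows(variables, 4))
```

Showing both ends (rather than only the first rows) keeps the reader aware of how the listing ends, which matters when variables are sorted.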

Whether root node has a parent

Datatree objects can be parentless (i.e. a root node) or have a parent (i.e. a subtree). Currently this is communicated by the parent=... part of the first line of the repr. This was motivated by the idea of the repr being executable, but I don't think it's great. It doesn't neatly communicate that the rest of the tree lies above this node, and it's not actually executable either.

What might be better is just to have some kind of "..." continuation marker above the first node.

Representing nameless root nodes

Datatree allows the root node (and the root node only) to have a name of None (see #81). What's the best way to disambiguate this from a root node named "None"? Should we use <None>, or would that be ambiguous with a node named "<None>"? Should we put string quotation marks around the names of all other nodes (which would be inconsistent with how xarray displays variable names)?
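One possible answer to these questions, sketched below under stated assumptions: a missing name is shown as <None>, ordinary names are shown bare (matching how xarray displays variable names), and only a name that could be mistaken for the placeholder is quoted. `node_label` and `NAMELESS_LABEL` are hypothetical names for illustration, not datatree's API.

```python
from typing import Optional

NAMELESS_LABEL = "<None>"  # hypothetical placeholder for a root with name=None


def node_label(name: Optional[str]) -> str:
    """Return the label shown for a node in the string repr (sketch only)."""
    if name is None:
        return NAMELESS_LABEL
    if name == NAMELESS_LABEL:
        # A node literally named "<None>" is quoted so it can't be confused
        # with the nameless-root placeholder.
        return repr(name)
    return name


for name in [None, "None", "<None>", "a"]:
    print(node_label(name))
```

The trade-off is that quoting becomes a rare special case rather than a rule, so the common case stays consistent with xarray.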

With all these suggestions together we might get something like this:

<datatree.DataTree>
...
│
<None>
│   Dimensions:  ()
│   Data variables:
│       foo      <U6 'orange'
└── a
    │   Dimensions:  (y: 3)
    │   Coordinates:
    │     * y        (y) int64 0 1 2
    │   Data variables:
    │       bar      int64 0
    ├── b
    │       Dimensions:  ()
    │       Data variables:
    │           zed      float64 nan
    └── c
        └── d

Do we think that's any better?
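To make the proposal concrete, here is a minimal sketch of a renderer that produces exactly the layout above: a single <datatree.DataTree> header, a "..." continuation for subtrees, and bare node names drawn with box characters. The `Node` class, its `summary` lines (standing in for the already-formatted dataset repr), the `has_parent` flag, and `tree_repr` are all hypothetical, not datatree's actual internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DataTree node (not datatree's real class)."""
    name: Optional[str]
    summary: List[str] = field(default_factory=list)  # pre-rendered dataset lines
    children: List["Node"] = field(default_factory=list)
    has_parent: bool = False  # True if this node is a subtree of a larger tree


def _render_node(node: Node, prefix: str, is_last: bool, is_root: bool, out: List[str]) -> None:
    label = node.name if node.name is not None else "<None>"
    if is_root:
        out.append(label)
        child_prefix = ""
    else:
        out.append(prefix + ("└── " if is_last else "├── ") + label)
        # Later siblings need a vertical bar to pass alongside this subtree.
        child_prefix = prefix + ("    " if is_last else "│   ")
    # Dataset lines hang off a vertical bar only while children follow below.
    gutter = "│   " if node.children else "    "
    for line in node.summary:
        out.append(child_prefix + gutter + line)
    for i, child in enumerate(node.children):
        _render_node(child, child_prefix, i == len(node.children) - 1, False, out)


def tree_repr(root: Node) -> str:
    """Render the whole tree, naming the type once at the top."""
    out = ["<datatree.DataTree>"]
    if root.has_parent:
        out += ["...", "│"]  # signal that more of the tree lies above
    _render_node(root, "", True, True, out)
    return "\n".join(out)


# Usage: rebuild the example tree from this issue.
example = Node(
    None,
    ["Dimensions:  ()", "Data variables:", "    foo      <U6 'orange'"],
    [
        Node(
            "a",
            ["Dimensions:  (y: 3)", "Coordinates:", "  * y        (y) int64 0 1 2",
             "Data variables:", "    bar      int64 0"],
            [
                Node("b", ["Dimensions:  ()", "Data variables:", "    zed      float64 nan"]),
                Node("c", children=[Node("d")]),
            ],
        )
    ],
    has_parent=True,
)
print(tree_repr(example))
```

Because the type name is emitted once in `tree_repr` rather than in the recursive step, the repetition problem disappears without changing how each node's dataset summary is produced.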


TomNicholas · 2024-08-13T16:16:10Z

I think all of these ideas were discussed and/or implemented upstream in pydata/xarray#9064.

TomNicholas closed this as completed Aug 13, 2024
