This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Improving the string repr #184

Closed
TomNicholas opened this issue Jan 5, 2023 · 1 comment

Comments

@TomNicholas
Member

We've discussed improving the html repr but the string repr could also be improved.

It currently looks like this:

DataTree('None', parent=None)
│   Dimensions:  ()
│   Data variables:
│       foo      <U6 'orange'
└── DataTree('a')
    │   Dimensions:  (y: 3)
    │   Coordinates:
    │     * y        (y) int64 0 1 2
    │   Data variables:
    │       bar      int64 0
    ├── DataTree('b')
    │       Dimensions:  ()
    │       Data variables:
    │           zed      float64 nan
    └── DataTree('c')
        └── DataTree('d')

Some things that could be better:

  1. We don't need to say DataTree over and over

This is a product of the recursive way we currently write the reprs, but it's redundant. Xarray Dataset's repr doesn't say Variable over and over again. Instead we should just have it say <datatree.DataTree> once at the top, and then the node structure given is enough to imply that the nodes are also DataTree objects, with just the node names listed.
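A minimal sketch of that idea, assuming a hypothetical `Node` class as a stand-in for the real tree type (it is not datatree's API): the type header is emitted once, and the recursion only ever prints node names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the real DataTree class."""

    name: str
    children: list = field(default_factory=list)


def render(root: Node) -> str:
    """Render the type header once, then only the node names."""
    lines = ["<datatree.DataTree>", root.name]

    def walk(node: Node, prefix: str) -> None:
        for i, child in enumerate(node.children):
            last = i == len(node.children) - 1
            # The branch glyph depends on whether this is the last sibling.
            lines.append(prefix + ("└── " if last else "├── ") + child.name)
            # Children of a last sibling no longer need the vertical rule.
            walk(child, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)


tree = Node("root", [Node("a", [Node("b"), Node("c", [Node("d")])])])
print(render(tree))
```

The per-node dataset summaries are left out here to keep the focus on the structure; they would be interleaved under each name with the same prefix.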

  2. Automatically truncate long dataset outputs

We should automatically apply xarray's display options for truncating dataset reprs, and apply them aggressively, to keep the size of the string repr down.
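To illustrate the kind of truncation meant here, below is a plain-Python sketch that does not call xarray at all. xarray exposes its real display limits through `xr.set_options` (for example `display_max_rows`); the `truncate_rows` helper and its head/tail split are assumptions made up for this example.

```python
def truncate_rows(rows: list, max_rows: int) -> list:
    """Keep at most max_rows rows, replacing the middle with '...'.

    Hypothetical helper: keeps the first rows and the last rows and
    drops everything in between.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if len(rows) <= max_rows:
        return list(rows)
    head = (max_rows + 1) // 2
    tail = max_rows - head
    # rows[-0:] would return the whole list, so handle tail == 0 explicitly.
    return rows[:head] + ["..."] + (rows[-tail:] if tail else [])


rows = [f"var{i}" for i in range(10)]
print("\n".join(truncate_rows(rows, 4)))
```

Applied per node, a limit like this bounds the height of each dataset summary, so the overall repr grows with the number of nodes rather than the number of variables.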

  3. Whether root node has a parent

Datatree objects can be parentless (i.e. a root node) or have a parent (i.e. a subtree). Currently this is communicated by the parent=... part of the first line of the repr. This was motivated by the idea of the repr being executable, but I don't think it's great. It doesn't neatly communicate the idea that there is the rest of the tree above this node, and it's not actually executable either.

What might be better is just to have some kind of ... continuation above the first node.

  4. Representing nameless root nodes

Datatree allows the root node (and the root node only) to have a name of None. (See #81). What's the best way to disambiguate this from having a root node named "None"? Should we use <None>, or would that be ambiguous with a node named "<None>"? Should we use string quotation marks around the names of all other nodes (which would be inconsistent with how xarray displays variable names)?
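The trade-off can be made concrete with a small sketch. Both functions are hypothetical illustrations of the two labelling styles under discussion, not existing datatree functions.

```python
from typing import Optional


def bare_label(name: Optional[str]) -> str:
    """Unquoted style: consistent with variable names, but can collide."""
    return "<None>" if name is None else name


def quoted_label(name: Optional[str]) -> str:
    """Quoted style: string names get quotes, so a None name stays distinct."""
    return "<None>" if name is None else repr(name)


# A nameless root and a node literally named "<None>" look identical unquoted...
print(bare_label(None) == bare_label("<None>"))
# ...but differ once string names are quoted.
print(quoted_label(None) == quoted_label("<None>"))
```

Quoting removes the ambiguity at the cost of diverging from how xarray prints variable names, which is exactly the inconsistency raised above.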


With all these suggestions together we might get something like this:

<datatree.DataTree>
...
│
<None>
│   Dimensions:  ()
│   Data variables:
│       foo      <U6 'orange'
└── a
    │   Dimensions:  (y: 3)
    │   Coordinates:
    │     * y        (y) int64 0 1 2
    │   Data variables:
    │       bar      int64 0
    ├── b
    │       Data variables:
    │           zed      float64 nan
    └── c
        └── d

Do we think that's any better?

@TomNicholas
Member Author

I think all of these ideas were discussed and/or implemented upstream in pydata/xarray#9064
