This repository was archived by the owner on Oct 24, 2024. It is now read-only.

API for reorganizing levels #186

Closed
TomNicholas opened this issue Jan 6, 2023 · 3 comments
Labels
design question enhancement New feature or request

Comments

@TomNicholas
Member

TomNicholas commented Jan 6, 2023

@jbusecke and I were discussing API for reorganizing levels of a tree.

For example, say I have 2 models, each of which ran 2 scenarios. That's 4 data-containing leaves in my tree, but there are 2 equally valid ways to organise them: model-first or scenario-first.

The model-first tree has node paths:
/mod1/scen1
/mod2/scen1
/mod1/scen2
/mod2/scen2
whilst the scenario-first tree has node paths:
/scen1/mod1
/scen2/mod1
/scen1/mod2
/scen2/mod2

Either of these is equally valid, and one might sometimes be preferred over the other, so we should have a method that can rearrange one structure into the other.

The question is what the API to do this should look like so that it's general, intuitive, and powerful.
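To make the target behaviour concrete, here is a minimal sketch that works on plain path strings rather than on a real DataTree. The helper name `swap_two_levels` is hypothetical and only handles trees that are exactly two levels deep; it is meant to illustrate the desired result, not to propose an implementation.

```python
def swap_two_levels(paths):
    """Swap the two components of every '/a/b' style path."""
    swapped = []
    for path in paths:
        # Assumption: every path has exactly two components.
        first, second = path.strip("/").split("/")
        swapped.append(f"/{second}/{first}")
    return swapped


model_first = ["/mod1/scen1", "/mod2/scen1", "/mod1/scen2", "/mod2/scen2"]
scenario_first = swap_two_levels(model_first)
print(scenario_first)
```

The output contains the same four leaves as the scenario-first listing above, just reached through the other ordering of levels.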

@TomNicholas TomNicholas added enhancement New feature or request design question labels Jan 6, 2023
@TomNicholas
Member Author

One method that might be useful (inspired by xarray's swap_dims):

from __future__ import annotations

from typing import Any, Mapping


class DataTree:
    def swap_levels(
        self: DataTree,
        levels_dict: Mapping[Any, str] | None = None,
        **levels_kwargs,
    ) -> DataTree:
        """
        Returns a new DataTree where all nodes have swapped levels.

        Renames components of paths to nodes in the tree.

        Parameters
        ----------
        levels_dict : dict-like
            Dictionary whose keys are current levels and whose values
            are new levels.
        **levels_kwargs : {existing_level: new_level, ...}, optional
            The keyword arguments form of ``levels_dict``.
            One of levels_dict or levels_kwargs must be provided.

        Returns
        -------
        swapped : DataTree
            DataTree where every node has swapped levels.
        """
        # Proposed signature only; no implementation exists yet.
        raise NotImplementedError

I don't think this is enough to solve the use case in the comment above though...
Perhaps it would work if we allow globs?

dt.swap_levels({"mod*": "scen*", "scen*": "mod*"})

This would work by renaming components of paths in such a way that the levels end up reordered. The implementation would have to be careful though, possibly involving temporary names.
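As a rough illustration of the glob idea, the sketch below operates on plain path strings and uses the standard library's `fnmatch` to decide which component belongs to which level. The function `reorder_by_globs` is hypothetical. It rebuilds each path in the requested order instead of renaming in place, which sidesteps the temporary-name problem but is not necessarily how a real `swap_levels` would work.

```python
from fnmatch import fnmatchcase


def reorder_by_globs(paths, new_order):
    """Rebuild each path so its components follow the glob patterns in new_order."""
    result = []
    for path in paths:
        parts = path.strip("/").split("/")
        reordered = []
        for pattern in new_order:
            matches = [part for part in parts if fnmatchcase(part, pattern)]
            # Exactly one component must match each pattern, otherwise the
            # level a component belongs to is ambiguous or unknown.
            if len(matches) != 1:
                raise ValueError(
                    f"pattern {pattern!r} matched {len(matches)} components of {path!r}"
                )
            reordered.append(matches[0])
        result.append("/" + "/".join(reordered))
    return result


print(reorder_by_globs(["/mod1/scen1", "/mod2/scen2"], ["scen*", "mod*"]))
```

Note that the sketch raises `ValueError` as soon as a path component matches no pattern, or more than one, so it only works when node names carry their level in the name itself.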

I kind of want something like

dt.reorder_levels("mod<->scen")

but I'm not sure what the API for that should look like, or how it should behave in cases where the path segment "mod" appears in multiple levels...

@jbusecke
Contributor

jbusecke commented Jan 16, 2023

The problem I see with the above API is that you are assuming each value of a level/category actually contains a 'globbable' part, or more generally that the node name contains some meta-information about the 'kind' of level. I don't think that is the case in many real-world examples. Take, for instance, a (simplified) CMIP example:

/GFDL/hist
/CESM/hist
/GFDL/ssp
/CESM/ssp

There is no common string to identify CESM and GFDL as mod, and the same goes for the experiments. This goes back to my earlier comment that we really require level labels for this to work. If we have a special tree object that somehow knows that the first level is mod and the second is scen, then your API above works. This might also enable an easier way to reorder more complex scenarios (and do multiple swaps at once):

# assume dt is ordered as "mod/scen/member"
dt.reorder_levels("member/scen/mod") #keeping with the 'filepath' like syntax
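A minimal sketch of how labelled levels could drive the reordering, again on a plain dict of paths rather than a real tree object. The standalone function `reorder_levels` and its `current_order` argument are assumptions made for illustration; in the proposal the tree itself would know its current level labels.

```python
def reorder_levels(tree, current_order, new_order):
    """Return a new {path: value} dict with path components reordered by level label."""
    current = current_order.strip("/").split("/")
    new = new_order.strip("/").split("/")
    if sorted(current) != sorted(new):
        raise ValueError("new_order must be a permutation of current_order")

    reordered = {}
    for path, value in tree.items():
        parts = path.strip("/").split("/")
        # Assumption: every leaf sits at the full depth of the labelled levels.
        if len(parts) != len(current):
            raise ValueError(f"path {path!r} does not have {len(current)} levels")
        by_label = dict(zip(current, parts))
        new_path = "/" + "/".join(by_label[label] for label in new)
        reordered[new_path] = value
    return reordered


tree = {"/GFDL/hist/r1": "a", "/CESM/ssp/r2": "b"}
print(reorder_levels(tree, "mod/scen/member", "member/scen/mod"))
```

Because the labels live outside the node names, this works for names like GFDL and hist that share no common string.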

@TomNicholas
Member Author

Closed in favour of pydata/xarray#9344
