Concatenating datasets with staggered grids #362

Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (handling [staggered grids](https://xgcm.readthedocs.io/en/latest/grids.html) is why [xGCM](https://github.com/xgcm/xgcm) exists).

```python
import xarray as xr

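# First section: `a` lives on cell centers, `b` on cell edges.
ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer',  [0.5, 1.5, 2.5, 3.5]),
    },
)

# Second section: only 3 edge values, since the edge at 3.5 is already in ds1.
ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b': ('x_outer',  [4.5, 5.5, 6.5]),
    },
)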
```
I have netCDF output files from an ocean model (UCLA-ROMS) that have this basic structure.

Combining datasets like these with kerchunk is a bit of a pain at the moment.
To concatenate along the x direction, I actually need to concatenate `a` along `x_center` and `b` along `x_outer`. So presumably I have to call `MultiZarrToZarr` once for each variable (or group of variables) that needs to be concatenated along a common dimension. My real dataset is split along multiple dimensions and has multiple staggered grid locations for each dimension, meaning I would have to call `MultiZarrToZarr` something like 6 times.

This problem is analogous to what happens inside [`xarray.combine_by_coords`](https://docs.xarray.dev/en/stable/generated/xarray.combine_by_coords.html), which automatically [~~groups variables into sets with common dimensions~~](https://github.com/pydata/xarray/blob/da647b06312bd93c3412ddd712bf7ecb52e3f28b/xarray/core/combine.py#L956) _splits datasets up into sets consisting of the same variable from each dataset_, concatenates each set separately (along multiple dimensions in general), then merges the results.
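
To make the split / concatenate / merge pattern concrete, here is a minimal in-memory sketch using plain xarray on the toy datasets above. It is not what `combine_by_coords` does internally, and it does not touch kerchunk at all; the `concat_dim_for` mapping is a hypothetical helper that I wrote by hand to say which dimension each variable should be concatenated along.

```python
import xarray as xr

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer',  [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b': ('x_outer',  [4.5, 5.5, 6.5]),
    },
)

# Hypothetical hand-written mapping: variable name -> dimension to concatenate along.
concat_dim_for = {'a': 'x_center', 'b': 'x_outer'}

# Split: pull the same variable out of every dataset.
# Concatenate: join each set along its own dimension.
pieces = [
    xr.concat([ds[name] for ds in (ds1, ds2)], dim=dim)
    for name, dim in concat_dim_for.items()
]

# Merge: put the separately concatenated variables back into one dataset.
combined = xr.merge(pieces)
print(combined)
```

In this picture, each `xr.concat` call plays the role of one `MultiZarrToZarr` call, and `xr.merge` plays the role of `kerchunk.combine.merge_vars`.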

Is that approach (call `MultiZarrToZarr` multiple times then call `kerchunk.combine.merge_vars`) the recommended way to handle this case currently? Could we imagine some improvement to the `kerchunk.combine` API that might make this easier?
