Skip to content

add storage_options arg to to_zarr #5615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 21, 2021
3 changes: 3 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,9 @@ New Features
By `Justus Magin <https://github.com/keewis>`_.
- Added ``**kwargs`` argument to :py:meth:`open_rasterio` to access overviews (:issue:`3269`).
By `Pushkar Kopparla <https://github.com/pkopparla>`_.
- Added ``storage_options`` argument to :py:meth:`to_zarr` (:issue:`5601`).
By `Ray Bell <https://github.com/raybellwaves>`_, `Zachary Blackwood <https://github.com/blackary>`_ and
`Nathan Lis <https://github.com/wxman22>`_.


Breaking changes
Expand Down
21 changes: 19 additions & 2 deletions xarray/backends/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -1319,6 +1319,7 @@ def to_zarr(
append_dim: Hashable = None,
region: Mapping[str, slice] = None,
safe_chunks: bool = True,
storage_options: Dict[str, str] = None,
):
"""This function creates an appropriate datastore for writing a dataset to
a zarr ztore
Expand All @@ -1330,6 +1331,22 @@ def to_zarr(
store = _normalize_path(store)
chunk_store = _normalize_path(chunk_store)

if storage_options is None:
mapper = store
chunk_mapper = chunk_store
else:
from fsspec import get_mapper

if not isinstance(store, str):
raise ValueError(
f"store must be a string to use storage_options. Got {type(store)}"
)
mapper = get_mapper(store, **storage_options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we import fsspec here and raise an ValueError if fsspec is not available?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what behavior we should expect if a store (e.g. GCSMapper) is provided along with storage_options. We should probably check that store is a valid fsspec target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we import fsspec here and raise an ValueError if fsspec is not available?

Did your suggestion. As it stands if fsspec is not available it'll raise a ModuleNotFoundError: No module named 'fsspec'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since fsspec is such a small dep, with no deps of its own, you may wish to add it to xarray's deps for simplicity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what behavior we should expect if a store (e.g. GCSMapper) is provided along with storage_options. We should probably check that store is a valid fsspec target.

Not sure. I tend to just use a string e.g. gs://path/file.zarr for store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the input to get_mapper must be a string. So I think a useful check here would that if storage_options is provided, store must be a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, every implementation takes a string for the "root"; it is not completely inconceivable that some implementation might take something else, but I can't immediately imagine it.

if chunk_store is not None:
chunk_mapper = get_mapper(chunk_store, **storage_options)
else:
chunk_mapper = chunk_store

if encoding is None:
encoding = {}

Expand Down Expand Up @@ -1372,13 +1389,13 @@ def to_zarr(
already_consolidated = False
consolidate_on_close = consolidated or consolidated is None
zstore = backends.ZarrStore.open_group(
store=store,
store=mapper,
mode=mode,
synchronizer=synchronizer,
group=group,
consolidated=already_consolidated,
consolidate_on_close=consolidate_on_close,
chunk_store=chunk_store,
chunk_store=chunk_mapper,
append_dim=append_dim,
write_region=region,
safe_chunks=safe_chunks,
Expand Down
3 changes: 3 additions & 0 deletions xarray/backends/zarr.py
Original file line number Diff line number Diff line change
Expand Up @@ -713,6 +713,9 @@ def open_zarr(
falling back to read non-consolidated metadata if that fails.
chunk_store : MutableMapping, optional
A separate Zarr store only for chunk data.
storage_options : dict, optional
Any additional parameters for the storage backend (ignored for local
paths).
decode_timedelta : bool, optional
If True, decode variables and coordinates with time units in
{'days', 'hours', 'minutes', 'seconds', 'milliseconds', 'microseconds'}
Expand Down
11 changes: 8 additions & 3 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1922,6 +1922,7 @@ def to_zarr(
append_dim: Hashable = None,
region: Mapping[str, slice] = None,
safe_chunks: bool = True,
storage_options: Dict[str, str] = None,
) -> "ZarrStore":
"""Write dataset contents to a zarr group.

Expand All @@ -1941,10 +1942,10 @@ def to_zarr(
Parameters
----------
store : MutableMapping, str or Path, optional
Store or path to directory in file system.
Store or path to directory in local or remote file system.
chunk_store : MutableMapping, str or Path, optional
Store or path to directory in file system only for Zarr array chunks.
Requires zarr-python v2.4.0 or later.
Store or path to directory in local or remote file system only for Zarr
array chunks. Requires zarr-python v2.4.0 or later.
mode : {"w", "w-", "a", "r+", None}, optional
Persistence mode: "w" means create (overwrite if exists);
"w-" means create (fail if exists);
Expand Down Expand Up @@ -1999,6 +2000,9 @@ def to_zarr(
if Zarr arrays are written in parallel. This option may be useful in combination
with ``compute=False`` to initialize a Zarr from an existing
Dataset with aribtrary chunk structure.
storage_options : dict, optional
Any additional parameters for the storage backend (ignored for local
paths).

References
----------
Expand Down Expand Up @@ -2031,6 +2035,7 @@ def to_zarr(
self,
store=store,
chunk_store=chunk_store,
storage_options=storage_options,
mode=mode,
synchronizer=synchronizer,
group=group,
Expand Down
1 change: 1 addition & 0 deletions xarray/tests/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@ def LooseVersion(vstring):
has_nc_time_axis, requires_nc_time_axis = _importorskip("nc_time_axis")
has_rasterio, requires_rasterio = _importorskip("rasterio")
has_zarr, requires_zarr = _importorskip("zarr")
has_zarr_2_5_0, requires_zarr_2_5_0 = _importorskip("zarr", minversion="2.5.0")
has_fsspec, requires_fsspec = _importorskip("fsspec")
has_iris, requires_iris = _importorskip("iris")
has_cfgrib, requires_cfgrib = _importorskip("cfgrib")
Expand Down
12 changes: 12 additions & 0 deletions xarray/tests/test_backends.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@
requires_scipy,
requires_scipy_or_netCDF4,
requires_zarr,
requires_zarr_2_5_0,
)
from .test_coding_times import (
_ALL_CALENDARS,
Expand Down Expand Up @@ -2388,6 +2389,17 @@ def create_zarr_target(self):
yield tmp


@requires_fsspec
@requires_zarr_2_5_0
def test_zarr_storage_options():
pytest.importorskip("aiobotocore")
ds = create_test_data()
store_target = "memory://test.zarr"
ds.to_zarr(store_target, storage_options={"test": "zarr_write"})
ds_a = xr.open_zarr(store_target, storage_options={"test": "zarr_read"})
assert_identical(ds, ds_a)


@requires_scipy
class TestScipyInMemoryData(CFEncodedBase, NetCDF3Only):
engine = "scipy"
Expand Down