Is it unsafe to write on independent regions using the to_zarr method when one of chunks is not complete? #9072
Comments
This check doesn't look at the |
If it is safe to write the way I specified in the code, then I'm going to use the safe_chunks=False option that you mentioned. Thanks.
(I think it is, but you should double-check.)
In case it helps, I wrote the following code to check that no data corruption occurs when writing the way I specified:

import xarray as xr
import numpy as np

def clear_encoding(arr):
    # Drop stored encoding so earlier chunk settings do not leak into later writes.
    arr.encoding.clear()
    for dim in arr.dims:
        arr.coords[dim].encoding.clear()

elements = list(range(1, 77))
for chunk in [2, 7, 10, 16]:
    print("chunk", chunk)
    a = xr.DataArray(elements, dims=["a"], coords={"a": elements}).chunk({"a": chunk})
    clear_encoding(a)
    a.to_zarr("test", mode="w")
    expected = a.copy()
    for start in [5, 8, 7, 9, 20]:
        print("start", start)
        for end in [22, 30, 45, 76]:
            print("end", end)
            for i in range(6):
                # Overwrite a region that may start or end in the middle of a chunk.
                range_slice = slice(start, end)
                b = (a * 10).isel(a=range_slice)
                expected.loc[b.a[0]: b.a[-1]] = b
                clear_encoding(b)
                b.to_zarr("test", region={"a": range_slice}, safe_chunks=False)
                # Read the store back and compare it with the in-memory expectation.
                c = xr.open_zarr("test")["__xarray_dataarray_variable__"].compute()
                try:
                    assert expected.equals(c)
                except Exception as e:
                    print(start, end, chunk, i)
                    print(c)
                    print(expected)
                    raise e
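To see which of the writes in the loop above land on partially covered chunks, the slice can be mapped onto chunk indices. This is a minimal sketch in plain Python: `partial_chunks` is a hypothetical helper, not part of xarray or zarr, and it assumes a uniform chunk size along a single dimension and `start < stop`.

```python
def partial_chunks(start, stop, chunk_size, length):
    """Return indices of chunks that the half-open range [start, stop) covers only partially."""
    partial = []
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    for idx in range(first, last + 1):
        chunk_start = idx * chunk_size
        # The final chunk may be shorter than chunk_size.
        chunk_stop = min(chunk_start + chunk_size, length)
        if start > chunk_start or stop < chunk_stop:
            partial.append(idx)
    return partial


# 76 elements, chunk size 10: slice(5, 22) fully covers chunk 1
# and only partially covers chunks 0 and 2.
print(partial_chunks(5, 22, 10, 76))
```

An empty result means the region is aligned with chunk boundaries; a non-empty one marks the chunks where the write has to merge new values with data already in the store.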
What happened?
I'm trying to update a region of my Zarr array, but it is raising the following error:
This does not make much sense to me, because I'm writing to independent chunks. Even if the first chunk is only partially updated, that should not corrupt data, because no more than one process writes to a single chunk at a time. So it should be safe, or at least that is my understanding. Could you clarify whether this is the expected behavior, or whether I'm doing something wrong when I write?
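The condition described here, that no two writers ever touch the same chunk, can be checked before launching the writes. The sketch below is only an illustration of that reasoning: `chunks_touched` and `share_a_chunk` are hypothetical helpers (not xarray or zarr functions), and they assume a uniform chunk size along one dimension and slices with explicit, non-empty `start`/`stop`.

```python
def chunks_touched(region, chunk_size):
    """Set of chunk indices that a slice with explicit start/stop touches."""
    first = region.start // chunk_size
    last = (region.stop - 1) // chunk_size
    return set(range(first, last + 1))


def share_a_chunk(region_a, region_b, chunk_size):
    """True if the two regions write into at least one common chunk."""
    return bool(chunks_touched(region_a, chunk_size) & chunks_touched(region_b, chunk_size))


# Chunk size 10: slice(5, 20) touches chunks {0, 1}, slice(20, 40) touches {2, 3}.
print(share_a_chunk(slice(5, 20), slice(20, 40), 10))
# slice(5, 22) and slice(22, 40) both touch chunk 2.
print(share_a_chunk(slice(5, 22), slice(22, 40), 10))
```

If every pair of concurrently written regions returns `False`, each chunk has a single writer, which is the situation the question assumes.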
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.1
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.3.8
dask: 2024.4.2
distributed: 2024.4.2
matplotlib: 3.8.3
cartopy: None
seaborn: None
numbagg: 0.8.1
fsspec: 2024.5.0
cupy: None
pint: None
sparse: 0.15.4
flox: 0.9.6
numpy_groupies: 0.10.2
setuptools: 69.1.1
pip: 22.0.2
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.22.2
sphinx: 7.2.6