What happened:
I tried to use the to_netcdf method to store a dataset in a NetCDF file, but the following exception was raised:
Traceback (most recent call last):
File "dask-error.py", line 27, in <module>
ds.to_netcdf("test.nc")
File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/core/dataset.py", line 1544, in to_netcdf
return to_netcdf(
File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/api.py", line 1051, in to_netcdf
scheduler = _get_scheduler()
File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/locks.py", line 79, in _get_scheduler
actual_get = dask.base.get_scheduler(get, collection)
AttributeError: module 'dask' has no attribute 'base'
This code sample works as expected when the dask package is not installed in the environment. However, when dask is installed, the _get_scheduler function is called and produces the error (see xarray/xarray/backends/locks.py, line 79 in b9e6a36).
After a little digging, the problem is that the base module in the dask package depends on the toolz package, which is not a default dependency of dask, and so causes a silent import failure when dask initialises its namespace (https://github.com/dask/dask/blob/416d348f7174a302815758cb87dbf6983226ddc5/dask/__init__.py#L10). As a result, the base module is not importable from the dask top level, and importing it separately with
from dask import base
raises a ModuleNotFoundError:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/dask/base.py", line 13, in <module>
from tlz import merge, groupby, curry, identity
ModuleNotFoundError: No module named 'tlz'
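The mechanism described above can be reproduced without dask at all. The following is a minimal sketch, assuming a package whose __init__.py wraps a submodule import in try/except ImportError, as the linked dask/__init__.py is described as doing. The package name fakepkg_silent_import and the missing module name are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PKG = "fakepkg_silent_import"


def build_and_import_package():
    """Create a throwaway package whose __init__ swallows an ImportError."""
    root = Path(tempfile.mkdtemp())
    pkg_dir = root / PKG
    pkg_dir.mkdir()
    # __init__.py tries to import its own submodule and ignores the failure.
    (pkg_dir / "__init__.py").write_text(
        "try:\n"
        "    from . import base\n"
        "except ImportError:\n"
        "    pass\n"
    )
    # base.py needs a module that is not installed (a stand-in for tlz).
    (pkg_dir / "base.py").write_text("import module_that_is_not_installed_xyz\n")
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(PKG)
    finally:
        sys.path.remove(str(root))


pkg = build_and_import_package()

# The package imports without complaint, but the submodule attribute is absent,
# so attribute access fails much later, far from the real cause.
try:
    pkg.base
    attribute_error = None
except AttributeError as exc:
    attribute_error = exc

# Importing the submodule explicitly surfaces the underlying problem.
try:
    importlib.import_module(PKG + ".base")
    import_error = None
except ModuleNotFoundError as exc:
    import_error = exc

print(type(attribute_error).__name__, "/", type(import_error).__name__)
```

The point of the sketch is the asymmetry: attribute access on the package reports an AttributeError, while an explicit submodule import reports the missing dependency by name.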
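I recommend the following fix. In the _get_scheduler function (xarray/xarray/backends/locks.py, line 75 in b9e6a36), replace the import there and remove dask.base from the later call.
I should, however, point out that get_scheduler does not appear to be part of the Dask public API.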
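The replacement import itself is not preserved in this copy of the report, so the following is only a sketch of what such a fix might look like. It assumes the intended import is from dask.base import get_scheduler, placed inside a try block so that a missing dask, or a dask without toolz, is handled the same way. The function name resolve_scheduler_get is hypothetical and is not the xarray function.

```python
def resolve_scheduler_get(get=None, collection=None):
    """Return dask's scheduler ``get`` callable, or None if it is unavailable.

    Hypothetical helper for illustration; not the actual xarray code.
    """
    try:
        # Assumption: importing the name directly from dask.base makes a
        # missing toolz/tlz dependency raise ImportError here, where it can
        # be caught, instead of surfacing later as an AttributeError.
        from dask.base import get_scheduler
    except ImportError:
        # Covers both "dask is not installed" and "dask.base cannot be
        # imported because tlz is missing" (ModuleNotFoundError is a
        # subclass of ImportError).
        return None

    # The later call no longer goes through the dask.base attribute.
    return get_scheduler(get, collection)


result = resolve_scheduler_get()
print("dask scheduler available:", result is not None)
```

With this shape, an environment that has dask but not toolz would take the same path as an environment with no dask at all. Because get_scheduler does not appear to be part of the Dask public API, a sketch like this may need adjusting between Dask releases.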
What you expected to happen:
The to_netcdf method should have exited silently and created a new file in the working directory with the contents of the data set.
Minimal Complete Verifiable Example:
This code is basically the "Toy weather data" example from the documentation, except for the last line.
Anything else we need to know?:
As mentioned above, the error only manifests when the dask package is installed with no extras. (Many of the extras require the toolz package, in which case the import error goes away.)
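To check whether a given environment is in the affected state, something like the following sketch could be used. It relies only on the standard library and on the fact that the toolz distribution provides both the toolz and tlz modules; the function name diagnose_dask_install is hypothetical.

```python
import importlib.util


def diagnose_dask_install():
    """Report which of dask, toolz and tlz can be located, without importing them."""

    def found(name):
        try:
            return importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            return False

    return {name: found(name) for name in ("dask", "toolz", "tlz")}


status = diagnose_dask_install()
# The problematic combination described in this report is:
# dask present, tlz absent.
affected = status["dask"] and not status["tlz"]
print(status, "affected:", affected)
```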
Environment:
In a clean virtual environment, install the following packages.
pip install xarray netCDF4 dask
The package versions installed are as follows (generated by pip freeze):
(Also running python3.8.2 on Debian Linux, not that I suppose this matters.)
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2+ (heads/3.8:882a7f44da, Apr 26 2020, 19:31:38) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.18.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: None
sphinx: None