Dask error on xarray.corr #5715
Comments
Thanks @Gijom, I can repro. I think the fix should be fairly easy, if someone wants to take a swing. I'm not sure why the existing tests don't cover it.
The responsible code for the error originally comes from the call to

    # 4. Compute covariance along the given dim
    # N.B. `skipna=False` is required or there is a bug when computing
    # auto-covariance. E.g. Try xr.cov(da,da) for
    # da = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
    cov = (demeaned_da_a * demeaned_da_b).sum(dim=dim, skipna=True, min_count=1) / (
        valid_count
    )

In any case, the parallel module imports dask in a try/except block to ignore the import error. So it is no surprise that using dask later raises an error if it was not imported. I can see two possibilities:

I do not have the big picture here, though, so there are probably better solutions.
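The failure mode described above can be reproduced without xarray or dask. The following is only a minimal sketch: `hypothetical_backend`, `is_collection`, `is_lazy_unguarded`, `is_lazy_guarded` and `HAS_BACKEND` are made-up names standing in for an optional dependency and the code that uses it, not xarray's actual implementation.

```python
import importlib.util

# Pattern that hides the problem: the ImportError is swallowed, so when the
# package is missing the name `hypothetical_backend` is never bound.
try:
    import hypothetical_backend  # hypothetical optional dependency, not a real package
except ImportError:
    pass

# Record explicitly whether the optional dependency can be imported.
HAS_BACKEND = importlib.util.find_spec("hypothetical_backend") is not None


def is_lazy_unguarded(obj):
    # Raises NameError if the import above failed, because the module name
    # does not exist in this namespace.
    return hypothetical_backend.is_collection(obj)


def is_lazy_guarded(obj):
    # Only touch the optional module when it is known to be available.
    if not HAS_BACKEND:
        return False
    return hypothetical_backend.is_collection(obj)
```

With the package missing, `is_lazy_guarded` simply reports that the object is not lazy, while `is_lazy_unguarded` fails with the same kind of `NameError` as in the report.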
I had a look at it this morning and I think I managed to solve the issue by replacing the calls to dask functions with calls to the pycompat functions. For (successful) testing I used the same code as above plus the following:

    ds_dask = ds.chunk({"t": 10})
    yy = xr.corr(ds['y'], ds['y']).to_numpy()
    yy_dask = xr.corr(ds_dask['y'], ds_dask['y']).to_numpy()
    yx = xr.corr(ds['y'], ds['x']).to_numpy()
    yx_dask = xr.corr(ds_dask['y'], ds_dask['x']).to_numpy()
    np.testing.assert_allclose(yy, yy_dask, err_msg="YY: {} is different from {}".format(yy, yy_dask))
    np.testing.assert_allclose(yx, yx_dask, err_msg="YX: {} is different from {}".format(yx, yx_dask))

The results are almost but not exactly identical, which is probably due to floating-point differences accumulated over the multiple computations in the dask case. I also tested the correlation of simple DataArrays without dask installed and the results seem coherent (close to 0 for uncorrelated data and very close to 1 when correlating identical variables). Should I make a pull request? Should I implement this test? Any others?
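To illustrate why chunked and unchunked results can agree only up to a tolerance, here is a small NumPy-only sketch. It is an assumption-laden illustration, not how dask or xarray actually schedule the reduction: `chunked_sum` and `pearson` are hypothetical helpers, and the chunk size of 10 only echoes the `{"t": 10}` chunking used above.

```python
import numpy as np


def chunked_sum(a, chunk=10):
    # Sum each block separately, then add the partial sums. This changes the
    # order of floating-point additions compared with a single np.sum call.
    return sum(a[i:i + chunk].sum() for i in range(0, a.size, chunk))


def pearson(x, y, total=np.sum):
    # Pearson correlation written in terms of a pluggable summation function.
    n = x.size
    xm = x - total(x) / n
    ym = y - total(y) / n
    return total(xm * ym) / np.sqrt(total(xm * xm) * total(ym * ym))


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

r_full = pearson(x, y)                        # one-shot summation
r_chunked = pearson(x, y, total=chunked_sum)  # block-wise summation

# Compare with a tolerance rather than exact equality.
np.testing.assert_allclose(r_full, r_chunked, rtol=1e-10, atol=1e-12)
```

This is why a tolerance-based check such as `assert_allclose` is a better fit for the test than an exact equality check.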
That sounds great @Gijom ! Thanks for working through that. A PR would be welcome! In the tests, we should be running this outside a |
Calls to dask functions are replaced by calls to the pycompat functions
What happened:
When I use xarray.corr on two DataArrays I get a
NameError: name 'dask' is not defined
error. Note that dask is not installed in my environment.
What you expected to happen:
Obtain the correlation values without dask interfering (as it should be optional in my understanding)
Minimal Complete Verifiable Example:
Results in:
Environment: