Skip to content

Fix DataArray.__dask_scheduler__ to point to dask.threaded.get #1760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 7, 2017

Conversation

mrocklin
Copy link
Contributor

@mrocklin mrocklin commented Dec 5, 2017

Previously this erroneously pointed to an optimize function, likely a
copy-paste error.

For testing this also redirects the .compute methods to use the
dask.compute function directly if dask.version >= '0.16.0'.

Closes #1759

  • Closes #xxxx (remove if there is no corresponding issue, which should only be the case for minor changes)
  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

@mrocklin
Copy link
Contributor Author

mrocklin commented Dec 5, 2017

The mock tests still fail for me. I'm not sure how best to generalize them. Recommendations would be welcome here.

@max-sixty
Copy link
Collaborator

I took a look at the tests - I don't have too much helpful beyond breaking it down.

For example, this tests that when .compute(...) is called on a Variable backed by a dask Array, that .compute gets called on the dask Array.

Has that changed? Does dask.compute(x) call x.compute()?

@mrocklin
Copy link
Contributor Author

mrocklin commented Dec 5, 2017

Has that changed? Does dask.compute(x) call x.compute()?

No, we no longer call Array.compute at all

@max-sixty
Copy link
Collaborator

Right, though xarray used to call .load, which I think must have called through to obj.compute? If that's the case, potentially the tests need changing

xarray/tests/test_backends.py:252: AssertionError
______________________ test_dask_kwargs_variable[compute] ______________________
method = 'compute'
    @pytest.mark.parametrize("method", ['load', 'compute'])
    def test_dask_kwargs_variable(method):
        x = Variable('y', da.from_array(np.arange(3), chunks=(2,)))
        # args should be passed on to da.Array.compute()
        with mock.patch.object(da.Array, 'compute',
                               return_value=np.arange(3)) as mock_compute:
            getattr(x, method)(foo='bar')
>       mock_compute.assert_called_with(foo='bar')
xarray/tests/test_dask.py:724: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_mock_self = <MagicMock name='compute' id='139838059011600'>, args = ()
kwargs = {'foo': 'bar'}, expected = "compute(foo='bar')"
    def assert_called_with(_mock_self, *args, **kwargs):
        """assert that the mock was called with the specified arguments.
    
            Raises an AssertionError if the args and keyword args passed in are
            different to the last call to the mock."""
        self = _mock_self
        if self.call_args is None:
            expected = self._format_mock_call_signature(args, kwargs)
>           raise AssertionError('Expected call: %s\nNot called' % (expected,))
E           AssertionError: Expected call: compute(foo='bar')
E           Not called

@mrocklin
Copy link
Contributor Author

mrocklin commented Dec 7, 2017

@shoyer how would you like me to proceed here? Should I drop the current modifications to the .compute methods and just leave in the __dask_scheduler__ fix?

@shoyer
Copy link
Member

shoyer commented Dec 7, 2017

Should I drop the current modifications to the .compute methods and just leave in the dask_scheduler fix?

That's probably easiest for now -- updating our mock tests to handle either calling dask.compute() or the Array.compute() method would be a little tricky.

It would be nice to add a test case that uses the default scheduler, e.g., that does a basic compute on each xarray type (Dataset, DataArray and Variable) inside a with dask.set_options(get=None) block. That would give us test coverage for __dask_scheduler__, which is currently missing.

(It would probably be better to explicitly set get explicitly in a function when computing xarray objects containing dask arrays, but that's a larger project.)

@mrocklin mrocklin force-pushed the dataarray-scheduler branch from c054314 to 9f0b534 Compare December 7, 2017 20:29
@mrocklin mrocklin force-pushed the dataarray-scheduler branch from 9f0b534 to 1a9399c Compare December 7, 2017 21:41
@mrocklin
Copy link
Contributor Author

mrocklin commented Dec 7, 2017

Done. Tests pass.

@shoyer shoyer merged commit ea72303 into pydata:master Dec 7, 2017
@shoyer
Copy link
Member

shoyer commented Dec 7, 2017

Thanks!

@mrocklin mrocklin deleted the dataarray-scheduler branch December 7, 2017 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dask compute on reduction failes with ValueError
3 participants