Fix in vectorized item assignment #1746

Merged · 14 commits · Dec 9, 2017
Changes from 4 commits

4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -24,6 +24,10 @@ Enhancements
Bug fixes
~~~~~~~~~

- Bug fix in vectorized assignment (:issue:`1743`, :issue:`1744`).
By `Keisuke Fujii <https://github.com/fujiisoup>`_.


.. _whats-new.0.10.0:

v0.10.0 (20 November 2017)
4 changes: 3 additions & 1 deletion xarray/core/dataarray.py
@@ -484,7 +484,9 @@ def __setitem__(self, key, value):
if isinstance(key, basestring):
self.coords[key] = value
else:
# xarray-style array indexing
# DataArray key -> Variable key
key = {k: v.variable if isinstance(v, DataArray) else v

Member:

Do we want to enforce consistency for coordinates here? My inclination would be that we should support exactly the same keys in setitem as are valid in getitem. Ideally we should also reuse the same code. That means we should raise errors if there are multiple indexers with inconsistent alignment.

Member Author:

> My inclination would be that we should support exactly the same keys in setitem as are valid in getitem.

Reasonable. I will add a validation.
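As a rough illustration of the kind of validation discussed in this thread (a minimal editorial sketch, not the check that was actually added in this PR), inconsistent coordinates across multiple DataArray indexers could be rejected by aligning them with `join='exact'`; the `check_aligned` helper below is hypothetical, and it assumes an xarray version in which `xr.align` supports `join='exact'`:

```python
import xarray as xr

def check_aligned(indexers):
    # Hypothetical helper: reject DataArray indexers whose shared
    # coordinates conflict, mirroring the check done on the getitem path.
    dataarrays = [v for v in indexers.values() if isinstance(v, xr.DataArray)]
    # With join='exact', xr.align raises ValueError instead of reindexing
    # when indexes along a shared dimension are not equal.
    xr.align(*dataarrays, join='exact')

ind_ok = xr.DataArray([0, 1], dims=['a'], coords={'a': [10, 20]})
ind_bad = xr.DataArray([1, 0], dims=['a'], coords={'a': [30, 40]})

check_aligned({'x': ind_ok, 'y': ind_ok})     # passes silently
# check_aligned({'x': ind_ok, 'y': ind_bad})  # would raise ValueError
```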

for k, v in self._item_key_to_dict(key).items()}

Member:

Do we need to check that coordinates are consistent on the key?

Member Author:

This is done a few lines above, in `obj = self[key]`, where `.isel` checks the coordinates in the key.

But I am wondering about this unnecessary indexing, though I think this implementation is the simplest.

Member:

Hmm. This could indeed be a significant performance hit. That said, I'm OK leaving this for now, with a TODO note to optimize it later.

self.variable[key] = value

def __delitem__(self, key):
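For reference, the behaviour this hunk enables can be exercised as follows; the snippet is adapted from the tests added later in this PR, so it only restates what they already cover:

```python
import numpy as np
from xarray import DataArray, Variable

da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = Variable(['a'], [0, 1])

# Vectorized (pointwise) assignment: writes da[0, 0] and da[1, 1].
da[dict(x=ind, y=ind)] = 0
# da.values is now [[0, 1], [1, 0], [1, 1]] (as floats)

# The value may itself be a Variable; it is matched up along the key dims.
da[dict(x=ind, y=ind)] = Variable(['a'], [2, 3])
# da.values is now [[2, 1], [1, 3], [1, 1]] (as floats)
```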
15 changes: 10 additions & 5 deletions xarray/core/variable.py
@@ -637,17 +637,22 @@ def __setitem__(self, key, value):
"""
dims, index_tuple, new_order = self._broadcast_indexes(key)

if isinstance(value, Variable):
value = value.set_dims(dims).data

if new_order:
value = duck_array_ops.asarray(value)
if not isinstance(value, Variable):
value = as_compatible_data(value)
if value.ndim > len(dims):
raise ValueError(

Member:

Do we still need this special case error message now that we call set_dims below?

Member Author:

I think `value = Variable(dims[-value.ndim:], value)` fails if `value.ndim > len(dims)`.

'shape mismatch: value array of shape %s could not be '
'broadcast to indexing result with %s dimensions'
% (value.shape, len(dims)))
if value.ndim == 0:
value = Variable((), value)
else:
value = Variable(dims[-value.ndim:], value)
# broadcast to become assignable
value = value.set_dims(dims).data

Member Author:

I decided to revert nputils.NumpyVindexAdapter to the master version and to do the broadcasting here (see the sketch after this diff).


if new_order:
value = duck_array_ops.asarray(value)
value = value[(len(dims) - value.ndim) * (np.newaxis,) +
(Ellipsis,)]
value = np.moveaxis(value, new_order, range(len(new_order)))
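To make the broadcasting step in this hunk concrete: the first half of the sketch below reproduces the vindex-with-slice case from the new tests, and the second half shows the `Variable(...).set_dims(...)` pattern used above to make an under-dimensioned value assignable (the `dims` tuple here is illustrative, not taken from `_broadcast_indexes`):

```python
import numpy as np
from xarray import Variable

# Pointwise assignment with a slice in the middle (from the new tests):
# writes v[0, :, 0] and v[1, :, 1].
v = Variable(['x', 'y', 'z'], np.ones((4, 3, 2)))
ind = Variable(['a'], [0, 1])
v[dict(x=ind, z=ind)] = 0

# The set_dims pattern used to broadcast the value before assigning.
dims = ('a', 'y')                                # dims of some indexing result (illustrative)
value = np.asarray([3., 2.])                     # 1d value, fewer dims than the result
value_var = Variable(dims[-value.ndim:], value)  # attach the trailing dims -> ('y',)
broadcast = value_var.set_dims(dims)             # insert 'a' with length 1 -> shape (1, 2)
print(broadcast.dims, broadcast.shape)           # ('a', 'y') (1, 2)
```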
30 changes: 30 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -526,6 +526,36 @@ def test_setitem(self):
expected[t] = 1
self.assertArrayEqual(orig.values, expected)

def test_setitem_fancy(self):
# vectorized indexing
da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = Variable(['a'], [0, 1])
da[dict(x=ind, y=ind)] = 0
expected = DataArray([[0, 1], [1, 0], [1, 1]], dims=['x', 'y'])
self.assertDataArrayIdentical(expected, da)

Collaborator:

For the future (no need to change this time), you can use `assert_identical` from the newer testing tools (see the sketch after this diff).

Member Author:

I always forget this, maybe because I usually write tests based on the existing ones...
(I think that is what many contributors do.)
I think we should update the entire test suite at some point (similar to #1741), rather than migrating gradually.

# assign another 0d-variable
da[dict(x=ind, y=ind)] = Variable((), 0)
expected = DataArray([[0, 1], [1, 0], [1, 1]], dims=['x', 'y'])
self.assertDataArrayIdentical(expected, da)
# assign another 1d-variable
da[dict(x=ind, y=ind)] = Variable(['a'], [2, 3])
expected = DataArray([[2, 1], [1, 3], [1, 1]], dims=['x', 'y'])
self.assertVariableIdentical(expected, da)

# 2d-vectorized indexing
da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind_x = DataArray([[0, 1]], dims=['a', 'b'])
ind_y = DataArray([[1, 0]], dims=['a', 'b'])
da[dict(x=ind_x, y=ind_y)] = 0
expected = DataArray([[1, 0], [0, 1], [1, 1]], dims=['x', 'y'])
self.assertVariableIdentical(expected, da)

da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = Variable(['a'], [0, 1])
da[ind] = 0
expected = DataArray([[0, 0], [0, 0], [1, 1]], dims=['x', 'y'])
self.assertVariableIdentical(expected, da)

def test_contains(self):
data_array = DataArray(1, coords={'x': 2})
with pytest.warns(FutureWarning):
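Regarding the reviewer's note above about the newer testing helpers, an `assert_identical`-based version of the first assertion might look roughly like this (the public helper lives in `xarray.testing`):

```python
import numpy as np
from xarray import DataArray, Variable
from xarray.testing import assert_identical

da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = Variable(['a'], [0, 1])
da[dict(x=ind, y=ind)] = 0

expected = DataArray([[0., 1.], [1., 0.], [1., 1.]], dims=['x', 'y'])
assert_identical(expected, da)  # raises AssertionError on any mismatch
```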
50 changes: 50 additions & 0 deletions xarray/tests/test_variable.py
@@ -1440,6 +1440,56 @@ def test_setitem(self):
v[dict(x=[True, False], y=[False, True, False])] = 1
self.assertTrue(v[0, 1] == 1)

def test_setitem_fancy(self):
# assignment which should work as np.ndarray does
def assert_assigned_2d(array, key_x, key_y, values):
expected = array.copy()
expected[key_x, key_y] = values
v = Variable(['x', 'y'], array)
v[dict(x=key_x, y=key_y)] = values
self.assertArrayEqual(expected, v)

# 1d vectorized indexing
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a'], [0, 1]),
key_y=Variable(['a'], [0, 1]),
values=0)
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a'], [0, 1]),
key_y=Variable(['a'], [0, 1]),
values=Variable((), 0))
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a'], [0, 1]),
key_y=Variable(['a'], [0, 1]),
values=Variable(('a'), [3, 2]))
assert_assigned_2d(np.random.randn(4, 3),
key_x=slice(None),
key_y=Variable(['a'], [0, 1]),
values=Variable(('a'), [3, 2]))

# 2d-vectorized indexing
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a', 'b'], [[0, 1]]),
key_y=Variable(['a', 'b'], [[1, 0]]),
values=0)
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a', 'b'], [[0, 1]]),
key_y=Variable(['a', 'b'], [[1, 0]]),
values=[0])
assert_assigned_2d(np.random.randn(5, 4),
key_x=Variable(['a', 'b'], [[0, 1], [2, 3]]),
key_y=Variable(['a', 'b'], [[1, 0], [3, 3]]),
values=[2, 3])

# vindex with slice
v = Variable(['x', 'y', 'z'], np.ones((4, 3, 2)))
ind = Variable(['a'], [0, 1])
v[dict(x=ind, z=ind)] = 0
expected = Variable(['x', 'y', 'z'], np.ones((4, 3, 2)))
expected[0, :, 0] = 0
expected[1, :, 1] = 0
self.assertVariableIdentical(expected, v)

# dimension broadcast
v = Variable(['x', 'y'], np.ones((3, 2)))
ind = Variable(['a', 'b'], [[0, 1]])