Fix in vectorized item assignment #1746


Merged: 14 commits, Dec 9, 2017
Changes from 6 commits
4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -28,6 +28,10 @@ Enhancements
Bug fixes
~~~~~~~~~

- Bug fix in vectorized assignment (:issue:`1743`, :issue:`1744`).
By `Keisuke Fujii <https://github.com/fujiisoup>`_.


.. _whats-new.0.10.0:

v0.10.0 (20 November 2017)
18 changes: 18 additions & 0 deletions xarray/core/coordinates.py
@@ -301,3 +301,21 @@ def __getitem__(self, key):

def __unicode__(self):
return formatting.indexes_repr(self)


def assert_coordinate_consistent(obj, coords):
""" Maeke sure the dimension coordinate of obj is
consistent with coords.

obj: DataArray or Dataset
coords: Dict-like of variables
"""
for k in obj.dims:
# make sure there are no conflicts in dimension coordinates
if (k in coords and k in obj.coords):
Member: nit: you can drop the extra parentheses here inside if.

coord = getattr(coords[k], 'variable', coords[k]) # Variable
Member: It would be better to insist that coords always has the same type (e.g., a dict with Variable values).

Member Author: That's really reasonable. Updated.

if not coord.equals(obj[k].variable):
raise IndexError(
'dimension coordinate {!r} conflicts between '
'indexed and indexing objects:\n{}\nvs.\n{}'
.format(k, obj[k], coords[k]))
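
For reference, a minimal sketch (not part of this diff) of the check the helper performs. The arrays are hypothetical, and the import path is internal to xarray, so treat this as illustration only:

import numpy as np
import xarray as xr
from xarray.core.coordinates import assert_coordinate_consistent

da = xr.DataArray(np.arange(4), dims=['x'], coords={'x': [0, 1, 2, 3]})
# an indexer whose 'x' coordinate disagrees with da's 'x' coordinate
ind = xr.DataArray([1, 2], dims=['x'], coords={'x': [10, 11]})
try:
    assert_coordinate_consistent(da, ind.coords)
except IndexError as err:
    # dimension coordinate 'x' conflicts between indexed and indexing objects
    print(err)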
10 changes: 8 additions & 2 deletions xarray/core/dataarray.py
@@ -20,7 +20,7 @@
from .alignment import align, reindex_like_indexers
from .common import AbstractArray, BaseDataObject
from .coordinates import (DataArrayCoordinates, LevelCoordinatesSource,
Indexes)
Indexes, assert_coordinate_consistent)
from .dataset import Dataset, merge_indexes, split_indexes
from .pycompat import iteritems, basestring, OrderedDict, zip, range
from .variable import (as_variable, Variable, as_compatible_data,
@@ -484,7 +484,13 @@ def __setitem__(self, key, value):
if isinstance(key, basestring):
self.coords[key] = value
else:
# xarray-style array indexing
# Coordinates in key, value and self[key] should be consistent.
obj = self[key]
if isinstance(value, DataArray):
assert_coordinate_consistent(value, obj.coords)
Member: I was actually thinking of checking the consistency of coords on each DataArray argument in key instead. I guess we should probably check both!

Member: I think if you use obj.coords.variables here we can skip the awkward getattr(coords[k], 'variable', coords[k]) above.

# DataArray key -> Variable key
key = {k: v.variable if isinstance(v, DataArray) else v
Member: Do we want to enforce consistency for coordinates here? My inclination would be that we should support exactly the same keys in setitem as are valid in getitem. Ideally we should also reuse the same code. That means we should raise errors if there are multiple indexers with inconsistent alignment.

Member Author:
> My inclination would be that we should support exactly the same keys in setitem as are valid in getitem.

Reasonable. I will add a validation.

for k, v in self._item_key_to_dict(key).items()}
Member: Do we need to check that coordinates are consistent on the key?

Member Author: This is done a few lines above, at obj = self[key], since .isel checks the coordinates in the key. But I am wondering about this unnecessary indexing, though I think this implementation is the simplest.

Member: Hmm. This could indeed be a significant performance hit. That said, I'm OK leaving this for now, with a TODO note to optimize it later.

self.variable[key] = value

def __delitem__(self, key):
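
Concretely, the new __setitem__ path makes vectorized assignment mirror vectorized indexing. A small sketch, based on the tests added further down:

import numpy as np
import xarray as xr

da = xr.DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = xr.Variable(['a'], [0, 1])
# point-wise assignment: writes to da[0, 0] and da[1, 1]
da[dict(x=ind, y=ind)] = 0
print(da.values)  # [[0., 1.], [1., 0.], [1., 1.]]
# if the value is a DataArray, its dimension coordinates are first checked
# against self[key]; a conflict raises IndexError instead of assigning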
13 changes: 3 additions & 10 deletions xarray/core/dataset.py
@@ -23,7 +23,8 @@
from . import duck_array_ops
from .. import conventions
from .alignment import align
from .coordinates import DatasetCoordinates, LevelCoordinatesSource, Indexes
from .coordinates import (DatasetCoordinates, LevelCoordinatesSource, Indexes,
assert_coordinate_consistent)
from .common import ImplementsDatasetReduce, BaseDataObject
from .dtypes import is_datetime_like
from .merge import (dataset_update_method, dataset_merge_method,
@@ -1305,15 +1306,7 @@ def _get_indexers_coordinates(self, indexers):
# we don't need to call align() explicitly, because merge_variables
# already checks for exact alignment between dimension coordinates
coords = merge_variables(coord_list)

for k in self.dims:
# make sure there are not conflict in dimension coordinates
if (k in coords and k in self._variables and
not coords[k].equals(self._variables[k])):
raise IndexError(
'dimension coordinate {!r} conflicts between '
'indexed and indexing objects:\n{}\nvs.\n{}'
.format(k, self._variables[k], coords[k]))
assert_coordinate_consistent(self, coords)

attached_coords = OrderedDict()
for k, v in coords.items(): # silently drop the conflicted variables.
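
The Dataset side now shares the same check through assert_coordinate_consistent, so indexing with a conflicting DataArray fails the same way. A sketch, mirroring test_isel_dataarray below:

import numpy as np
import xarray as xr

ds = xr.Dataset({'var': ('x', np.arange(4))}, coords={'x': [0, 1, 2, 3]})
# indexing DataArray whose 'x' coordinate conflicts with ds['x']
ind = xr.DataArray([1, 2], dims=['x'], coords={'x': [10, 11]})
try:
    ds.isel(x=ind)
except IndexError as err:
    # dimension coordinate 'x' conflicts between indexed and indexing objects
    print(err)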
15 changes: 10 additions & 5 deletions xarray/core/variable.py
@@ -637,17 +637,22 @@ def __setitem__(self, key, value):
"""
dims, index_tuple, new_order = self._broadcast_indexes(key)

if isinstance(value, Variable):
value = value.set_dims(dims).data

if new_order:
value = duck_array_ops.asarray(value)
if not isinstance(value, Variable):
value = as_compatible_data(value)
if value.ndim > len(dims):
raise ValueError(
Member: Do we still need this special case error message now that we call set_dims below?

Member Author: I think value = Variable(dims[-value.ndim:], value) fails if value.ndim > len(dims).

'shape mismatch: value array of shape %s could not be '
'broadcast to indexing result with %s dimensions'
% (value.shape, len(dims)))
if value.ndim == 0:
value = Variable((), value)
else:
value = Variable(dims[-value.ndim:], value)
# broadcast to become assignable
value = value.set_dims(dims).data
Member Author: I decided to revert nputils.NumpyVindexAdapter to the master version and to do the broadcasting here.


if new_order:
value = duck_array_ops.asarray(value)
value = value[(len(dims) - value.ndim) * (np.newaxis,) +
(Ellipsis,)]
value = np.moveaxis(value, new_order, range(len(new_order)))
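
The effect of wrapping value in a Variable and broadcasting with set_dims is that values with fewer dimensions than the indexing result are expanded automatically. A sketch, matching the Variable tests below:

import numpy as np
import xarray as xr

v = xr.Variable(['x', 'y'], np.ones((3, 2)))
ind = xr.Variable(['a'], [0, 1])
# the 1d value (dims ('a',)) is broadcast against the indexing result,
# whose dims are also ('a',), then written point-wise
v[dict(x=ind, y=ind)] = xr.Variable(['a'], [2, 3])
print(v.values)  # [[2., 1.], [1., 3.], [1., 1.]]
# a value with more dims than the indexing result still hits the
# 'shape mismatch' ValueError kept above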
85 changes: 85 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -526,6 +526,91 @@ def test_setitem(self):
expected[t] = 1
self.assertArrayEqual(orig.values, expected)

def test_setitem_fancy(self):
# vectorized indexing
da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = Variable(['a'], [0, 1])
da[dict(x=ind, y=ind)] = 0
expected = DataArray([[0, 1], [1, 0], [1, 1]], dims=['x', 'y'])
self.assertDataArrayIdentical(expected, da)
Collaborator: For the future (no need to change this time), you can use assert_identical, from the newer testing tools.

Member Author: I always forget this, probably because I usually write tests based on the existing ones (I think that is what many contributors do). I think we should update the entire test suite at some point (similar to #1741), rather than migrating gradually.

# assign another 0d-variable
da[dict(x=ind, y=ind)] = Variable((), 0)
expected = DataArray([[0, 1], [1, 0], [1, 1]], dims=['x', 'y'])
self.assertDataArrayIdentical(expected, da)
# assign another 1d-variable
da[dict(x=ind, y=ind)] = Variable(['a'], [2, 3])
expected = DataArray([[2, 1], [1, 3], [1, 1]], dims=['x', 'y'])
self.assertVariableIdentical(expected, da)

# 2d-vectorized indexing
da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind_x = DataArray([[0, 1]], dims=['a', 'b'])
ind_y = DataArray([[1, 0]], dims=['a', 'b'])
da[dict(x=ind_x, y=ind_y)] = 0
expected = DataArray([[1, 0], [0, 1], [1, 1]], dims=['x', 'y'])
self.assertVariableIdentical(expected, da)

da = DataArray(np.ones((3, 2)), dims=['x', 'y'])
ind = Variable(['a'], [0, 1])
da[ind] = 0
expected = DataArray([[0, 0], [0, 0], [1, 1]], dims=['x', 'y'])
self.assertVariableIdentical(expected, da)

def test_setitem_dataarray(self):
def get_data():
return DataArray(np.ones((4, 3, 2)), dims=['x', 'y', 'z'],
coords={'x': np.arange(4), 'y': ['a', 'b', 'c'],
'non-dim': ('x', [1, 3, 4, 2])})

da = get_data()
# indexer with inconsistent coordinates.
ind = DataArray(np.arange(1, 4), dims=['x'],
coords={'x': np.random.randn(3)})
with raises_regex(IndexError, "dimension coordinate 'x'"):
da[dict(x=ind)] = 0

# indexer with consistent coordinates.
ind = DataArray(np.arange(1, 4), dims=['x'],
coords={'x': np.arange(1, 4)})
da[dict(x=ind)] = 0 # should not raise
assert np.allclose(da[dict(x=ind)].values, 0)
self.assertDataArrayIdentical(da['x'], get_data()['x'])
self.assertDataArrayIdentical(da['non-dim'], get_data()['non-dim'])

da = get_data()
# conflict in the assigning values
value = xr.DataArray(np.zeros((3, 3, 2)), dims=['x', 'y', 'z'],
coords={'x': [0, 1, 2],
'non-dim': ('x', [0, 2, 4])})
with raises_regex(IndexError, "dimension coordinate 'x'"):
da[dict(x=ind)] = value

# consistent coordinate in the assigning values
value = xr.DataArray(np.zeros((3, 3, 2)), dims=['x', 'y', 'z'],
coords={'x': [1, 2, 3],
'non-dim': ('x', [0, 2, 4])})
da[dict(x=ind)] = value
assert np.allclose(da[dict(x=ind)].values, 0)
self.assertDataArrayIdentical(da['x'], get_data()['x'])
self.assertDataArrayIdentical(da['non-dim'], get_data()['non-dim'])

# Conflict in the non-dimension coordinate
value = xr.DataArray(np.zeros((3, 3, 2)), dims=['x', 'y', 'z'],
coords={'x': [1, 2, 3],
'non-dim': ('x', [0, 2, 4])})
da[dict(x=ind)] = value  # should not raise
# conflict in the assigning values
value = xr.DataArray(np.zeros((3, 3, 2)), dims=['x', 'y', 'z'],
coords={'x': [0, 1, 2],
'non-dim': ('x', [0, 2, 4])})
with raises_regex(IndexError, "dimension coordinate 'x'"):
da[dict(x=ind)] = value

# consistent coordinate in the assigning values
value = xr.DataArray(np.zeros((3, 3, 2)), dims=['x', 'y', 'z'],
coords={'x': [1, 2, 3],
'non-dim': ('x', [0, 2, 4])})
da[dict(x=ind)] = value # should not raise

def test_contains(self):
data_array = DataArray(1, coords={'x': 2})
with pytest.warns(FutureWarning):
15 changes: 6 additions & 9 deletions xarray/tests/test_dataset.py
@@ -1001,15 +1001,12 @@ def test_isel_dataarray(self):
# Conflict in the dimension coordinate
indexing_da = DataArray(np.arange(1, 4), dims=['dim2'],
coords={'dim2': np.random.randn(3)})
with raises_regex(
IndexError, "dimension coordinate 'dim2'"):
with raises_regex(IndexError, "dimension coordinate 'dim2'"):
actual = data.isel(dim2=indexing_da)
# Also the case for DataArray
with raises_regex(
IndexError, "dimension coordinate 'dim2'"):
with raises_regex(IndexError, "dimension coordinate 'dim2'"):
actual = data['var2'].isel(dim2=indexing_da)
with raises_regex(
IndexError, "dimension coordinate 'dim2'"):
with raises_regex(IndexError, "dimension coordinate 'dim2'"):
data['dim2'].isel(dim2=indexing_da)

# same name coordinate which does not conflict
@@ -1502,7 +1499,7 @@ def test_reindex_like(self):

expected = data.copy(deep=True)
expected['dim3'] = ('dim3', list('cdefghijkl'))
expected['var3'][:-2] = expected['var3'][2:]
expected['var3'][:-2] = expected['var3'][2:].values
expected['var3'][-2:] = np.nan
expected['letters'] = expected['letters'].astype(object)
expected['letters'][-2:] = np.nan
@@ -1614,9 +1611,9 @@ def test_align(self):
left = create_test_data()
right = left.copy(deep=True)
right['dim3'] = ('dim3', list('cdefghijkl'))
right['var3'][:-2] = right['var3'][2:]
right['var3'][:-2] = right['var3'][2:].values
right['var3'][-2:] = np.random.randn(*right['var3'][-2:].shape)
right['numbers'][:-2] = right['numbers'][2:]
right['numbers'][:-2] = right['numbers'][2:].values
right['numbers'][-2:] = -10

intersection = list('cdefghij')
50 changes: 50 additions & 0 deletions xarray/tests/test_variable.py
@@ -1440,6 +1440,56 @@ def test_setitem(self):
v[dict(x=[True, False], y=[False, True, False])] = 1
self.assertTrue(v[0, 1] == 1)

def test_setitem_fancy(self):
# assignment which should work as np.ndarray does
def assert_assigned_2d(array, key_x, key_y, values):
expected = array.copy()
expected[key_x, key_y] = values
v = Variable(['x', 'y'], array)
v[dict(x=key_x, y=key_y)] = values
self.assertArrayEqual(expected, v)

# 1d vectorized indexing
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a'], [0, 1]),
key_y=Variable(['a'], [0, 1]),
values=0)
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a'], [0, 1]),
key_y=Variable(['a'], [0, 1]),
values=Variable((), 0))
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a'], [0, 1]),
key_y=Variable(['a'], [0, 1]),
values=Variable(('a'), [3, 2]))
assert_assigned_2d(np.random.randn(4, 3),
key_x=slice(None),
key_y=Variable(['a'], [0, 1]),
values=Variable(('a'), [3, 2]))

# 2d-vectorized indexing
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a', 'b'], [[0, 1]]),
key_y=Variable(['a', 'b'], [[1, 0]]),
values=0)
assert_assigned_2d(np.random.randn(4, 3),
key_x=Variable(['a', 'b'], [[0, 1]]),
key_y=Variable(['a', 'b'], [[1, 0]]),
values=[0])
assert_assigned_2d(np.random.randn(5, 4),
key_x=Variable(['a', 'b'], [[0, 1], [2, 3]]),
key_y=Variable(['a', 'b'], [[1, 0], [3, 3]]),
values=[2, 3])

# vindex with slice
v = Variable(['x', 'y', 'z'], np.ones((4, 3, 2)))
ind = Variable(['a'], [0, 1])
v[dict(x=ind, z=ind)] = 0
expected = Variable(['x', 'y', 'z'], np.ones((4, 3, 2)))
expected[0, :, 0] = 0
expected[1, :, 1] = 0
self.assertVariableIdentical(expected, v)

# dimension broadcast
v = Variable(['x', 'y'], np.ones((3, 2)))
ind = Variable(['a', 'b'], [[0, 1]])