-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: validate Index data is 1D + deprecate multi-dim indexing #30588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8e23296
bad608d
181457a
2898896
19d4347
e265c15
c14f01a
cdc9fda
ea67b33
4cc1c27
f7f1061
1e6a3f5
e488139
e02dd13
d163da8
74dffbe
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -500,8 +500,11 @@ def __getitem__(self, value): | |
|
||
# scalar | ||
if not isinstance(left, ABCIndexClass): | ||
if isna(left): | ||
if is_scalar(left) and isna(left): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My intention with this block of code was to only handle scalars here. Since With this PR it looks like Something like: left = self.left[value]
right = self.right[value]
# TODO: remove this block when Index.__getitem__ returning ndim > 1 is deprecated
if np.ndim(left) > 1:
# GH#30588 multi-dimensional indexer disallowed
raise ValueError(...)
# scalar
if not isinstance(left, ABCIndexClass):
.... Makes the logic less nested and easier to remove when we go through with the deprecation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated so that this raises ValueError for >1dim indexing on IntervalArray |
||
return self._fill_value | ||
if np.ndim(left) > 1: | ||
TomAugspurger marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# GH#30588 multi-dimensional indexer disallowed | ||
raise ValueError("multi-dimensional indexing not allowed") | ||
return Interval(left, right, self.closed) | ||
|
||
return self._shallow_copy(left, right) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -975,3 +975,9 @@ def test_engine_type(self, dtype, engine_type): | |
ci.values._codes = ci.values._codes.astype("int64") | ||
assert np.issubdtype(ci.codes.dtype, dtype) | ||
assert isinstance(ci._engine, engine_type) | ||
|
||
def test_getitem_2d_deprecated(self): | ||
# GH#30588 multi-dim indexing is deprecated, but raising is also acceptable | ||
idx = self.create_index() | ||
with pytest.raises(ValueError, match="cannot mask with array containing NA"): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We shouldn't test wrong error messages ... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This may be a bug in how we're handling tuples. We convert the tuple array([slice(None, None, None), None], dtype=object) which gets interpreteded as a boolean mask with missing values, and so raise. I'm going to update the test to have a 2d ndarray as the indexer, nad open an issue for the bug. |
||
idx[:, None] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -79,3 +79,10 @@ def test_where(self, closed, klass): | |
expected = IntervalIndex([np.nan] + idx[1:].tolist()) | ||
result = idx.where(klass(cond)) | ||
tm.assert_index_equal(result, expected) | ||
|
||
def test_getitem_2d_deprecated(self): | ||
# GH#30588 multi-dim indexing is deprecated, but raising is also acceptable | ||
idx = self.create_index() | ||
with pytest.raises(ValueError, match="multi-dimensional indexing not allowed"): | ||
with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False): | ||
idx[:, None] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this only raise the error? (while trying out this branch, for me it also does) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Keep in mind, it's a deprecation warning and may not be visible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, that's it In [11]: warnings.simplefilter("always", DeprecationWarning)
In [12]: idx[np.array([[0, 1]])]
/Users/taugspurger/sandbox/pandas/pandas/core/arrays/interval.py:498: DeprecationWarning: Support for multi-dimensional indexing (e.g. `index[:, None]`) on an Index is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
left = self.left[value]
/Users/taugspurger/sandbox/pandas/pandas/core/arrays/interval.py:499: DeprecationWarning: Support for multi-dimensional indexing (e.g. `index[:, None]`) on an Index is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
right = self.right[value]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-bd6337c200ee> in <module>
----> 1 idx[np.array([[0, 1]])]
~/sandbox/pandas/pandas/core/indexes/interval.py in __getitem__(self, value)
990
991 def __getitem__(self, value):
--> 992 result = self._data[value]
993 if isinstance(result, IntervalArray):
994 return self._shallow_copy(result)
~/sandbox/pandas/pandas/core/arrays/interval.py in __getitem__(self, value)
505 if np.ndim(left) > 1:
506 # GH#30588 multi-dimensional indexer disallowed
--> 507 raise ValueError("multi-dimensional indexing not allowed")
508 return Interval(left, right, self.closed)
509
ValueError: multi-dimensional indexing not allowed I think that's OK for now, but can fix up later if desired. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,12 +83,9 @@ def test_getitem_ndarray_3d(self, index, obj, idxr, idxr_id): | |
msg = ( | ||
r"Buffer has wrong number of dimensions \(expected 1," | ||
r" got 3\)|" | ||
"The truth value of an array with more than one element is " | ||
"ambiguous|" | ||
"Cannot index with multidimensional key|" | ||
r"Wrong number of dimensions. values.ndim != ndim \[3 != 1\]|" | ||
"No matching signature found|" # TypeError | ||
"unhashable type: 'numpy.ndarray'" # TypeError | ||
"Index data must be 1-dimensional" | ||
) | ||
|
||
if ( | ||
|
@@ -104,21 +101,12 @@ def test_getitem_ndarray_3d(self, index, obj, idxr, idxr_id): | |
"categorical", | ||
] | ||
): | ||
idxr[nd3] | ||
else: | ||
if ( | ||
isinstance(obj, DataFrame) | ||
and idxr_id == "getitem" | ||
and index.inferred_type == "boolean" | ||
): | ||
error = TypeError | ||
elif idxr_id == "getitem" and index.inferred_type == "interval": | ||
error = TypeError | ||
else: | ||
error = ValueError | ||
|
||
with pytest.raises(error, match=msg): | ||
with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False): | ||
idxr[nd3] | ||
else: | ||
with pytest.raises(ValueError, match=msg): | ||
with tm.assert_produces_warning(DeprecationWarning): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same here, is it needed to have both contexts? |
||
idxr[nd3] | ||
|
||
@pytest.mark.parametrize( | ||
"index", tm.all_index_generator(5), ids=lambda x: type(x).__name__ | ||
|
@@ -146,16 +134,14 @@ def test_setitem_ndarray_3d(self, index, obj, idxr, idxr_id): | |
nd3 = np.random.randint(5, size=(2, 2, 2)) | ||
|
||
msg = ( | ||
r"Buffer has wrong number of dimensions \(expected 1, " | ||
r"got 3\)|" | ||
"The truth value of an array with more than one element is " | ||
"ambiguous|" | ||
"Only 1-dimensional input arrays are supported|" | ||
r"Buffer has wrong number of dimensions \(expected 1," | ||
r" got 3\)|" | ||
"'pandas._libs.interval.IntervalTree' object has no attribute " | ||
"'set_value'|" # AttributeError | ||
"unhashable type: 'numpy.ndarray'|" # TypeError | ||
"No matching signature found|" # TypeError | ||
r"^\[\[\[" # pandas.core.indexing.IndexingError | ||
r"^\[\[\[|" # pandas.core.indexing.IndexingError | ||
"Index data must be 1-dimensional" | ||
) | ||
|
||
if (idxr_id == "iloc") or ( | ||
|
@@ -176,10 +162,8 @@ def test_setitem_ndarray_3d(self, index, obj, idxr, idxr_id): | |
): | ||
idxr[nd3] = 0 | ||
else: | ||
with pytest.raises( | ||
(ValueError, AttributeError, TypeError, pd.core.indexing.IndexingError), | ||
match=msg, | ||
): | ||
err = (ValueError, AttributeError) | ||
with pytest.raises(err, match=msg): | ||
idxr[nd3] = 0 | ||
|
||
def test_inf_upcast(self): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,11 +66,10 @@ def test_registering_no_warning(self): | |
|
||
# Set to the "warn" state, in case this isn't the first test run | ||
register_matplotlib_converters() | ||
with tm.assert_produces_warning(None) as w: | ||
with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False): | ||
# GH#30588 DeprecationWarning from 2D indexing | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we shouldn't assert here, but just ignore warnings. When matplotlib fixes this (hopefully they will do soon), this test would otherwise fail (can be fixed later) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Working on a fix on the mpl side now. |
||
ax.plot(s.index, s.values) | ||
|
||
assert len(w) == 0 | ||
|
||
def test_pandas_plots_register(self): | ||
pytest.importorskip("matplotlib.pyplot") | ||
s = Series(range(12), index=date_range("2017", periods=12)) | ||
|
@@ -101,19 +100,16 @@ def test_option_no_warning(self): | |
|
||
# Test without registering first, no warning | ||
with ctx: | ||
with tm.assert_produces_warning(None) as w: | ||
# GH#30588 DeprecationWarning from 2D indexing on Index | ||
with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False): | ||
ax.plot(s.index, s.values) | ||
|
||
assert len(w) == 0 | ||
|
||
# Now test with registering | ||
register_matplotlib_converters() | ||
with ctx: | ||
with tm.assert_produces_warning(None) as w: | ||
with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False): | ||
ax.plot(s.index, s.values) | ||
|
||
assert len(w) == 0 | ||
|
||
def test_registry_resets(self): | ||
units = pytest.importorskip("matplotlib.units") | ||
dates = pytest.importorskip("matplotlib.dates") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also raise a warning?
(but adding the warning is not a blocker for the RC, since that's a future deprecation, as long as the behaviour change is already there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we are returning plain integer codes here, which doesn't look good.
And checked with 0.25, and there multi-dim indexing on categorical/categoricalindex basically fails (it "returns", but whenever you do something it gives an error; even converting to the result to a numpy array (the original matplotlib use case for the 2d indexing) fails).
So I would also raise an error here, like you did for some other arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed this should raise to match Interval. Will push an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not straightforward, as actually this already gets catched before in the boolean checking ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I would be fine with doing a clean-up for my comments in a follow-up PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a bit tricky. Will look a bit longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this raised an error in 0.25, it raises a different (but wrong) error now, so I think fine to do in a follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very confused by what's going on with with
Series.__getitem__
here. Are we OK with changing this for categorical after the RC?