BUG: indexing with boolean-like Index, #11119 #11178
Does this only work where you supply two items for a DataFrame, because it's checking for
In [11]: df2=pd.concat([df, pd.Series([1,3,12,9],name=True)],axis=1)
In [12]: df2
Out[12]:
   False  True  True
0      6     3     1
1      1     9     3
2     13     8    12
3      8     2     9
In [13]: df2[[False]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-b7f220d1c4ee> in <module>()
----> 1 df2[[False]]
/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in __getitem__(self, key)
1906 if isinstance(key, (Series, np.ndarray, Index, list)):
1907 # either boolean or fancy integer index
-> 1908 return self._getitem_array(key)
1909 elif isinstance(key, DataFrame):
1910 return self._getitem_frame(key)
/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in _getitem_array(self, key)
1947 else:
1948 raise ValueError('Item wrong length %d instead of %d.' %
-> 1949 (len(key), len(self.index)))
1950 # check_bool_indexer will throw exception if Series key cannot
1951 # be reindexed to match DataFrame rows
ValueError: Item wrong length 1 instead of 4.
In [17]: df2[[False, True, False]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-88caa367ce94> in <module>()
----> 1 df2[[False, True, False]]
/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in __getitem__(self, key)
1906 if isinstance(key, (Series, np.ndarray, Index, list)):
1907 # either boolean or fancy integer index
-> 1908 return self._getitem_array(key)
1909 elif isinstance(key, DataFrame):
1910 return self._getitem_frame(key)
/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in _getitem_array(self, key)
1947 else:
1948 raise ValueError('Item wrong length %d instead of %d.' %
-> 1949 (len(key), len(self.index)))
1950 # check_bool_indexer will throw exception if Series key cannot
1951 # be reindexed to match DataFrame rows
ValueError: Item wrong length 3 instead of 4.
While if the column names are strings:
In [33]: df2_str[['False','True']]
Out[33]:
   False  True  True
0      6     3     1
1      1     9     3
2     13     8    12
3      8     2     9
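The two tracebacks above follow from how a list key is classified before any column lookup happens. As a rough illustration only, here is a pure-Python sketch of that decision. `classify_list_key` is a hypothetical helper, not a pandas function, and it assumes that any all-boolean list is treated as a row mask whose length must equal the number of rows.

```python
def classify_list_key(key, n_rows):
    """Hypothetical stand-in for the branch a list key takes in __getitem__.

    Assumption: an all-boolean list is always read as a row mask, so its
    length is checked against the number of rows before labels are considered.
    """
    if len(key) > 0 and all(isinstance(k, bool) for k in key):
        if len(key) != n_rows:
            # Same message as in the tracebacks above.
            raise ValueError(
                "Item wrong length %d instead of %d." % (len(key), n_rows)
            )
        return "row mask"
    # Anything else (e.g. strings) falls through to a column-label lookup.
    return "column labels"


print(classify_list_key(["False", "True"], 4))           # column labels
print(classify_list_key([True, False, True, False], 4))  # row mask
```

Under that assumption, `classify_list_key([False], 4)` raises the same `ValueError` as `df2[[False]]`, whereas the string key is looked up as labels and never reaches the length check.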
What actually is needed is to infer this:
so right now we don't have a separate boolean
So this particular case only is relevant when you:
@@ -1941,7 +1941,11 @@ def _getitem_array(self, key):
warnings.warn("Boolean Series key will be reindexed to match "
"DataFrame index.", UserWarning)
elif len(key) != len(self.index):
raise ValueError('Item wrong length %d instead of %d.' %
if lib.infer_dtype(self._info_axis)=="boolean":
@jreback am I doing it right?
I think you can move this check to _convert_to_indexer, and raise if there is a problem (e.g. subsume the ValueError there). That makes this code a lot simpler; right now there is too much special casing.
@jreback can you help me understand the error? I am unable to reproduce it in my notebook, and it only occurs on Python 2.7.
It's unrelated, a spurious failure. I restarted. In any event I will probably look at this next week. You are jumping through lots of hoops here.
@jreback Could you review the PR? Thanks.
I'm not sure if this change is actually a good idea. Suppose we have a 2x2 DataFrame with columns and rows
What should
Here, note that both
So my suggestion is that we should first consider deprecating using booleans that are not index values with
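The ambiguity being described can be illustrated without pandas. The following is only a sketch: the 2x2 "frame" is a plain dict keyed by boolean column labels, the values are made up rather than taken from the comment above, and `as_row_mask` and `as_column_labels` are hypothetical helpers that spell out the two possible readings of the same key.

```python
# Hypothetical 2x2 frame: {column label: column values}, with boolean labels.
columns = {True: [1, 2], False: [3, 4]}
key = [True, False]


def as_row_mask(columns, key):
    """Read the key as a boolean mask over rows: keep rows where it is True."""
    return {
        label: [value for value, keep in zip(values, key) if keep]
        for label, values in columns.items()
    }


def as_column_labels(columns, key):
    """Read the key as a list of column labels to select, in key order."""
    return {label: columns[label] for label in key}


by_mask = as_row_mask(columns, key)        # {True: [1], False: [3]}
by_label = as_column_labels(columns, key)  # {True: [1, 2], False: [3, 4]}
print(by_mask)
print(by_label)
```

Because the key has as many entries as there are rows and every entry is also a valid column label, both readings are well defined and give different results, so a length check alone cannot tell them apart.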
deprecating boolean indexers for Supporting label indexing is currently a buggy edge case. I'll review soon.
I agree that this would be a backwards incompatibility break, but a pretty smooth deprecation cycle would be possible. I disagree that it is unintuitive for
@shoyer I don't see any good reason to not allow
@pradyu1993 please rebase this on master. This needs a fair amount more testing (of other cases of boolean labels).
Closing. If you want to update according to the comments, please reopen.
closes #11119