BUG: indexing with boolean-like Index, #11119 #11178

preddy5 · 2015-09-23T17:29:37Z

preddy5 · 2015-09-23T17:59:52Z

max-sixty · 2015-09-23T20:35:14Z

Does this only work when you supply two items for a DataFrame, because it's checking ndim == len(key)? Do you want to be checking the length of the axis instead?

In [11]: df2=pd.concat([df, pd.Series([1,3,12,9],name=True)],axis=1)

In [12]: df2
Out[12]: 
   False  True   True 
0      6      3      1
1      1      9      3
2     13      8     12
3      8      2      9


In [13]: df2[[False]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-b7f220d1c4ee> in <module>()
----> 1 df2[[False]]

/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in __getitem__(self, key)
   1906         if isinstance(key, (Series, np.ndarray, Index, list)):
   1907             # either boolean or fancy integer index
-> 1908             return self._getitem_array(key)
   1909         elif isinstance(key, DataFrame):
   1910             return self._getitem_frame(key)

/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in _getitem_array(self, key)
   1947                 else:
   1948                     raise ValueError('Item wrong length %d instead of %d.' %
-> 1949                                  (len(key), len(self.index)))
   1950             # check_bool_indexer will throw exception if Series key cannot
   1951             # be reindexed to match DataFrame rows

ValueError: Item wrong length 1 instead of 4.

In [17]: df2[[False, True, False]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-88caa367ce94> in <module>()
----> 1 df2[[False, True, False]]

/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in __getitem__(self, key)
   1906         if isinstance(key, (Series, np.ndarray, Index, list)):
   1907             # either boolean or fancy integer index
-> 1908             return self._getitem_array(key)
   1909         elif isinstance(key, DataFrame):
   1910             return self._getitem_frame(key)

/Users/maximilian/Dropbox/workspace/pandas/pandas/core/frame.py in _getitem_array(self, key)
   1947                 else:
   1948                     raise ValueError('Item wrong length %d instead of %d.' %
-> 1949                                  (len(key), len(self.index)))
   1950             # check_bool_indexer will throw exception if Series key cannot
   1951             # be reindexed to match DataFrame rows

ValueError: Item wrong length 3 instead of 4.

Whereas if the column names are strings, label selection works:

In [33]: df2_str[['False','True']]
Out[33]: 
   False  True  True
0      6     3     1
1      1     9     3
2     13     8    12
3      8     2     9
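A minimal sketch of the rule max-sixty is asking about. It assumes the decision should be keyed on the length of the row axis, not on the number of dimensions. A list of booleans is used as a row mask only when its length equals the number of rows. Otherwise, if every column label is a boolean, the key is treated as a list of column labels. The function name `resolve_bool_key` and the plain-list stand-ins for the frame's rows and columns are hypothetical; this is not pandas' internal code.

```python
def resolve_bool_key(key, n_rows, columns):
    """Hypothetical resolver for a list key made entirely of booleans.

    Returns ("mask", key) when the key can mask the rows, or
    ("labels", positions) when it is looked up among boolean column labels.
    Otherwise raises ValueError with the same message as the traceback above.
    """
    if not all(isinstance(k, bool) for k in key):
        raise TypeError("expected a list of booleans")
    # Compare against the length of the row axis, not the number of dimensions.
    if len(key) == n_rows:
        return ("mask", list(key))
    # Fall back to label lookup only when every column label is a boolean.
    if columns and all(isinstance(c, bool) for c in columns):
        positions = []
        for k in key:
            # Duplicate labels (e.g. two True columns) select every match.
            matches = [i for i, c in enumerate(columns) if c == k]
            if not matches:
                raise KeyError(k)
            positions.extend(matches)
        return ("labels", positions)
    raise ValueError("Item wrong length %d instead of %d." % (len(key), n_rows))


columns = [False, True, True]  # column labels of df2 above; df2 has 4 rows
print(resolve_bool_key([False], 4, columns))                    # ('labels', [0])
print(resolve_bool_key([False, True, False], 4, columns))       # ('labels', [0, 1, 2, 0])
print(resolve_bool_key([True, False, True, True], 4, columns))  # ('mask', [True, False, True, True])
```

With string column labels such as `["False", "True", "True"]`, the sketch still raises `ValueError: Item wrong length 1 instead of 4.` for `[False]`. In that case no boolean labels exist to fall back on.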

jreback · 2015-09-24T01:14:35Z

What actually is needed is to infer this:

In [22]: pd.lib.infer_dtype(Index([True,False]))
Out[22]: 'boolean'

So right now we don't have a separate boolean Index type; it's just object dtype.

So this particular case is only relevant when you have a boolean indexer and a boolean Index on the same axis (so it must be object dtype); then you infer the dtype.
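To make jreback's two conditions concrete, here is a simplified, stdlib-only stand-in for the kind of check `infer_dtype` performs on an object-dtype axis. It is an approximation under stated assumptions. The real function, exposed in current pandas as `pandas.api.types.infer_dtype`, recognizes many more kinds and handles missing values. The helper names `infer_label_kind` and `needs_label_fallback` are hypothetical.

```python
def infer_label_kind(labels):
    """Hypothetical, simplified label-kind inference (not pandas' implementation)."""
    labels = list(labels)
    if not labels:
        return "empty"
    # bool is a subclass of int in Python, so test for bool first.
    if all(isinstance(x, bool) for x in labels):
        return "boolean"
    if all(isinstance(x, int) and not isinstance(x, bool) for x in labels):
        return "integer"
    if all(isinstance(x, str) for x in labels):
        return "string"
    return "mixed"


def needs_label_fallback(key, axis_labels):
    # Both conditions from the comment above: a boolean indexer AND a boolean axis.
    return infer_label_kind(key) == "boolean" and infer_label_kind(axis_labels) == "boolean"


print(infer_label_kind([True, False]))                            # boolean
print(infer_label_kind(["False", "True"]))                        # string
print(needs_label_fallback([False], [False, True, True]))         # True
print(needs_label_fallback([False], ["False", "True", "True"]))   # False
```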

preddy5 · 2015-09-24T07:04:01Z

pandas/core/frame.py

@@ -1941,7 +1941,11 @@ def _getitem_array(self, key):
                warnings.warn("Boolean Series key will be reindexed to match "
                              "DataFrame index.", UserWarning)
            elif len(key) != len(self.index):
-                raise ValueError('Item wrong length %d instead of %d.' %
+                if lib.infer_dtype(self._info_axis)=="boolean":


@jreback, am I doing this right?

I think you can move this check to _convert_to_indexer and raise there if there is a problem (e.g. subsume the ValueError there). That makes this code a lot simpler; right now there is too much special casing.

preddy5 · 2015-09-29T08:31:23Z

@jreback, can you help me understand this error? I can't reproduce it in my notebook, and it only occurs on Python 2.7.

jreback · 2015-09-29T09:53:28Z

It's unrelated, a spurious failure; I restarted it. In any event, I will probably look at this next week. You are jumping through a lot of hoops here.

preddy5 · 2015-10-07T04:06:47Z

@jreback, could you review the PR? Thanks.

shoyer · 2015-10-07T06:33:15Z

I'm not sure if this change is actually a good idea. Suppose we have a 2x2 DataFrame with columns and rows [True, False]:

In [3]: df = pd.DataFrame([[1, 2], [3, 4]], columns=[True, False], index=[True, False])

In [4]: df
Out[4]:
       True   False
True       1      2
False      3      4

What should df[[False, True]] return? Standard indexing with [] is a mess of fallback rules, so usually it's best to first think about what .loc and .iloc can handle.

Here, note that both .loc and .iloc claim to support boolean arrays as indexers. Given that indexes can include booleans, this is clearly an ambiguous case under our current rules. The current behavior may be a bug, but all the indexers currently seem to use boolean arrays for subsetting rows, not for looking up index values:

In [5]: df[[True, False]]
Out[5]:
      True   False
True      1      2

In [9]: df.loc[[True, False]]
Out[9]:
      True   False
True      1      2

So my suggestion is that we should first consider deprecating .loc indexing with booleans that are not index values. Then, once this is entirely unambiguous, we can consider appropriate fallback rules for normal indexing.
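To see why shoyer calls this ambiguous, the sketch below computes both readings of a boolean list key against the 2x2 frame's row labels `[True, False]`. The first reading treats the key as a positional mask; the second treats it as a list of labels to look up. The helper `interpretations` is hypothetical. It only models the two rules in plain Python and does not call pandas.

```python
def interpretations(key, rows):
    """Return (mask_positions, label_positions) for a boolean list key.

    Either entry is None when that reading is not valid for this key.
    """
    # Mask reading: keep row i where key[i] is True (lengths must match).
    if len(key) == len(rows):
        mask_positions = [i for i, keep in enumerate(key) if keep]
    else:
        mask_positions = None
    # Label reading: look each key value up among the row labels, in key order.
    try:
        label_positions = [rows.index(k) for k in key]
    except ValueError:  # list.index raises ValueError for a missing label
        label_positions = None
    return mask_positions, label_positions


rows = [True, False]  # index labels of the 2x2 frame above
print(interpretations([True, False], rows))  # ([0], [0, 1])
print(interpretations([False, True], rows))  # ([1], [1, 0])
```

For `[True, False]`, the mask reading keeps only the first row. That matches the `df[[True, False]]` and `df.loc[[True, False]]` output above. The label reading would return both rows. Whenever both entries are non-None and differ, the key is genuinely ambiguous.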

jreback · 2015-10-07T10:34:14Z

Deprecating boolean indexers for .loc is a non-starter, as a) it would break backward compatibility everywhere, and b) it would make .loc completely non-intuitive and unusable with a boolean array, which is a core feature.

Supporting label indexing is currently a buggy edge case. I'll review soon.

shoyer · 2015-10-08T01:52:05Z

I agree that this would be a backwards-incompatible change, but a pretty smooth deprecation cycle would be possible. I disagree that it is unintuitive for .loc not to work with boolean arrays, because .loc is explicitly for label-based indexing. We've simply gotten away with it because using booleans as labels is pretty rare.

jreback · 2015-10-08T02:39:28Z

@shoyer I don't see any good reason to not allow .loc to accept boolean arrays. This would cause way more issues than it solves. In fact it is quite common to do this. Not sure why you are pushing this way. We want more consistency and unification, not less.

jreback · 2015-11-23T00:14:53Z

@pradyu1993 rebase this on master pls.

This needs a fair amount more testing (of other cases of boolean labels).

jreback · 2016-01-11T13:45:22Z

Closing. If you want to update according to the comments, please reopen.

jreback added the Indexing (related to indexing on series/frames, not to indexes themselves) and Dtype Conversions (unexpected or buggy dtype conversions) labels Sep 24, 2015

jreback changed the title from "issue #11119" to "BUG: indexing with boolean-like Index, #11119" Sep 24, 2015

preddy5 reviewed Sep 24, 2015
jorisvandenbossche mentioned this pull request Sep 24, 2015

issue #11119 #11175

Closed

issue #11119

0045079

jreback closed this Jan 11, 2016

jorisvandenbossche added the Closed PR label Jan 11, 2016
