KeyError for crosstab on Series with same name. #6319

theandygross · 2014-02-10T21:52:04Z

Doing a crosstab on two Series with the same name throws an error. This is due to a dictionary (keyed by the Series name) that the crosstab function uses to store the data. Not sure if this is a feature or a bug, but a default similar to the behavior for Series without names would be desirable to me.

In [56]:

s1 = pd.Series([1,1,2,2,3,3], name='s')
s2 = pd.Series([1,1,1,2,2,2], name='s')

pd.crosstab(s1, s2)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-56-9d16b2abac9f> in <module>()
      2 s2 = pd.Series([1,1,1,2,2,2], name='s')
      3 
----> 4 pd.crosstab(s1, s2)

/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.13.0_247_g82bcbb8-py2.7-linux-x86_64.egg/pandas/tools/pivot.pyc in crosstab(rows, cols, values, rownames, colnames, aggfunc, margins, dropna)
    368         df['__dummy__'] = 0
    369         table = df.pivot_table('__dummy__', rows=rownames, cols=colnames,
--> 370                                aggfunc=len, margins=margins, dropna=dropna)
    371         return table.fillna(0).astype(np.int64)
    372     else:

/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.13.0_247_g82bcbb8-py2.7-linux-x86_64.egg/pandas/tools/pivot.pyc in pivot_table(data, values, rows, cols, aggfunc, fill_value, margins, dropna)
    108         to_unstack = [agged.index.names[i]
    109                       for i in range(len(rows), len(keys))]
--> 110         table = agged.unstack(to_unstack)
    111 
    112     if not dropna:

/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.13.0_247_g82bcbb8-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in unstack(self, level)
   3339         """
   3340         from pandas.core.reshape import unstack
-> 3341         return unstack(self, level)
   3342 
   3343     #----------------------------------------------------------------------

/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.13.0_247_g82bcbb8-py2.7-linux-x86_64.egg/pandas/core/reshape.pyc in unstack(obj, level)
    416 def unstack(obj, level):
    417     if isinstance(level, (tuple, list)):
--> 418         return _unstack_multiple(obj, level)
    419 
    420     if isinstance(obj, DataFrame):

/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.13.0_247_g82bcbb8-py2.7-linux-x86_64.egg/pandas/core/reshape.pyc in _unstack_multiple(data, clocs)
    275     index = data.index
    276 
--> 277     clocs = [index._get_level_number(i) for i in clocs]
    278 
    279     rlocs = [i for i in range(index.nlevels) if i not in clocs]

/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas-0.13.0_247_g82bcbb8-py2.7-linux-x86_64.egg/pandas/core/index.pyc in _get_level_number(self, level)
   2197         except ValueError:
   2198             if not isinstance(level, int):
-> 2199                 raise KeyError('Level %s not found' % str(level))
   2200             elif level < 0:
   2201                 level += self.nlevels

KeyError: 'Level s not found'

jreback · 2014-03-22T21:36:51Z

I am not sure this is a bug, what would you expect this to do?

theandygross · 2014-03-24T02:23:12Z

I can't remember if the previous versions added to the column labels or just dropped them. It would look something like either:

s_col  1  2
s_row      
1      2  0
2      1  1
3      0  2

or

col_0  1  2
row_0      
1      2  0
2      1  1
3      0  2
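
A minimal sketch of how the first layout can be requested explicitly, assuming the `rownames` and `colnames` parameters that appear in the `crosstab` signature in the traceback above. The labels `s_row` and `s_col` are illustrative choices, not pandas defaults, and whether this sidesteps the error on every version has not been checked here.

```python
import pandas as pd

s1 = pd.Series([1, 1, 2, 2, 3, 3], name="s")
s2 = pd.Series([1, 1, 1, 2, 2, 2], name="s")

# Give each axis its own label so the two inputs never share a key.
# "s_row" and "s_col" are hypothetical names chosen for this example.
table = pd.crosstab(s1, s2, rownames=["s_row"], colnames=["s_col"])
print(table)
```

Passing the names explicitly keeps the caller in control of the labels, which avoids any guessing inside crosstab about which "s" is which.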

johnhess · 2014-03-25T17:35:34Z

I'm having the exact same issue manifested in a slightly different way. In my case, I'm using DataFrame.pivot_table and specifying the same column as rows and as cols.

With a simple dataframe like this

>>> df=pd.DataFrame([[1,2],[3,4],[5,6]], columns=["a","b"])
>>> df
   a  b
0  1  2
1  3  4
2  5  6

When we pivot a x b we get

>>> df.pivot_table(rows="a", cols="b", aggfunc='count')["a"]
b   2   4   6
a            
1   1 NaN NaN
3 NaN   1 NaN
5 NaN NaN   1

[3 rows x 3 columns]

which makes perfect sense. Similarly, I would expect that if I pivoted a x a that I would get

>>> df.pivot_table(rows="a", cols="a", aggfunc='count')["a"]
a   1   3   5
a            
1   1 NaN NaN
3 NaN   1 NaN
5 NaN NaN   1

[3 rows x 3 columns]
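
One possible way to get that a x a table is to pivot against a copy of the column under a different name. This is only a sketch: the helper column `a_col` is a hypothetical name, and it uses the `index=`/`columns=` keywords that later replaced `rows=`/`cols=` in `pivot_table`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])

# Copy "a" under a second (hypothetical) name so rows and columns differ,
# then count the "b" values in each (a, a_col) cell.
wide = df.assign(a_col=df["a"]).pivot_table(
    index="a", columns="a_col", values="b", aggfunc="count"
)

# Restore the original label on the column axis for display.
wide = wide.rename_axis(columns="a")
print(wide)
```

Only the diagonal cells are populated; the off-diagonal cells are NaN because no row has two different values of "a".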

TomAugspurger · 2014-03-28T16:42:26Z

@johnhess for your problem you can work around it with something like

In [18]: pd.DataFrame(np.diag(df.a), index=df.index, columns=df.index)
Out[18]: 
   0  1  2
0  1  0  0
1  0  3  0
2  0  0  5

[3 rows x 3 columns]

It looks like these are caused by the same issue: how unstack handles indices with duplicate names:

ipdb> agged
     __dummy__
s s           
1 1          3
2 3          3

I don't think that agged.unstack('s') is well defined here (which 's' gets put in the columns?). So that correctly raises an error (the error message could be clearer though).
But in the examples for this issue we already know that the s that came from the second argument to crosstab goes in the column (and for pivot_table the cols argument goes to the column).
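
The ambiguity can be reproduced outside crosstab with a small hand-built Series. This is a sketch, not the internal code path: the counts are taken from the s1/s2 example at the top of the issue, and the exception type caught depends on the pandas version (KeyError in the traceback above, ValueError after the change discussed in #6738).

```python
import pandas as pd

# Aggregated counts whose two index levels are both named "s".
index = pd.MultiIndex.from_tuples(
    [(1, 1), (2, 1), (2, 2), (3, 2)], names=["s", "s"]
)
agged = pd.Series([2, 1, 1, 2], index=index)

# Unstacking by name is ambiguous: which "s" should move to the columns?
ambiguous_error = None
try:
    agged.unstack("s")
except (KeyError, ValueError) as exc:  # type differs across pandas versions
    ambiguous_error = exc

# Unstacking by position is unambiguous: level 1 is the second "s".
table = agged.unstack(level=1, fill_value=0)
print(table)
```

Referring to the level by position is what makes the crosstab and pivot_table cases well defined: the caller already knows the second argument belongs in the columns.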

@jreback any objection to having checks in crosstab and pivot_table to check for these special cases? I can do it this weekend.

jreback · 2014-03-28T17:15:49Z

@TomAugspurger that sounds reasonable. maybe a ValueError or something with an explanation. In theory could have a suffix argument but this seems a special case.

johnhess · 2014-03-28T18:04:13Z

@TomAugspurger Thanks for taking the time to look into it! I have another workaround in at the moment, so I'm safe, but I worry that others will expect pandas to crosstab any two valid series and end up with surprise errors. In my case, users of my application have an interface to crosstab any two columns and I hadn't realized I needed a special case when they have the same name.

hayd · 2014-03-28T18:14:21Z

👍 to a better error message, imo better to raise than infer here.

TomAugspurger · 2014-03-29T16:20:26Z

@hayd I addressed the error msg in #6738

I agree that df.unstack(level='level') and df.stack(level='level') should both raise when it's ambiguous which level is meant because of duplicate names in the index.

However, in the cases of pivot_table(rows='level', columns='level') and crosstab(s1, s1), there isn't any ambiguity. The first arg/rows arg goes in the rows and the second/cols arg goes in the columns. The current behavior of raising is just the implementation not handling that case. Are you ok with inferring there?

jreback · 2014-03-29T16:22:37Z

It might make sense in pivot to collapse the index (just droplevel(1)) when row and column refer to a single name.

jreback · 2014-04-21T18:42:37Z

@TomAugspurger for 0.14? or since you fixed the error can bump?

TomAugspurger · 2014-04-21T23:41:04Z

I'm not seeing a quick fix here. The current pivot_table implementation depends on not having any repeats in index or columns. So I guess bump for now unless I come up with a clever fix.

hayd · 2014-04-22T17:54:17Z

"The current issue of raising is just the implementation not handling that case"

can you raise a NotImplementedError for this part?

jreback · 2017-12-29T00:55:13Z

this appears fixed. if someone could locate the reference we can close.

mroeschke · 2018-01-01T00:16:46Z

Closed by #16028

kasuteru · 2018-07-06T11:26:56Z

This is not fixed for me, using pandas.__version__ 0.23.1:

import pandas as pd

print(pd.__version__)
df = pd.DataFrame(data={"a": [1, 2, 3, 4]})
a1 = df["a"]
a2 = df["a"]
pd.crosstab(a1, a2)

still raises an error for me.

Edit: Submitted as issue #21765

TomAugspurger · 2018-07-06T13:27:49Z

Could you open a new issue with that example and all your version info?


kasuteru · 2018-07-06T15:06:59Z

I submitted it as #21765. Turns out that the example provided here also fails, so I used that.

jreback added Bug labels Mar 25, 2014

jreback added this to the 0.15.0 milestone Mar 25, 2014

jreback modified the milestones: 0.14.0, 0.15.0 Mar 28, 2014

jreback mentioned this issue Apr 9, 2014

BUG/API Raise ValueError for non-unique stack by name #6738

Closed

TomAugspurger modified the milestones: 0.15.0, 0.14.0 Apr 21, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

jreback modified the milestones: Next Major Release, 0.23.0 Dec 29, 2017

jreback closed this as completed Jan 1, 2018

kasuteru mentioned this issue Jul 6, 2018

KeyError for crosstab on Series with same name - Still an issue in v0.23.1 #21765

Closed
