pandas.read_excel index_col argument behavior seems to be wrong (or unexpected, as by the docs) #16582

BLMeltdown · 2017-06-02T10:22:11Z

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html#pandas.read_excel

It's about the parameter index_col.

From the docs, I read

index_col : int, list of ints, default None

    Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column. If a list is passed, those columns will be combined into a MultiIndex"

I use the function with args={index_col=3, parse_cols="D-J"} (so it should parse from column 3 to column 9 included, 0-indexed). And I expected D to be used as Index, but actually it was column G.
I am not sure if it is the expected behavior, if then, maybe argument order should be changed to show that index_col depends on parsed_cols.

The text was updated successfully, but these errors were encountered:

chris-b1 · 2017-06-02T13:16:50Z

Can you make a reproducible example? I tried this out (see below) and it seems to be working ok?

df = pd.DataFrame(columns=list('ABCDEFGHIJKL'), data=np.zeros((100, 12)))

df.to_excel('tmp.xlsx', index=False)

pd.read_excel('tmp.xlsx').head()
Out[6]: 
   A  B  C  D  E  F  G  H  I  J  K  L
0  0  0  0  0  0  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  0  0  0  0  0  0  0  0
4  0  0  0  0  0  0  0  0  0  0  0  0

In [9]: pd.read_excel('tmp.xlsx', parse_cols='D:J', index_col=0).head()
Out[9]: 
   E  F  G  H  I  J
D                  
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0

BLMeltdown · 2017-06-02T16:51:52Z

Well, thanks, That's what I meant, actually. I expected the index_col int to match some column in the original document, as it is not precised. Maybe I missed something in the docs, though, about this kind of parsing things. So it is expected to work like this? Sorry for disturbing, then.

chris-b1 · 2017-06-02T17:00:09Z

Oh I see, I misread that. Yeah, in general it seems like we make index_col relative to the subset of data that is parsed, e.g. same behavior for read_csv. I do see why you might expect otherwise and I don't think it's documented, so I'll mark for that.

from io import StringIO
pd.read_csv(StringIO("""a,b,c,d
1,2,3,4
4,5,6,7"""), usecols=['b','c','d'], index_col=0)
Out[52]: 
   c  d
b      
2  3  4
5  6  7

rosygupta · 2017-06-06T14:36:25Z

@chris-b1 Can you help me figure out where to get started with fixing this issue? I understand the issue, just need the code guidance a bit.
Many Thanks!

chris-b1 · 2017-06-06T14:45:22Z

I'd expand the explanation of index_col in the read_excel docstring here

pandas/pandas/io/excel.py

Line 85 in 697d026

index_col : int, list of ints, default None

And them maybe add an example to the narrative docs, somewhere around here
http://pandas.pydata.org/pandas-docs/stable/io.html#parsing-specific-columns

(by changing this file):
https://github.com/pandas-dev/pandas/blob/697d0269aa6903be75db97afb45f9976abd2406d/doc/source/io.rst

rosygupta · 2017-06-06T14:49:17Z

@chris-b1 So we need not make changes in the code to allow the exception?

chris-b1 · 2017-06-06T14:52:41Z

Right, the current behavior is intended, this is only a documentation issue.

…

On Tue, Jun 6, 2017 at 9:49 AM, Rosy Gupta ***@***.***> wrote: @chris-b1 <https://github.com/chris-b1> So we need not make changes in the code to allow the exception? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#16582 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AB1b_EVMy7uaY4NRVXDX7-tF7bYHDFGiks5sBWcEgaJpZM4NuGX8> .

rosygupta · 2017-06-06T17:12:37Z

@chris-b1 I shall work on it. But I'm more interested in some coding issue. Could you suggest some for starters?

chris-b1 added Difficulty Novice Docs IO Excel read_excel, to_excel labels Jun 2, 2017

chris-b1 mentioned this issue Jun 10, 2017

DOC: improve read_excel docstring #16658

Merged

jreback added this to the 0.20.3 milestone Jun 10, 2017

jreback closed this as completed in #16658 Jun 11, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pandas.read_excel index_col argument behavior seems to be wrong (or unexpected, as by the docs) #16582

pandas.read_excel index_col argument behavior seems to be wrong (or unexpected, as by the docs) #16582

BLMeltdown commented Jun 2, 2017 •

edited by TomAugspurger

Loading

chris-b1 commented Jun 2, 2017

BLMeltdown commented Jun 2, 2017

chris-b1 commented Jun 2, 2017

rosygupta commented Jun 6, 2017 •

edited

Loading

chris-b1 commented Jun 6, 2017

rosygupta commented Jun 6, 2017

chris-b1 commented Jun 6, 2017 via email

rosygupta commented Jun 6, 2017

pandas.read_excel index_col argument behavior seems to be wrong (or unexpected, as by the docs) #16582

pandas.read_excel index_col argument behavior seems to be wrong (or unexpected, as by the docs) #16582

Comments

BLMeltdown commented Jun 2, 2017 • edited by TomAugspurger Loading

chris-b1 commented Jun 2, 2017

BLMeltdown commented Jun 2, 2017

chris-b1 commented Jun 2, 2017

rosygupta commented Jun 6, 2017 • edited Loading

chris-b1 commented Jun 6, 2017

rosygupta commented Jun 6, 2017

chris-b1 commented Jun 6, 2017 via email

rosygupta commented Jun 6, 2017

BLMeltdown commented Jun 2, 2017 •

edited by TomAugspurger

Loading

rosygupta commented Jun 6, 2017 •

edited

Loading