Skip to content

pandas.read_excel index_col argument behavior seems to be wrong (or unexpected, as by the docs) #16582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
BLMeltdown opened this issue Jun 2, 2017 · 8 comments · Fixed by #16658
Labels
Docs IO Excel read_excel, to_excel
Milestone

Comments

@BLMeltdown
Copy link

BLMeltdown commented Jun 2, 2017

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html#pandas.read_excel

It's about the parameter index_col.

From the docs, I read

index_col : int, list of ints, default None

    Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column. If a list is passed, those columns will be combined into a MultiIndex"

I use the function with args={index_col=3, parse_cols="D-J"} (so it should parse from column 3 to column 9 included, 0-indexed). And I expected D to be used as Index, but actually it was column G.
I am not sure if it is the expected behavior, if then, maybe argument order should be changed to show that index_col depends on parsed_cols.

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 2, 2017

Can you make a reproducible example? I tried this out (see below) and it seems to be working ok?

df = pd.DataFrame(columns=list('ABCDEFGHIJKL'), data=np.zeros((100, 12)))

df.to_excel('tmp.xlsx', index=False)

pd.read_excel('tmp.xlsx').head()
Out[6]: 
   A  B  C  D  E  F  G  H  I  J  K  L
0  0  0  0  0  0  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  0  0  0  0  0  0  0  0
4  0  0  0  0  0  0  0  0  0  0  0  0

In [9]: pd.read_excel('tmp.xlsx', parse_cols='D:J', index_col=0).head()
Out[9]: 
   E  F  G  H  I  J
D                  
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0

@BLMeltdown
Copy link
Author

Well, thanks, That's what I meant, actually. I expected the index_col int to match some column in the original document, as it is not precised. Maybe I missed something in the docs, though, about this kind of parsing things. So it is expected to work like this? Sorry for disturbing, then.

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 2, 2017

Oh I see, I misread that. Yeah, in general it seems like we make index_col relative to the subset of data that is parsed, e.g. same behavior for read_csv. I do see why you might expect otherwise and I don't think it's documented, so I'll mark for that.

from io import StringIO
pd.read_csv(StringIO("""a,b,c,d
1,2,3,4
4,5,6,7"""), usecols=['b','c','d'], index_col=0)
Out[52]: 
   c  d
b      
2  3  4
5  6  7

@rosygupta
Copy link

rosygupta commented Jun 6, 2017

@chris-b1 Can you help me figure out where to get started with fixing this issue? I understand the issue, just need the code guidance a bit.
Many Thanks!

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 6, 2017

I'd expand the explanation of index_col in the read_excel docstring here

index_col : int, list of ints, default None

And them maybe add an example to the narrative docs, somewhere around here
http://pandas.pydata.org/pandas-docs/stable/io.html#parsing-specific-columns

(by changing this file):
https://github.com/pandas-dev/pandas/blob/697d0269aa6903be75db97afb45f9976abd2406d/doc/source/io.rst

@rosygupta
Copy link

@chris-b1 So we need not make changes in the code to allow the exception?

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 6, 2017 via email

@rosygupta
Copy link

@chris-b1 I shall work on it. But I'm more interested in some coding issue. Could you suggest some for starters?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants