Inconsistent csv behavior on Ubuntu #71

Closed
artgoldberg opened this issue Jan 24, 2017 · 6 comments

@artgoldberg

artgoldberg commented Jan 24, 2017

Hi Folks

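With this code:

    import os

    import pyexcel

    # Load the fixture CSV from a "fixtures" directory next to this script
    fixture_file = 'bad-headers-Root.csv'
    filename = os.path.join(os.path.dirname(__file__), 'fixtures', fixture_file)
    sv_worksheet = pyexcel.get_sheet(file_name=filename)

    # Print every row pyexcel returns, so any skipped rows are easy to spot
    for sv_row in sv_worksheet.row:
        print('sv_row', list(sv_row))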

and a file with this data:

    Id,root
    Name,€
    ,
    ,
    x,

On OSX the program produces:

    $ python tests/schema/t.py
    sv_row ['Id', 'root']
    sv_row ['Name', '€']
    sv_row ['', '']
    sv_row ['', '']
    sv_row ['x', '']

Whereas on Ubuntu, it generates:

    $ python tests/schema/t.py
    ('sv_row', [u'Id', u'root'])
    ('sv_row', [u'Name', u'\u20ac'])
    ('sv_row', [u'x', ''])
    $ uname -a
    Linux box1042.localdomain 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Thanks
Arthur

@chfw
Member

chfw commented Jan 24, 2017

That's related to a version difference in the behaviour of 'skip_empty_rows'. The option was originally limited to the ods file type; in later versions, skip_empty_rows was enforced across all file types.

Please compare the installed versions of pyexcel-io. I think you have pyexcel-io>=0.3.0 on Ubuntu and pyexcel-io>=0.2.0 but <=0.2.2 on Mac OS (see the change log).

When you do the upgrade on OS X, please pay attention to the pyexcel plugin compatibility table, and to the pyexcel-io plugin table if you use other file types.
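
For illustration, here is a minimal, stdlib-only approximation of the difference described above, applied to the fixture data from the report. It assumes a row counts as empty when every cell is an empty string. `read_rows` and `is_empty_row` are hypothetical names, and this is not pyexcel-io's actual implementation.

```python
import csv
import io

# Fixture data from the report: five rows, two of which are all-empty.
FIXTURE = "Id,root\nName,€\n,\n,\nx,\n"


def is_empty_row(row):
    """Assumption: a row is 'empty' when every cell is an empty string."""
    return all(cell == "" for cell in row)


def read_rows(text, skip_empty_rows):
    """Parse CSV text; optionally drop rows whose cells are all empty."""
    rows = list(csv.reader(io.StringIO(text)))
    if skip_empty_rows:
        rows = [row for row in rows if not is_empty_row(row)]
    return rows


print(read_rows(FIXTURE, skip_empty_rows=False))
# [['Id', 'root'], ['Name', '€'], ['', ''], ['', ''], ['x', '']]
print(read_rows(FIXTURE, skip_empty_rows=True))
# [['Id', 'root'], ['Name', '€'], ['x', '']]
```

With skipping enabled the two `,` rows disappear, which matches the Ubuntu output; with it disabled all five rows come back, as on OS X.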

@artgoldberg
Author

Thanks Jaska. You're right about different versions:

Ubuntu:

    $ pip list| grep excel
    pyexcel (0.4.2)
    pyexcel-io (0.3.1)

OSX:

    $ pip list| grep excel
    pyexcel (0.3.3)
    pyexcel-io (0.2.4)
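
Going by what this thread says (0.2.x applied skipping only to ods, 0.3.0 extended it to every file type, and 0.3.2 switched the default to False), a quick check of which CSV behaviour to expect from a given pyexcel-io version might look like the sketch below. The helper is hypothetical, the version boundaries come from the maintainer's comments rather than from the change log, and it only handles plain numeric versions such as `0.3.1`.

```python
def parse_version(text):
    """Turn '0.3.1' into (0, 3, 1); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def csv_skips_empty_rows_by_default(pyexcel_io_version):
    """Hypothetical helper based on this thread: pyexcel-io >= 0.3.0 skips
    empty CSV rows by default, until 0.3.2 changed the default to False."""
    version = parse_version(pyexcel_io_version)
    return (0, 3, 0) <= version < (0, 3, 2)


for label, version in [("Ubuntu", "0.3.1"), ("OS X", "0.2.4")]:
    print(label, version, csv_skips_empty_rows_by_default(version))
# Ubuntu 0.3.1 True
# OS X 0.2.4 False
```

Tuples compare element by element, so `(0, 2, 4) < (0, 3, 0)` holds without any string tricks.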

I'll pay attention to compatibility. What is the reason for making skip_empty_rows default to True? That seems counter-intuitive to me.
Arthur

artgoldberg added a commit to KarrLab/wc_utils that referenced this issue Jan 24, 2017
@chfw
Member

chfw commented Jan 24, 2017

I think the historical reason was that some individual ods files had extensive empty rows after the content of interest. That's why skip_empty_rows was invented. For uniformity, it was then introduced across all file formats. As a side effect, it also removes empty rows between two rows that have real content.
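
To make that side effect concrete, this hypothetical sketch contrasts trimming only the trailing empty rows (the ods problem described above) with skipping every empty row. Neither function is part of pyexcel or pyexcel-io; they only illustrate why gaps between real rows get lost.

```python
def is_empty(row):
    """A row is treated as empty when all of its cells are empty strings."""
    return all(cell == "" for cell in row)


def strip_trailing_empty_rows(rows):
    """Drop only the run of empty rows at the end of the sheet,
    keeping empty rows that sit between rows with real content."""
    end = len(rows)
    while end > 0 and is_empty(rows[end - 1]):
        end -= 1
    return rows[:end]


def skip_all_empty_rows(rows):
    """Drop every empty row, wherever it appears."""
    return [row for row in rows if not is_empty(row)]


# Real content, an interior gap, more content, then trailing padding rows
sheet = [["a", "1"], ["", ""], ["b", "2"], ["", ""], ["", ""]]
print(strip_trailing_empty_rows(sheet))  # [['a', '1'], ['', ''], ['b', '2']]
print(skip_all_empty_rows(sheet))        # [['a', '1'], ['b', '2']]
```

Only the second function loses the interior gap, which is the side effect that showed up in the CSV output.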

@artgoldberg
Author

Thanks for your reply.
It also shifts the row numbers of all data following the blank rows, which my unit tests caught.

In my opinion, the skip_empty_rows default should be False, since it's designed to handle an unusual case. Another possibility would be to issue a warning when empty rows are skipped, but programmers often ignore warnings. I know that I do.
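
The row-number shift is easy to see if each surviving row carries its original position. In this sketch (a hypothetical helper, stdlib `csv` only), the `x` row is record 5 in the file but becomes only the third row once empty rows are skipped.

```python
import csv
import io

# Fixture data from the report.
FIXTURE = "Id,root\nName,€\n,\n,\nx,\n"


def numbered_nonempty_rows(text):
    """Yield (record_number, row) pairs, 1-based, skipping all-empty rows
    but keeping each surviving row's original record number.
    Note: CSV record numbers match physical file lines only when no
    quoted field contains an embedded newline."""
    for number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if any(cell != "" for cell in row):
            yield number, row


rows = list(numbered_nonempty_rows(FIXTURE))
print([number for number, _ in rows])  # [1, 2, 5]
```

Reporting the carried number instead of the position in the filtered list keeps error messages pointing at the right line of the file.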

However, my code doesn't depend on the default, as I'm setting skip_empty_rows=False in my only call to get_sheet.

I appreciate your prompt responses. Keep up your great open-source work!

@chfw
Member

chfw commented Jan 26, 2017

pyexcel-io v0.3.2 has been released, and skip_empty_rows now defaults to False.

chfw closed this as completed Jan 26, 2017
@artgoldberg
Author

artgoldberg commented Jan 27, 2017 via email
