Skip to content

[CLN] Excel Module Cleanups #25275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 20, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 2 additions & 3 deletions pandas/io/excel/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -590,9 +590,8 @@ def __new__(cls, path, engine=None, **kwargs):
if engine == 'auto':
engine = _get_default_writer(ext)
except KeyError:
error = ValueError("No engine for filetype: '{ext}'"
.format(ext=ext))
raise error
raise ValueError("No engine for filetype: '{ext}'"
.format(ext=ext))
cls = get_writer(engine)

return object.__new__(cls)
Expand Down
11 changes: 6 additions & 5 deletions pandas/io/excel/_util.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,8 @@

from pandas.core import config

_writer_extensions = ["xlsx", "xls", "xlsm"]
# the following extensions are already registered in pandas/core/config_init.py
_registered_writer_extensions = ["xlsx", "xls", "xlsm"]


_writers = {}
Expand All @@ -24,13 +25,15 @@ def register_writer(klass):
for ext in klass.supported_extensions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to be clear this was never part of the public API hence why I think OK to remove. If a user in spite of that still registered a custom subclass with an extension outside of .xls, .xlsx or .xlsm then potentially it would have an impact, but that seems like such an edge case to me hence why I think OK to remove.

cc @jreback in any case

Copy link
Member

@gfyoung gfyoung Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...I'm -0.5 on this. Discarding because it's an "edge case" does not seem like a strong argument for removing this code if we're in the business of robustness. Potentially there are better ways to implement this, but I wouldn't necessarily go as far as removing for now.

Copy link
Member

@WillAyd WillAyd Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main point is that it's not part of the public API and is a no-op with current internal code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main point is that it's not part of the public API and is a no-op with current internal code

True. I'm still a little on the fence on this though for the reason I gave earlier, so getting @jreback input would be helpful for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok i think

if ext.startswith('.'):
ext = ext[1:]
if ext not in _writer_extensions:
if ext not in _registered_writer_extensions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we even need this block or is this entirely a no-op? I don't think we have any reader support for things outside of the already registered extensions right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell it is a no-op, but I can't judge the implications of removing it. The tests pass without this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it simplifies code and doesn't impact functionality (which I believe to be the case) then I'd say just delete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand corrected, test_register_writer fails for some builds after the code removal, so I went ahead and removed the test too. Now I am a bit less sure if removing this is a good idea, I'll look into the history to see why this was added in the first place

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the failure? On a quick look I think it's only failing because of the test extension but I can't imagine that being an actual use case anyway. Would probably just remove that one particular extension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the test was added in 42e06a7, with no specific usecase in mind, just to "Make ExcelWriter more pluggable"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - I think your removal here is too aggressive. Should only need to remove the non-excel extension not the entire test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the test is about being able to register a custom excel writer with custom extensions. The PR as is now breaks the ability to register custom extensions, so I removed the test for that. I read a bit on excel extensions, apparently there is also the binary.xlsb extensions that excel supports natively. And contrary to what I expected, there is even some limited (read only) python support with the pyxlsb library

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind these mechanisms are private. We don't currently support xlsb in any fashion (you can search issue tracker for requests) so the reasoning for removal of the registration here is arguably that we've over-engineered this feature internally. The test case for a .test extension is rather contrived

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the backgrounds. The test is now failing for Windows py27_np121. I don't understand the error, I'm guessing it's just flakiness. Any way to rerun the a specific job?

config.register_option("io.excel.{ext}.writer".format(ext=ext),
engine_name, validator=str)
_writer_extensions.append(ext)
_registered_writer_extensions.append(ext)


def _get_default_writer(ext):
"""Return the default writer per extension. This default engine is used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you follow pandas docstring conventions here? You can run the scripts/validate_docstrings.py script against this to get a list of errors

unles another engine is explicitly defined."""
_default_writers = {'xlsx': 'openpyxl', 'xlsm': 'openpyxl', 'xls': 'xlwt'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a module level variable, but maybe needs a followup for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see this going either way, so agreed that a followup makes sense.

try:
import xlsxwriter # noqa
Expand Down Expand Up @@ -230,8 +233,6 @@ def _fill_mi_header(row, control_row):

return _maybe_convert_to_string(row), control_row

# fill blank if index_col not None


def _pop_header_name(row, index_col):
"""
Expand Down