-
-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
file descriptors are not closed #14
Comments
the close() function should be called for read-only and write-only mode, which were used in this library. will run your code to test a fix. |
I've been trying this, but it's not enough: pyexcel_xlsx/xlsx.py | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git i/pyexcel_xlsx/xlsx.py w/pyexcel_xlsx/xlsx.py
index ed01c1b..bf8a036 100644
--- i/pyexcel_xlsx/xlsx.py
+++ w/pyexcel_xlsx/xlsx.py
@@ -101,7 +101,7 @@ class XLSXBook(BookReader):
def read_sheet(self, native_sheet):
sheet = XLSXSheet(native_sheet, **self._keywords)
- return {sheet.name: sheet.to_array()}
+ return {sheet.name: list(sheet.to_array())}
def _load_the_excel_file(self, file_alike_object):
self._native_book = openpyxl.load_workbook(
@@ -111,6 +111,9 @@ class XLSXBook(BookReader):
self.skip_hidden_sheets = self._keywords.get(
'skip_hidden_sheets', True)
+ def close(self):
+ self._native_book.close()
+ self._native_book = None
class XLSXSheetWriter(SheetWriter):
""" There's an issue in openpyxl, it's closing its ZipFile archive, but internally, ZipFile tracks reference counts, and the descriptor is not closed because there's still at least one reference. |
This additional patch in openpyxl is necessary: openpyxl/worksheet/read_only.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git i/openpyxl/worksheet/read_only.py w/openpyxl/worksheet/read_only.py
index fd533569..93438328 100644
--- i/openpyxl/worksheet/read_only.py
+++ w/openpyxl/worksheet/read_only.py
@@ -89,7 +89,8 @@ class ReadOnlyWorksheet(object):
def xml_source(self):
"""Parse xml source on demand, default to Excel archive"""
if self._xml is None:
- return self.parent._archive.open(self.worksheet_path)
+ from six import BytesIO
+ return BytesIO(self.parent._archive.read(self.worksheet_path))
return self._xml Now to see if I clean it up to make a proper PR... |
The fix seems to be complex. My thought was to allow the developer to consume the data on demand so as to avoid reading the data into memory on behalf of the developer. "streaming=True" would enable on-demand feature. For this feature, the file handle should be kept open and eventually will be closed by garbage collector. As in your changes, you have noticed that close method alone would cause a RuntimeError: Attempt to read ZIP archive that was already closed. So to cater the need to close the file handle manually, I would have to think of some alternatives. |
I certainly don't expect |
yes, you had your point there. the fix would be make those file handle closure explicit. with get_* functions, auto closure is expected whereas iget_*, the closure will be left to the developer after the generator has been consumed. |
OK, for now I'll be using a patched version of |
I will put this fix in 0.4.0 branch, next major release. In this way, I could focus the effort on one branch instead of supporting two branches. |
Inconsistent behaviour was found in this test run: #. file handle from iget_data was closed in python 3.6, pypy only but not with other python versions openpyxl version is v2.4.8. |
For example, when running the following code on Linux:
The last file descriptor open point to
test.xlsx
.Beside the file descriptor leak, this is really problematic in Windows, as it make it impossible to concurrently modify the spreadsheet in Office when it has been read in another (still running) application.
The text was updated successfully, but these errors were encountered: