-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Attempting to load empty / invalid files throws PHP notices/warnings #1718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Also bumping into this issue—except at a slightly different angle. For whatever reason the xlsx file we've been handed has a slightly different file structure, where the _rels dir is at the base level instead of the in the xl dir where it should(?) be. Don't know enough about the spec yet, but this seems not to bother other xlsx readers. PHPSpreadsheet dies on it for the same reason it dies on an empty xlsx file—trying to do a foreach on null. vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php
The return of |
Have the same issue while reading just a regular xlsx file created with office 365 that has data in it, has anyone come up with the fix yet ? majority of the urls doesn't seem to be working anymore: Xlsx reader breaks with Relationship not a property error. Majority of urls used in Xlsx reader seem to be incorrect, for example instead http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument should be http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument... |
My original problem (which was caused by attempting to load a file which was actually not an Excel document) can easily be fixed by the following patch: diff --git a/src/PhpSpreadsheet/Reader/Xlsx.php b/src/PhpSpreadsheet/Reader/Xlsx.php
index f07ac008..85af39da 100644
--- a/src/PhpSpreadsheet/Reader/Xlsx.php
+++ b/src/PhpSpreadsheet/Reader/Xlsx.php
@@ -328,8 +328,12 @@ class Xlsx extends BaseReader
$unparsedLoadedData = [];
$zip = new ZipArchive();
- $zip->open($pFilename);
+ $result = $zip->open($pFilename);
+ if ($result !== true) {
+ // This does NOT catch the empty archive case - ZipArchive::open()
+ // returns true (with Deprecated notice on PHP >= 8.0)
+ throw new Exception('File "' . $pFilename . '" could not be opened (code ' . $result . ').');
+ }
// Read the theme first, because we need the colour scheme when reading the styles
//~ http://schemas.openxmlformats.org/package/2006/relationships" This does not handle the empty file case though, and I'm not sure about the right way to fix this. PHP documentation states that libzip >= 1.6.0 treats empty archive as invalid, but even though have libzip 1.7.3, ZipArchive::open('emptyfile') returns true (PHP >= 8.0 does throw a deprecated notice) so we can't rely on that. Should load() method throw an exception given an empty file, or return a valid, empty spreadsheet object ? I have no idea how files missing (some of) the |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
I'd forgotten about this. We had to adopt a much more annoying tool to read files. Would be nice to see this fixed. |
We have just committed a change to master which is the first phase of a project to improve namespace handling. I doubt that it will fix the empty file or not-a-zip problems; I will certainly add those requirements to the next phase. I think it should handle, at least in part, the different namespace problem reported by @php-force on April 26; please test if possible. It may or may not handle the _rels-in-the-wrong-place problem reported by @jonnylink on November 23. If it doesn't, or if you can't test against master, is it possible to upload a copy of that file so that I can analyze what is needed for it? |
I'll look for one of the files and report back next week. To be honest, I'd forgotten why we couldn't use PHPOffice for certain files until the stalebot reminded me 😬 |
This started as a response to issue PHPOffice#1718, for which it is a partial (not complete) solution. The following changes are made: - canRead can currently throw an exception. This seems wrong. It should just return true/false. - Breaking change of sorts. When AssertFile encounters a non-existent or unreadable file, it throws InvalidArgumentException. This does not make sense. I have changed it to throw PhpSpreadsheet/Reader/Exception. - Since the previous bullet item required changing of most of the Reader files anyhow, this is a good time to add explicit typing for canRead in the function signature rather than the DocBlock. Since all the canRead functions inherit from an abstract version in IReader, they all have to be changed simulatneously. Except for Xlsx and Ods, most of the Reader files are otherwise unchanged. - AssertFile is changed to add an optional "zip member" parameter. It will check for the existence of an appropriate member in what is supposed to be a zip file. It is used by Xlsx and Ods. - Verifying that a given file is a valid zip ought to be a feature of ZipArchive. Thanks to a particularly nasty bug in php/libzip (see https://bugs.php.net/bug.php?id=81222), it is unsafe to attempt to open a zero-length file as a zip archive. There is a solution, but it does not apply to all the PHP releases which we support, and isn't even necessarily supported on all the point versions of the PHP versions which we do support. I have coded up a manual test for "valid zip", with a comment pointing to the spec. - In theory, tests now cover 100% of the code in Shared/File. In practice ... One of the tests require that chmod works properly, which is not quite true on Windows systems, so that test is skipped on Windows. Another test requires that php.ini uses a non-default value for upload_temp_dir (can't be overridden in application code), which is probably not the case when Github runs the unit tests, so that test is skipped when appropriate. I have run tests for both on systems where they are not skipped.
* Tweaks to Input File Validation This started as a response to issue #1718, for which it is a partial (not complete) solution. The following changes are made: - canRead can currently throw an exception. This seems wrong. It should just return true/false. - Breaking change of sorts. When AssertFile encounters a non-existent or unreadable file, it throws InvalidArgumentException. This does not make sense. I have changed it to throw PhpSpreadsheet/Reader/Exception. - Since the previous bullet item required changing of most of the Reader files anyhow, this is a good time to add explicit typing for canRead in the function signature rather than the DocBlock. Since all the canRead functions inherit from an abstract version in IReader, they all have to be changed simulatneously. Except for Xlsx and Ods, most of the Reader files are otherwise unchanged. - AssertFile is changed to add an optional "zip member" parameter. It will check for the existence of an appropriate member in what is supposed to be a zip file. It is used by Xlsx and Ods. - Verifying that a given file is a valid zip ought to be a feature of ZipArchive. Thanks to a particularly nasty bug in php/libzip (see https://bugs.php.net/bug.php?id=81222), it is unsafe to attempt to open a zero-length file as a zip archive. There is a solution, but it does not apply to all the PHP releases which we support, and isn't even necessarily supported on all the point versions of the PHP versions which we do support. I have coded up a manual test for "valid zip", with a comment pointing to the spec. - In theory, tests now cover 100% of the code in Shared/File. In practice ... One of the tests require that chmod works properly, which is not quite true on Windows systems, so that test is skipped on Windows. Another test requires that php.ini uses a non-default value for upload_temp_dir (can't be overridden in application code), which is probably not the case when Github runs the unit tests, so that test is skipped when appropriate. I have run tests for both on systems where they are not skipped. * Update File.php * Scrutinizer Timeout It's not actually timing out, it's just waiting for something to finish that finished ages ago. Making a meaningless comment change in hopes that will clear the jam. Not particularly hopeful.
@oleibman I can confirm that attempting to read an empty file now throws PhpOffice\PhpSpreadsheet\Reader\Exception: Could not find zip member zip://test.xlsx#_rels/.rels instead of continuing with PHP warnings as it did until 1.18.0 included (tested today, with master branch at bc9234e), which fixes my main issue. The invalid file case throws the same Reader exception (instead of ValueError: Invalid or uninitialized Zip object as it did before), which is good enough for me. Looking forward to seeing this in the next official release. Thanks ! |
Thank you for reporting the problem and confirming the fix. |
Fixed in 1.19.0 |
This is:
What is the expected behavior?
Reader should check return value of failing functions (e.g. ZipArchive::open, simplexml_load_string) and throw some meaningful exception, instead of continuing.
What is the current behavior?
Example with empty xlsx file:
What are the steps to reproduce?
Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:
Generate an empty or invalid Excel file, e.g. with
touch test.xlsx
orecho 'hello world' >test.xlsx
and run the above script.Which versions of PhpSpreadsheet and PHP are affected?
Tested on PHP 7.4.3 with PhpSpreadsheet 1.15.0.
The text was updated successfully, but these errors were encountered: