Skip to content

Commit 5116371

Browse files
authored
Tweaks to Input File Validation (#2217)
* Tweaks to Input File Validation This started as a response to issue #1718, for which it is a partial (not complete) solution. The following changes are made: - canRead can currently throw an exception. This seems wrong. It should just return true/false. - Breaking change of sorts. When AssertFile encounters a non-existent or unreadable file, it throws InvalidArgumentException. This does not make sense. I have changed it to throw PhpSpreadsheet/Reader/Exception. - Since the previous bullet item required changing of most of the Reader files anyhow, this is a good time to add explicit typing for canRead in the function signature rather than the DocBlock. Since all the canRead functions inherit from an abstract version in IReader, they all have to be changed simulatneously. Except for Xlsx and Ods, most of the Reader files are otherwise unchanged. - AssertFile is changed to add an optional "zip member" parameter. It will check for the existence of an appropriate member in what is supposed to be a zip file. It is used by Xlsx and Ods. - Verifying that a given file is a valid zip ought to be a feature of ZipArchive. Thanks to a particularly nasty bug in php/libzip (see https://bugs.php.net/bug.php?id=81222), it is unsafe to attempt to open a zero-length file as a zip archive. There is a solution, but it does not apply to all the PHP releases which we support, and isn't even necessarily supported on all the point versions of the PHP versions which we do support. I have coded up a manual test for "valid zip", with a comment pointing to the spec. - In theory, tests now cover 100% of the code in Shared/File. In practice ... One of the tests require that chmod works properly, which is not quite true on Windows systems, so that test is skipped on Windows. Another test requires that php.ini uses a non-default value for upload_temp_dir (can't be overridden in application code), which is probably not the case when Github runs the unit tests, so that test is skipped when appropriate. I have run tests for both on systems where they are not skipped. * Update File.php * Scrutinizer Timeout It's not actually timing out, it's just waiting for something to finish that finished ages ago. Making a meaningless comment change in hopes that will clear the jam. Not particularly hopeful.
1 parent 3c5750b commit 5116371

19 files changed

+468
-162
lines changed

phpstan-baseline.neon

-15
Original file line numberDiff line numberDiff line change
@@ -3625,21 +3625,6 @@ parameters:
36253625
count: 1
36263626
path: src/PhpSpreadsheet/Shared/Escher/DgContainer/SpgrContainer.php
36273627

3628-
-
3629-
message: "#^Strict comparison using \\=\\=\\= between string\\|false and null will always evaluate to false\\.$#"
3630-
count: 1
3631-
path: src/PhpSpreadsheet/Shared/File.php
3632-
3633-
-
3634-
message: "#^Method PhpOffice\\\\PhpSpreadsheet\\\\Shared\\\\File\\:\\:realpath\\(\\) should return string but returns string\\|false\\.$#"
3635-
count: 1
3636-
path: src/PhpSpreadsheet/Shared/File.php
3637-
3638-
-
3639-
message: "#^Method PhpOffice\\\\PhpSpreadsheet\\\\Shared\\\\File\\:\\:sysGetTempDir\\(\\) should return string but returns string\\|false\\.$#"
3640-
count: 2
3641-
path: src/PhpSpreadsheet/Shared/File.php
3642-
36433628
-
36443629
message: "#^Property PhpOffice\\\\PhpSpreadsheet\\\\Shared\\\\Font\\:\\:\\$autoSizeMethods has no typehint specified\\.$#"
36453630
count: 1
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
<?php
22

33
use PhpOffice\PhpSpreadsheet\IOFactory;
4+
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
45

56
require __DIR__ . '/../Header.php';
67

@@ -9,6 +10,6 @@
910

1011
try {
1112
$spreadsheet = IOFactory::load($inputFileName);
12-
} catch (InvalidArgumentException $e) {
13+
} catch (ReaderException $e) {
1314
$helper->log('Error loading file "' . pathinfo($inputFileName, PATHINFO_BASENAME) . '": ' . $e->getMessage());
1415
}

src/PhpSpreadsheet/Reader/Csv.php

+4-8
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,9 @@
22

33
namespace PhpOffice\PhpSpreadsheet\Reader;
44

5-
use InvalidArgumentException;
65
use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
76
use PhpOffice\PhpSpreadsheet\Reader\Csv\Delimiter;
7+
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
88
use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
99
use PhpOffice\PhpSpreadsheet\Spreadsheet;
1010

@@ -35,7 +35,7 @@ class Csv extends BaseReader
3535
private $inputEncoding = 'UTF-8';
3636

3737
/**
38-
* Fallback encoding if 'guess' strikes out.
38+
* Fallback encoding if guess strikes out.
3939
*
4040
* @var string
4141
*/
@@ -410,17 +410,13 @@ private static function extractStringLower($extension): string
410410

411411
/**
412412
* Can the current IReader read the file?
413-
*
414-
* @param string $pFilename
415-
*
416-
* @return bool
417413
*/
418-
public function canRead($pFilename)
414+
public function canRead(string $pFilename): bool
419415
{
420416
// Check if file exists
421417
try {
422418
$this->openFile($pFilename);
423-
} catch (InvalidArgumentException $e) {
419+
} catch (ReaderException $e) {
424420
return false;
425421
}
426422

src/PhpSpreadsheet/Reader/Gnumeric.php

+3-9
Original file line numberDiff line numberDiff line change
@@ -77,18 +77,12 @@ public function __construct()
7777

7878
/**
7979
* Can the current IReader read the file?
80-
*
81-
* @param string $pFilename
82-
*
83-
* @return bool
8480
*/
85-
public function canRead($pFilename)
81+
public function canRead(string $pFilename): bool
8682
{
87-
File::assertFile($pFilename);
88-
89-
// Check if gzlib functions are available
9083
$data = '';
91-
if (function_exists('gzread')) {
84+
// Check if gzlib functions are available
85+
if (File::testFileNoThrow($pFilename) && function_exists('gzread')) {
9286
// Read signature data (first 3 bytes)
9387
$fh = fopen($pFilename, 'rb');
9488
if ($fh !== false) {

src/PhpSpreadsheet/Reader/Html.php

+1-5
Original file line numberDiff line numberDiff line change
@@ -137,12 +137,8 @@ public function __construct()
137137

138138
/**
139139
* Validate that the current file is an HTML file.
140-
*
141-
* @param string $pFilename
142-
*
143-
* @return bool
144140
*/
145-
public function canRead($pFilename)
141+
public function canRead(string $pFilename): bool
146142
{
147143
// Check if file exists
148144
try {

src/PhpSpreadsheet/Reader/IReader.php

+1-5
Original file line numberDiff line numberDiff line change
@@ -13,12 +13,8 @@ public function __construct();
1313

1414
/**
1515
* Can the current IReader read the file?
16-
*
17-
* @param string $pFilename
18-
*
19-
* @return bool
2016
*/
21-
public function canRead($pFilename);
17+
public function canRead(string $pFilename): bool;
2218

2319
/**
2420
* Read data only?

src/PhpSpreadsheet/Reader/Ods.php

+35-50
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,6 @@
1010
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
1111
use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
1212
use PhpOffice\PhpSpreadsheet\Cell\DataType;
13-
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
1413
use PhpOffice\PhpSpreadsheet\Reader\Ods\AutoFilter;
1514
use PhpOffice\PhpSpreadsheet\Reader\Ods\DefinedNames;
1615
use PhpOffice\PhpSpreadsheet\Reader\Ods\PageSettings;
@@ -28,6 +27,8 @@
2827

2928
class Ods extends BaseReader
3029
{
30+
const INITIAL_FILE = 'content.xml';
31+
3132
/**
3233
* Create a new Ods Reader instance.
3334
*/
@@ -39,46 +40,42 @@ public function __construct()
3940

4041
/**
4142
* Can the current IReader read the file?
42-
*
43-
* @param string $pFilename
44-
*
45-
* @return bool
4643
*/
47-
public function canRead($pFilename)
44+
public function canRead(string $pFilename): bool
4845
{
49-
File::assertFile($pFilename);
50-
5146
$mimeType = 'UNKNOWN';
5247

5348
// Load file
5449

55-
$zip = new ZipArchive();
56-
if ($zip->open($pFilename) === true) {
57-
// check if it is an OOXML archive
58-
$stat = $zip->statName('mimetype');
59-
if ($stat && ($stat['size'] <= 255)) {
60-
$mimeType = $zip->getFromName($stat['name']);
61-
} elseif ($zip->statName('META-INF/manifest.xml')) {
62-
$xml = simplexml_load_string(
63-
$this->securityScanner->scan($zip->getFromName('META-INF/manifest.xml')),
64-
'SimpleXMLElement',
65-
Settings::getLibXmlLoaderOptions()
66-
);
67-
$namespacesContent = $xml->getNamespaces(true);
68-
if (isset($namespacesContent['manifest'])) {
69-
$manifest = $xml->children($namespacesContent['manifest']);
70-
foreach ($manifest as $manifestDataSet) {
71-
$manifestAttributes = $manifestDataSet->attributes($namespacesContent['manifest']);
72-
if ($manifestAttributes->{'full-path'} == '/') {
73-
$mimeType = (string) $manifestAttributes->{'media-type'};
74-
75-
break;
50+
if (File::testFileNoThrow($pFilename)) {
51+
$zip = new ZipArchive();
52+
if ($zip->open($pFilename) === true) {
53+
// check if it is an OOXML archive
54+
$stat = $zip->statName('mimetype');
55+
if ($stat && ($stat['size'] <= 255)) {
56+
$mimeType = $zip->getFromName($stat['name']);
57+
} elseif ($zip->statName('META-INF/manifest.xml')) {
58+
$xml = simplexml_load_string(
59+
$this->securityScanner->scan($zip->getFromName('META-INF/manifest.xml')),
60+
'SimpleXMLElement',
61+
Settings::getLibXmlLoaderOptions()
62+
);
63+
$namespacesContent = $xml->getNamespaces(true);
64+
if (isset($namespacesContent['manifest'])) {
65+
$manifest = $xml->children($namespacesContent['manifest']);
66+
foreach ($manifest as $manifestDataSet) {
67+
$manifestAttributes = $manifestDataSet->attributes($namespacesContent['manifest']);
68+
if ($manifestAttributes->{'full-path'} == '/') {
69+
$mimeType = (string) $manifestAttributes->{'media-type'};
70+
71+
break;
72+
}
7673
}
7774
}
7875
}
79-
}
8076

81-
$zip->close();
77+
$zip->close();
78+
}
8279
}
8380

8481
return $mimeType === 'application/vnd.oasis.opendocument.spreadsheet';
@@ -93,18 +90,13 @@ public function canRead($pFilename)
9390
*/
9491
public function listWorksheetNames($pFilename)
9592
{
96-
File::assertFile($pFilename);
97-
98-
$zip = new ZipArchive();
99-
if ($zip->open($pFilename) !== true) {
100-
throw new ReaderException('Could not open ' . $pFilename . ' for reading! Error opening file.');
101-
}
93+
File::assertFile($pFilename, self::INITIAL_FILE);
10294

10395
$worksheetNames = [];
10496

10597
$xml = new XMLReader();
10698
$xml->xml(
107-
$this->securityScanner->scanFile('zip://' . realpath($pFilename) . '#content.xml'),
99+
$this->securityScanner->scanFile('zip://' . realpath($pFilename) . '#' . self::INITIAL_FILE),
108100
null,
109101
Settings::getLibXmlLoaderOptions()
110102
);
@@ -145,18 +137,13 @@ public function listWorksheetNames($pFilename)
145137
*/
146138
public function listWorksheetInfo($pFilename)
147139
{
148-
File::assertFile($pFilename);
140+
File::assertFile($pFilename, self::INITIAL_FILE);
149141

150142
$worksheetInfo = [];
151143

152-
$zip = new ZipArchive();
153-
if ($zip->open($pFilename) !== true) {
154-
throw new ReaderException('Could not open ' . $pFilename . ' for reading! Error opening file.');
155-
}
156-
157144
$xml = new XMLReader();
158145
$xml->xml(
159-
$this->securityScanner->scanFile('zip://' . realpath($pFilename) . '#content.xml'),
146+
$this->securityScanner->scanFile('zip://' . realpath($pFilename) . '#' . self::INITIAL_FILE),
160147
null,
161148
Settings::getLibXmlLoaderOptions()
162149
);
@@ -253,12 +240,10 @@ public function load(string $pFilename, int $flags = 0)
253240
*/
254241
public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet)
255242
{
256-
File::assertFile($pFilename);
243+
File::assertFile($pFilename, self::INITIAL_FILE);
257244

258245
$zip = new ZipArchive();
259-
if ($zip->open($pFilename) !== true) {
260-
throw new Exception("Could not open {$pFilename} for reading! Error opening file.");
261-
}
246+
$zip->open($pFilename);
262247

263248
// Meta
264249

@@ -289,7 +274,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet)
289274

290275
$dom = new DOMDocument('1.01', 'UTF-8');
291276
$dom->loadXML(
292-
$this->securityScanner->scan($zip->getFromName('content.xml')),
277+
$this->securityScanner->scan($zip->getFromName(self::INITIAL_FILE)),
293278
Settings::getLibXmlLoaderOptions()
294279
);
295280

src/PhpSpreadsheet/Reader/Slk.php

+2-7
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,6 @@
22

33
namespace PhpOffice\PhpSpreadsheet\Reader;
44

5-
use InvalidArgumentException;
65
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
76
use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
87
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
@@ -65,16 +64,12 @@ public function __construct()
6564

6665
/**
6766
* Validate that the current file is a SYLK file.
68-
*
69-
* @param string $pFilename
70-
*
71-
* @return bool
7267
*/
73-
public function canRead($pFilename)
68+
public function canRead(string $pFilename): bool
7469
{
7570
try {
7671
$this->openFile($pFilename);
77-
} catch (InvalidArgumentException $e) {
72+
} catch (ReaderException $e) {
7873
return false;
7974
}
8075

src/PhpSpreadsheet/Reader/Xls.php

+4-6
Original file line numberDiff line numberDiff line change
@@ -419,14 +419,12 @@ public function __construct()
419419

420420
/**
421421
* Can the current IReader read the file?
422-
*
423-
* @param string $pFilename
424-
*
425-
* @return bool
426422
*/
427-
public function canRead($pFilename)
423+
public function canRead(string $pFilename): bool
428424
{
429-
File::assertFile($pFilename);
425+
if (!File::testFileNoThrow($pFilename)) {
426+
return false;
427+
}
430428

431429
try {
432430
// Use ParseXL for the hard work.

0 commit comments

Comments
 (0)