Commit 4a65011

author
MarkBaker
committed
Allow Reader format identification to use a subset of possible Readers
1 parent c05fb6e commit 4a65011

5 files changed

+144
-37
lines changed

docs/topics/file-formats.md

+4-3
@@ -80,7 +80,8 @@ semi-colon (`;`) are used as separators instead of a comma, although
 other symbols can be used. Because CSV is a text-only format, it doesn't
 support any data formatting options.
 
-"CSV" is not a single, well-defined format (although see RFC 4180 for
+"CSV" is not a single, well-defined format (although see
+[RFC 4180](https://www.rfc-editor.org/rfc/rfc4180.html) for
 one definition that is commonly used). Rather, in practice the term
 "CSV" refers to any file that:
 
@@ -117,5 +118,5 @@ Wide Web Consortium (W3C). However, in 2000, HTML also became an
 international standard (ISO/IEC 15445:2000). HTML 4.01 was published in
 late 1999, with further errata published through 2001. In 2004
 development began on HTML5 in the Web Hypertext Application Technology
-Working Group (WHATWG), which became a joint deliverable with the W3C in
-2008.
+Working Group (WHATWG), which became a joint deliverable with the W3C in 2008.
+

docs/topics/reading-files.md

+55-11
@@ -44,6 +44,22 @@ practise), it will reject the Xls loader that it would normally use for
 a .xls file; and test the file using the other loaders until it finds
 the appropriate loader, and then use that to read the file.
 
+If you know that this is an `xls` file, but don't know whether it is a
+genuine BIFF-format Excel or Html markup with an xls extension, you can
+limit the loader to check only those two possibilities by passing in an
+array of Readers to test against.
+
+```php
+$inputFileName = './sampleData/example1.xls';
+$testAgainstFormats = [
+    \PhpOffice\PhpSpreadsheet\IOFactory::READER_XLS,
+    \PhpOffice\PhpSpreadsheet\IOFactory::READER_HTML,
+];
+
+/** Load $inputFileName to a Spreadsheet Object **/
+$spreadsheet = \PhpOffice\PhpSpreadsheet\IOFactory::load($inputFileName, 0, $testAgainstFormats);
+```
+
 While easy to implement in your code, and you don't need to worry about
 the file type; this isn't the most efficient method to load a file; and
 it lacks the flexibility to configure the loader in any way before
@@ -118,6 +134,34 @@ $spreadsheet = $reader->load($inputFileName);
 See `samples/Reader/04_Simple_file_reader_using_the_IOFactory_to_identify_a_reader_to_use.php`
 for a working example of this code.
 
+As with the IOFactory `load()` method, you can also pass an array of formats
+for the `identify()` method to check against if you know that it will only
+be in a subset of the possible formats that PhpSpreadsheet supports.
+
+```php
+$inputFileName = './sampleData/example1.xls';
+$testAgainstFormats = [
+    \PhpOffice\PhpSpreadsheet\IOFactory::READER_XLS,
+    \PhpOffice\PhpSpreadsheet\IOFactory::READER_HTML,
+];
+
+/** Identify the type of $inputFileName **/
+$inputFileType = \PhpOffice\PhpSpreadsheet\IOFactory::identify($inputFileName, $testAgainstFormats);
+```
+
+You can also use this to confirm that a file is what it claims to be:
+
+```php
+$inputFileName = './sampleData/example1.xls';
+
+try {
+    /** Verify that $inputFileName really is an Xls file **/
+    $inputFileType = \PhpOffice\PhpSpreadsheet\IOFactory::identify($inputFileName, [\PhpOffice\PhpSpreadsheet\IOFactory::READER_XLS]);
+} catch (\PhpOffice\PhpSpreadsheet\Reader\Exception $e) {
+    // File isn't actually an Xls file, even though it has an xls extension
+}
+```
+
 ## Spreadsheet Reader Options
 
 Once you have created a reader object for the workbook that you want to
@@ -146,7 +190,7 @@ $spreadsheet = $reader->load($inputFileName);
 See `samples/Reader/05_Simple_file_reader_using_the_read_data_only_option.php`
 for a working example of this code.
 
-It is important to note that Workbooks (and PhpSpreadsheet) store dates
+It is important to note that most Workbooks (and PhpSpreadsheet) store dates
 and times as simple numeric values: they can only be distinguished from
 other numeric values by the format mask that is applied to that cell.
 When setting read data only to true, PhpSpreadsheet doesn't read the
@@ -162,8 +206,8 @@ Reading Only Data from a Spreadsheet File applies to Readers:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | YES | Xls | YES | Xml | YES |
-Ods | YES | SYLK | NO | Gnumeric | YES |
+Xlsx | YES | Xls | YES | Xml | YES |
+Ods | YES | SYLK | NO | Gnumeric | YES |
 CSV | NO | HTML | NO
 
 ### Reading Only Named WorkSheets from a File
@@ -233,8 +277,8 @@ Reading Only Named WorkSheets from a File applies to Readers:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | YES | Xls | YES | Xml | YES |
-Ods | YES | SYLK | NO | Gnumeric | YES |
+Xlsx | YES | Xls | YES | Xml | YES |
+Ods | YES | SYLK | NO | Gnumeric | YES |
 CSV | NO | HTML | NO
 
 ### Reading Only Specific Columns and Rows from a File (Read Filters)
@@ -381,7 +425,7 @@ Using Read Filters applies to:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | YES | Xls | YES | Xml | YES |
+Xlsx | YES | Xls | YES | Xml | YES |
 Ods | YES | SYLK | NO | Gnumeric | YES |
 CSV | YES | HTML | NO | | |
 
@@ -439,7 +483,7 @@ Combining Multiple Files into a Single Spreadsheet Object applies to:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | NO | Xls | NO | Xml | NO |
+Xlsx | NO | Xls | NO | Xml | NO |
 Ods | NO | SYLK | YES | Gnumeric | NO |
 CSV | YES | HTML | NO
 
@@ -516,7 +560,7 @@ Splitting a single loaded file across multiple worksheets applies to:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | NO | Xls | NO | Xml | NO |
+Xlsx | NO | Xls | NO | Xml | NO |
 Ods | NO | SYLK | NO | Gnumeric | NO |
 CSV | YES | HTML | NO
 
@@ -556,7 +600,7 @@ Setting CSV delimiter applies to:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | NO | Xls | NO | Xml | NO |
+Xlsx | NO | Xls | NO | Xml | NO |
 Ods | NO | SYLK | NO | Gnumeric | NO |
 CSV | YES | HTML | NO
 
@@ -594,7 +638,7 @@ Applies to:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N |
 ----------|:---:|--------|:---:|--------------|:---:|
-Xlsx | NO | Xls | NO | Xml | NO |
+Xlsx | NO | Xls | NO | Xml | NO |
 Ods | NO | SYLK | NO | Gnumeric | NO |
 CSV | YES | HTML | NO
 
@@ -646,7 +690,7 @@ Loading using a Value Binder applies to:
 
 Reader | Y/N |Reader | Y/N |Reader | Y/N
 ----------|:---:|--------|:---:|--------------|:---:
-Xlsx | NO | Xls | NO | Xml | NO
+Xlsx | NO | Xls | NO | Xml | NO
 Ods | NO | SYLK | NO | Gnumeric | NO
 CSV | YES | HTML | YES
 

src/PhpSpreadsheet/IOFactory.php

+65-23
@@ -14,23 +14,39 @@
  */
 abstract class IOFactory
 {
+    public const READER_XLSX = 'Xlsx';
+    public const READER_XLS = 'Xls';
+    public const READER_XML = 'Xml';
+    public const READER_ODS = 'Ods';
+    public const READER_SYLK = 'Slk';
+    public const READER_SLK = 'Slk';
+    public const READER_GNUMERIC = 'Gnumeric';
+    public const READER_HTML = 'Html';
+    public const READER_CSV = 'Csv';
+
+    public const WRITER_XLSX = 'Xlsx';
+    public const WRITER_XLS = 'Xls';
+    public const WRITER_ODS = 'Ods';
+    public const WRITER_CSV = 'Csv';
+    public const WRITER_HTML = 'Html';
+
     private static $readers = [
-        'Xlsx' => Reader\Xlsx::class,
-        'Xls' => Reader\Xls::class,
-        'Xml' => Reader\Xml::class,
-        'Ods' => Reader\Ods::class,
-        'Slk' => Reader\Slk::class,
-        'Gnumeric' => Reader\Gnumeric::class,
-        'Html' => Reader\Html::class,
-        'Csv' => Reader\Csv::class,
+        self::READER_XLSX => Reader\Xlsx::class,
+        self::READER_XLS => Reader\Xls::class,
+        self::READER_XML => Reader\Xml::class,
+        self::READER_ODS => Reader\Ods::class,
+        self::READER_SLK => Reader\Slk::class,
+        self::READER_GNUMERIC => Reader\Gnumeric::class,
+        self::READER_HTML => Reader\Html::class,
+        self::READER_CSV => Reader\Csv::class,
     ];
 
     private static $writers = [
-        'Xls' => Writer\Xls::class,
-        'Xlsx' => Writer\Xlsx::class,
-        'Ods' => Writer\Ods::class,
-        'Csv' => Writer\Csv::class,
-        'Html' => Writer\Html::class,
+        self::WRITER_XLS => Writer\Xls::class,
+        self::WRITER_XLSX => Writer\Xlsx::class,
+        self::WRITER_ODS => Writer\Ods::class,
+        self::WRITER_CSV => Writer\Csv::class,
+        self::WRITER_HTML => Writer\Html::class,
         'Tcpdf' => Writer\Pdf\Tcpdf::class,
         'Dompdf' => Writer\Pdf\Dompdf::class,
         'Mpdf' => Writer\Pdf\Mpdf::class,
@@ -70,20 +86,28 @@ public static function createReader(string $readerType): IReader
      * Loads Spreadsheet from file using automatic Reader\IReader resolution.
      *
      * @param string $filename The name of the spreadsheet file
+     * @param int $flags the optional second parameter flags may be used to identify specific elements
+     * that should be loaded, but which won't be loaded by default, using these values:
+     * IReader::LOAD_WITH_CHARTS - Include any charts that are defined in the loaded file
+     * @param string[] $readers An array of Readers to use to identify the file type. By default, load() will try
+     * all possible Readers until it finds a match; but this allows you to pass in a
+     * list of Readers so it will only try the subset that you specify here.
+     * Values in this list can be any of the constant values defined in the set
+     * IOFactory::READER_*.
      */
-    public static function load(string $filename, int $flags = 0): Spreadsheet
+    public static function load(string $filename, int $flags = 0, ?array $readers = null): Spreadsheet
     {
-        $reader = self::createReaderForFile($filename);
+        $reader = self::createReaderForFile($filename, $readers);
 
         return $reader->load($filename, $flags);
     }
 
     /**
      * Identify file type using automatic IReader resolution.
      */
-    public static function identify(string $filename): string
+    public static function identify(string $filename, ?array $readers = null): string
     {
-        $reader = self::createReaderForFile($filename);
+        $reader = self::createReaderForFile($filename, $readers);
         $className = get_class($reader);
         $classType = explode('\\', $className);
         unset($reader);
@@ -93,14 +117,32 @@ public static function identify(string $filename): string
 
     /**
      * Create Reader\IReader for file using automatic IReader resolution.
+     *
+     * @param string[] $readers An array of Readers to use to identify the file type. By default, load() will try
+     * all possible Readers until it finds a match; but this allows you to pass in a
+     * list of Readers so it will only try the subset that you specify here.
+     * Values in this list can be any of the constant values defined in the set
+     * IOFactory::READER_*.
      */
-    public static function createReaderForFile(string $filename): IReader
+    public static function createReaderForFile(string $filename, ?array $readers = null): IReader
     {
         File::assertFile($filename);
 
+        $testReaders = self::$readers;
+        if ($readers !== null) {
+            $readers = array_map('strtoupper', $readers);
+            $testReaders = array_filter(
+                self::$readers,
+                function (string $readerType) use ($readers) {
+                    return in_array(strtoupper($readerType), $readers, true);
+                },
+                ARRAY_FILTER_USE_KEY
+            );
+        }
+
         // First, lucky guess by inspecting file extension
         $guessedReader = self::getReaderTypeFromExtension($filename);
-        if ($guessedReader !== null) {
+        if (($guessedReader !== null) && array_key_exists($guessedReader, $testReaders)) {
             $reader = self::createReader($guessedReader);
 
             // Let's see if we are lucky
@@ -110,11 +152,11 @@ public static function createReaderForFile(string $filename): IReader
         }
 
         // If we reach here then "lucky guess" didn't give any result
-        // Try walking through all the options in self::$autoResolveClasses
-        foreach (self::$readers as $type => $class) {
+        // Try walking through all the options in self::$readers (or the selected subset)
+        foreach ($testReaders as $readerType => $class) {
             // Ignore our original guess, we know that won't work
-            if ($type !== $guessedReader) {
-                $reader = self::createReader($type);
+            if ($readerType !== $guessedReader) {
+                $reader = self::createReader($readerType);
                 if ($reader->canRead($filename)) {
                     return $reader;
                 }
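
For readers who want to try the subset filter from `createReaderForFile()` in isolation, here is a minimal sketch that runs without PhpSpreadsheet installed. The `$allReaders` map and the `filterReaders()` helper are hypothetical stand-ins for the private `IOFactory::$readers` array and the inline filtering code in the diff; they are not part of the library. The sketch shows that matching is case-insensitive (both sides are upper-cased) and that the extension-based guess is skipped when the guessed type is not in the requested subset.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the private IOFactory::$readers map
// (reader type => class name); shortened for illustration.
$allReaders = [
    'Xlsx' => 'Reader\Xlsx',
    'Xls' => 'Reader\Xls',
    'Ods' => 'Reader\Ods',
    'Html' => 'Reader\Html',
    'Csv' => 'Reader\Csv',
];

/**
 * Hypothetical helper mirroring the filtering step in the diff:
 * null means "test every reader"; otherwise keep only the requested
 * reader types, compared case-insensitively.
 */
function filterReaders(array $allReaders, ?array $requested): array
{
    if ($requested === null) {
        return $allReaders;
    }

    $requested = array_map('strtoupper', $requested);

    return array_filter(
        $allReaders,
        function (string $readerType) use ($requested): bool {
            return in_array(strtoupper($readerType), $requested, true);
        },
        ARRAY_FILTER_USE_KEY
    );
}

// Mixed-case input still matches the 'Xls' and 'Html' keys.
$subset = filterReaders($allReaders, ['xls', 'HTML']);
echo implode(', ', array_keys($subset)), PHP_EOL; // prints "Xls, Html"

// A guess based on a .xls extension is only usable if 'Xls' is in the subset.
$guessedReader = 'Xls';
$odsOnly = filterReaders($allReaders, ['Ods']);
var_dump(array_key_exists($guessedReader, $odsOnly)); // bool(false)
```

With an `Ods`-only subset and an `.xls` file, the guessed reader is skipped and only the `Ods` reader is tried. The new `testFormatNotAsExpectedThrowsException` test in this commit exercises that case and expects a `ReaderException`.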

src/PhpSpreadsheet/Reader/BaseReader.php

+4
@@ -153,6 +153,10 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
 
     /**
      * Loads Spreadsheet from file.
+     *
+     * @param int $flags the optional second parameter flags may be used to identify specific elements
+     * that should be loaded, but which won't be loaded by default, using these values:
+     * IReader::LOAD_WITH_CHARTS - Include any charts that are defined in the loaded file
      */
     public function load(string $filename, int $flags = 0): Spreadsheet
     {

tests/PhpSpreadsheetTests/IOFactoryTest.php

+16
@@ -108,6 +108,22 @@ public function providerIdentify(): array
         ];
     }
 
+    public function testFormatAsExpected(): void
+    {
+        $fileName = 'samples/templates/30template.xls';
+
+        $actual = IOFactory::identify($fileName, [IOFactory::READER_XLS]);
+        self::assertSame('Xls', $actual);
+    }
+
+    public function testFormatNotAsExpectedThrowsException(): void
+    {
+        $fileName = 'samples/templates/30template.xls';
+
+        $this->expectException(ReaderException::class);
+        IOFactory::identify($fileName, [IOFactory::READER_ODS]);
+    }
+
     public function testIdentifyNonExistingFileThrowException(): void
     {
         $this->expectException(ReaderException::class);
