Commit 0d1c9e4

ListWorksheetInfo/Names for Html/Csv/Slk (#3709)

* ListWorksheetInfo/Names for Html/Csv/Slk

Fix #3706. ListWorksheetInfo is implemented for all Readers except Html. For most (not all), ListWorksheetInfo is more efficient than reading the spreadsheet. I can't think of a way to make that so for Html, but that shouldn't be a reason to leave it unimplemented.

ListWorksheetNames is not implemented for Html, Csv, or Slk. It isn't terribly useful for those formats, but that isn't a reason to omit it.

The requester's use case consists of using IOFactory to create a reader for a file of unknown format and determining the first sheet name. That seems legitimate, but it is currently not possible without extra user code if the file is Html, Csv, or Slk; this PR will make it possible.

When Excel opens a Slk or Csv file, the sheet name is based on the file name. PhpSpreadsheet does this for Slk, but it uses a default name for Csv. I am not interested in creating a break for that behavior, but I have added a new boolean property `sheetNameIsFileName` with a setter to Csv Reader. The requester actually mentioned that possibility in our discussion, although it is not essential to the request.

As an adjunct to the issue, the requester wishes to use the worksheet name in `setLoadSheetsOnly`. That is already possible for Html, Csv, and Slk, but that particular property is ignored for those formats. I do not see a reason to change that behavior. This treatment is now explicitly noted in the documentation for property `loadSheetsOnly`.

There had been no tests for what happens when `loadSheetsOnly` is specified but no sheets match the criteria for the formats for which this makes sense (Xlsx, Xls, Ods, Gnumeric, Xml). The behavior was not consistent - some formats threw an Exception while others continued with a single empty worksheet. All cases attempt to set the active sheet, and they will now all throw identical Exceptions when they attempt to do so in this situation. Tests are added for each.

There also had been no tests for `loadSheetsOnly` returning more than one sheet. One is added.

* Update LoadSheetsOnlyTest.php

Add strict types to this new test, consistent with work being done in PR #3718.
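
The requester's use case from the commit message (create a reader for a file of unknown format, then read the first sheet name) might look roughly like the sketch below. It assumes PhpSpreadsheet at or after this commit is installed via Composer and that the autoloader lives at `vendor/autoload.php`; `firstSheetName` is a hypothetical helper, not part of the library. `IOFactory::createReaderForFile()` is typed as returning `IReader`, so the call to `listWorksheetNames()` relies on the concrete reader extending `BaseReader`.

```php
<?php

declare(strict_types=1);

// Assumption: PhpSpreadsheet (including this commit) is installed via Composer.
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\IOFactory;

/**
 * Hypothetical helper: return the first worksheet name of a file whose
 * format is not known in advance, or null if the reader reports no sheets.
 */
function firstSheetName(string $filename): ?string
{
    // Let IOFactory pick a reader based on the file itself.
    $reader = IOFactory::createReaderForFile($filename);

    // With this commit, Html, Csv, and Slk readers also answer this call.
    $names = $reader->listWorksheetNames($filename);

    return $names[0] ?? null;
}
```

For a Csv file the tests in this commit expect the default name `Worksheet` from `listWorksheetNames()`. The returned name can be passed to `setLoadSheetsOnly()`, but per the commit message that property is ignored for Html, Csv, and Slk.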
1 parent bd633b1 commit 0d1c9e4

20 files changed (+369 -25 lines changed)

src/PhpSpreadsheet/Reader/BaseReader.php (+36)

@@ -39,6 +39,7 @@ abstract class BaseReader implements IReader
     /**
      * Restrict which sheets should be loaded?
      * This property holds an array of worksheet names to be loaded. If null, then all worksheets will be loaded.
+     * This property is ignored for Csv, Html, and Slk.
      *
      * @var null|string[]
      */
@@ -203,4 +204,39 @@ protected function openFile(string $filename): void

         $this->fileHandle = $fileHandle;
     }
+
+    /**
+     * Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
+     *
+     * @param string $filename
+     *
+     * @return array
+     */
+    public function listWorksheetInfo($filename)
+    {
+        throw new PhpSpreadsheetException('Reader classes must implement their own listWorksheetInfo() method');
+    }
+
+    /**
+     * Returns names of the worksheets from a file,
+     * possibly without parsing the whole file to a Spreadsheet object.
+     * Readers will often have a more efficient method with which
+     * they can override this method.
+     *
+     * @param string $filename
+     *
+     * @return array
+     */
+    public function listWorksheetNames($filename)
+    {
+        $returnArray = [];
+        $info = $this->listWorksheetInfo($filename);
+        foreach ($info as $infoArray) {
+            if (isset($infoArray['worksheetName'])) {
+                $returnArray[] = $infoArray['worksheetName'];
+            }
+        }
+
+        return $returnArray;
+    }
 }

src/PhpSpreadsheet/Reader/Csv.php (+19 -1)

@@ -10,6 +10,7 @@
 use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
 use PhpOffice\PhpSpreadsheet\Spreadsheet;
 use PhpOffice\PhpSpreadsheet\Style\NumberFormat;
+use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;

 class Csv extends BaseReader
 {
@@ -106,6 +107,9 @@ class Csv extends BaseReader
     /** @var bool */
     private $preserveNullString = false;

+    /** @var bool */
+    private $sheetNameIsFileName = false;
+
     /**
      * Create a new CSV Reader instance.
      */
@@ -220,8 +224,12 @@ protected function inferSeparator(): void

     /**
      * Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
+     *
+     * @param string $filename
+     *
+     * @return array
      */
-    public function listWorksheetInfo(string $filename): array
+    public function listWorksheetInfo($filename)
     {
         // Open file
         $this->openFileOrMemory($filename);
@@ -381,6 +389,9 @@ private function loadStringOrFile(string $filename, Spreadsheet $spreadsheet, bo
             $spreadsheet->createSheet();
         }
         $sheet = $spreadsheet->setActiveSheetIndex($this->sheetIndex);
+        if ($this->sheetNameIsFileName) {
+            $sheet->setTitle(substr(basename($filename, '.csv'), 0, Worksheet::SHEET_TITLE_MAXIMUM_LENGTH));
+        }

         // Set our starting row based on whether we're in contiguous mode or not
         $currentRow = 1;
@@ -643,4 +654,11 @@ public function getPreserveNullString(): bool
     {
         return $this->preserveNullString;
     }
+
+    public function setSheetNameIsFileName(bool $sheetNameIsFileName): self
+    {
+        $this->sheetNameIsFileName = $sheetNameIsFileName;
+
+        return $this;
+    }
 }
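
The title derivation added to `loadStringOrFile` above can be tried in isolation. The following is a standalone sketch, not library code: `csvSheetTitleFromFilename` is a hypothetical name, and the constant is assumed to mirror `Worksheet::SHEET_TITLE_MAXIMUM_LENGTH` (31 in PhpSpreadsheet, matching Excel's sheet-title limit).

```php
<?php

declare(strict_types=1);

// Assumption: same value as Worksheet::SHEET_TITLE_MAXIMUM_LENGTH.
const SHEET_TITLE_MAXIMUM_LENGTH = 31;

/**
 * Hypothetical standalone copy of the expression used in the Csv reader:
 * drop the directory and a trailing ".csv", then cap the length.
 */
function csvSheetTitleFromFilename(string $filename): string
{
    return substr(basename($filename, '.csv'), 0, SHEET_TITLE_MAXIMUM_LENGTH);
}

// Same file as in testBetterEscape, which expects the title 'escape'.
echo csvSheetTitleFromFilename('tests/data/Reader/CSV/escape.csv'), PHP_EOL;

// A base name longer than the limit is cut to 31 bytes.
echo strlen(csvSheetTitleFromFilename(str_repeat('a', 40) . '.csv')), PHP_EOL;
```

Two properties of this expression may matter to callers: `basename()` compares the suffix exactly, so an upper-case `.CSV` extension would likely stay in the title, and `substr()` counts bytes rather than characters, so a long multibyte file name could be cut inside a character.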

src/PhpSpreadsheet/Reader/Html.php (+25)

@@ -1183,4 +1183,29 @@ private function setBorderStyle(Style $cellStyle, $styleValue, $type): void
             ],
         ]);
     }
+
+    /**
+     * Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
+     *
+     * @param string $filename
+     *
+     * @return array
+     */
+    public function listWorksheetInfo($filename)
+    {
+        $info = [];
+        $spreadsheet = new Spreadsheet();
+        $this->loadIntoExisting($filename, $spreadsheet);
+        foreach ($spreadsheet->getAllSheets() as $sheet) {
+            $newEntry = ['worksheetName' => $sheet->getTitle()];
+            $newEntry['lastColumnLetter'] = $sheet->getHighestDataColumn();
+            $newEntry['lastColumnIndex'] = Coordinate::columnIndexFromString($sheet->getHighestDataColumn()) - 1;
+            $newEntry['totalRows'] = $sheet->getHighestDataRow();
+            $newEntry['totalColumns'] = $newEntry['lastColumnIndex'] + 1;
+            $info[] = $newEntry;
+        }
+        $spreadsheet->disconnectWorksheets();
+
+        return $info;
+    }
 }

src/PhpSpreadsheet/Reader/Ods.php (+2 -3)

@@ -240,6 +240,7 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
     {
         // Create new Spreadsheet
         $spreadsheet = new Spreadsheet();
+        $spreadsheet->removeSheetByIndex(0);

         // Load into this instance
         return $this->loadIntoExisting($filename, $spreadsheet);
@@ -345,9 +346,7 @@ public function loadIntoExisting($filename, Spreadsheet $spreadsheet)
                 $worksheetStyleName = $worksheetDataSet->getAttributeNS($tableNs, 'style-name');

                 // Create sheet
-                if ($worksheetID > 0) {
-                    $spreadsheet->createSheet(); // First sheet is added by default
-                }
+                $spreadsheet->createSheet();
                 $spreadsheet->setActiveSheetIndex($worksheetID);

                 if ($worksheetName || is_numeric($worksheetName)) {

src/PhpSpreadsheet/Reader/Slk.php (+1 -1)

@@ -168,7 +168,7 @@ public function listWorksheetInfo($filename)

                         break;
                     case 'Y':
-                        $rowIndex = substr($rowDatum, 1);
+                        $rowIndex = (int) substr($rowDatum, 1);

                         break;
                 }

src/PhpSpreadsheet/Reader/Xls.php (+8)

@@ -417,6 +417,9 @@ class Xls extends BaseReader
      */
     private $baseCell;

+    /** @var bool */
+    private $activeSheetSet = false;
+
     /**
      * Create a new Xls Reader instance.
      */
@@ -829,6 +832,7 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
         }

         // Parse the individual sheets
+        $this->activeSheetSet = false;
         foreach ($this->sheets as $sheet) {
             if ($sheet['sheetType'] != 0x00) {
                 // 0x00: Worksheet, 0x02: Chart, 0x06: Visual Basic module
@@ -1240,6 +1244,9 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
                 }
             }
         }
+        if ($this->activeSheetSet === false) {
+            $this->spreadsheet->setActiveSheetIndex(0);
+        }

         // add the named ranges (defined names)
         foreach ($this->definedname as $definedName) {
@@ -4401,6 +4408,7 @@ private function readWindow2(): void
             $isActive = (bool) ((0x0400 & $options) >> 10);
             if ($isActive) {
                 $this->spreadsheet->setActiveSheetIndex($this->spreadsheet->getIndex($this->phpSheet));
+                $this->activeSheetSet = true;
             }

             // bit: 11; mask: 0x0800; 0 = normal view, 1 = page break view

src/PhpSpreadsheet/Reader/Xlsx/WorkbookView.php (-3)

@@ -22,9 +22,6 @@ public function __construct(Spreadsheet $spreadsheet)
      */
     public function viewSettings(SimpleXMLElement $xmlWorkbook, $mainNS, array $mapSheetId, bool $readDataOnly): void
     {
-        if ($this->spreadsheet->getSheetCount() == 0) {
-            $this->spreadsheet->createSheet();
-        }
         // Default active sheet index to the first loaded worksheet from the file
         $this->spreadsheet->setActiveSheetIndex(0);


tests/PhpSpreadsheetTests/Reader/BaseNoLoad.php (-5)

@@ -10,9 +10,4 @@ public function canRead(string $filename): bool
     {
         return $filename !== '';
     }
-
-    public function loadxxx(string $filename): void
-    {
-        $this->loadSpreadsheetFromFile($filename);
-    }
 }

tests/PhpSpreadsheetTests/Reader/BaseNoLoadTest.php (+9 -1)

@@ -12,6 +12,14 @@ public function testBaseNoLoad(): void
         $this->expectException(SpreadsheetException::class);
         $this->expectExceptionMessage('Reader classes must implement their own loadSpreadsheetFromFile() method');
         $reader = new BaseNoLoad();
-        $reader->loadxxx('unknown.file');
+        $reader->load('unknown.file');
+    }
+
+    public function testBaseNoLoadInfo(): void
+    {
+        $this->expectException(SpreadsheetException::class);
+        $this->expectExceptionMessage('Reader classes must implement their own listWorksheetInfo() method');
+        $reader = new BaseNoLoad();
+        $reader->listWorksheetInfo('unknown.file');
     }
 }

tests/PhpSpreadsheetTests/Reader/Csv/CsvCallbackTest.php (+8)

@@ -30,12 +30,14 @@ public function testCallbackDoNothing(): void
         $spreadsheet = $reader->load($filename);
         $sheet = $spreadsheet->getActiveSheet();
         self::assertEquals('Å', $sheet->getCell('A1')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }

     public function callbackSetFallbackEncoding(Csv $reader): void
     {
         $reader->setFallbackEncoding('ISO-8859-2');
         $reader->setInputEncoding(Csv::GUESS_ENCODING);
+        $reader->setSheetNameIsFileName(true);
         $reader->setEscapeCharacter('');
     }

@@ -48,6 +50,7 @@ public function testFallbackEncodingDefltIso2(): void
         $sheet = $spreadsheet->getActiveSheet();
         self::assertEquals('premičre', $sheet->getCell('A1')->getValue());
         self::assertEquals('sixičme', $sheet->getCell('C2')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }

     public function testIOFactory(): void
@@ -58,6 +61,7 @@ public function testIOFactory(): void
         $sheet = $spreadsheet->getActiveSheet();
         self::assertEquals('premičre', $sheet->getCell('A1')->getValue());
         self::assertEquals('sixičme', $sheet->getCell('C2')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }

     public function testNonFallbackEncoding(): void
@@ -69,6 +73,7 @@ public function testNonFallbackEncoding(): void
         $sheet = $spreadsheet->getActiveSheet();
         self::assertEquals('première', $sheet->getCell('A1')->getValue());
         self::assertEquals('sixième', $sheet->getCell('C2')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }

     public function testDefaultEscape(): void
@@ -79,6 +84,7 @@ public function testDefaultEscape(): void
         $sheet = $spreadsheet->getActiveSheet();
         // this is not how Excel views the file
         self::assertEquals('a\"hello', $sheet->getCell('A1')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }

     public function testBetterEscape(): void
@@ -89,5 +95,7 @@ public function testBetterEscape(): void
         $sheet = $spreadsheet->getActiveSheet();
         // this is how Excel views the file
         self::assertEquals('a\"hello;hello;hello;\"', $sheet->getCell('A1')->getValue());
+        self::assertSame('escape', $sheet->getTitle(), 'callback set sheet title to use file name rather than default');
+        $spreadsheet->disconnectWorksheets();
     }
 }

tests/PhpSpreadsheetTests/Reader/Csv/CsvEncodingTest.php (+10 -5)

@@ -33,11 +33,13 @@ public function testWorkSheetInfo($filename, $encoding): void
         $reader = new Csv();
         $reader->setInputEncoding($encoding);
         $info = $reader->listWorksheetInfo($filename);
-        self::assertEquals('Worksheet', $info[0]['worksheetName']);
-        self::assertEquals('B', $info[0]['lastColumnLetter']);
-        self::assertEquals(1, $info[0]['lastColumnIndex']);
-        self::assertEquals(2, $info[0]['totalRows']);
-        self::assertEquals(2, $info[0]['totalColumns']);
+        self::assertCount(1, $info);
+        self::assertSame('Worksheet', $info[0]['worksheetName']);
+        self::assertSame('B', $info[0]['lastColumnLetter']);
+        self::assertSame(1, $info[0]['lastColumnIndex']);
+        self::assertSame(2, $info[0]['totalRows']);
+        self::assertSame(2, $info[0]['totalColumns']);
+        self::assertSame(['Worksheet'], $reader->listWorksheetNames($filename));
     }

     public static function providerEncodings(): array
@@ -78,6 +80,9 @@ public function testSurrogate(): void
         $filename = 'tests/data/Reader/CSV/premiere.utf16le.csv';
         $reader = new Csv();
         $reader->setInputEncoding(Csv::guessEncoding($filename));
+        $names = $reader->listWorksheetNames($filename);
+        // Following ignored, just make sure it's executable.
+        $reader->setLoadSheetsOnly([$names[0]]);
         $spreadsheet = $reader->load($filename);
         $sheet = $spreadsheet->getActiveSheet();
         self::assertEquals('𐐀', $sheet->getCell('A3')->getValue());

tests/PhpSpreadsheetTests/Reader/Gnumeric/GnumericLoadTest.php (+11)

@@ -3,6 +3,7 @@
 namespace PhpOffice\PhpSpreadsheetTests\Reader\Gnumeric;

 use DateTimeZone;
+use PhpOffice\PhpSpreadsheet\Exception as PhpSpreadsheetException;
 use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
 use PhpOffice\PhpSpreadsheet\Reader\Gnumeric;
 use PhpOffice\PhpSpreadsheet\Shared\Date;
@@ -170,6 +171,16 @@ public function testLoadSelectedSheets(): void
         $spreadsheet->disconnectWorksheets();
     }

+    public function testLoadNoSelectedSheets(): void
+    {
+        $this->expectException(PhpSpreadsheetException::class);
+        $this->expectExceptionMessage('You tried to set a sheet active by the out of bounds index');
+        $filename = 'samples/templates/GnumericTest.gnumeric';
+        $reader = new Gnumeric();
+        $reader->setLoadSheetsOnly(['Unknown Sheet', 'xReport Data']);
+        $reader->load($filename);
+    }
+
     public function testLoadNotGnumeric(): void
     {
         $this->expectException(ReaderException::class);

tests/PhpSpreadsheetTests/Reader/Html/Issue2942Test.php (+26)

@@ -14,6 +14,7 @@ public function testLoadFromString(): void
         $spreadsheet = $reader->loadFromString($content);
         $sheet = $spreadsheet->getActiveSheet();
         self::assertSame('éàâèî', $sheet->getCell('A1')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }

     public function testLoadFromFile(): void
@@ -32,5 +33,30 @@ public function testLoadFromFile(): void
         self::assertSame('അആ', $sheet->getCell('B3')->getValue());
         self::assertSame('กขฃ', $sheet->getCell('C3')->getValue());
         self::assertSame('✀✐✠', $sheet->getCell('D3')->getValue());
+        $spreadsheet->disconnectWorksheets();
+    }
+
+    public function testInfo(): void
+    {
+        $file = 'tests/data/Reader/HTML/utf8chars.charset.html';
+        $reader = new Html();
+        $info = $reader->listWorksheetInfo($file);
+        self::assertCount(1, $info);
+        $info0 = $info[0];
+        self::assertSame('Test Utf-8 characters voilà', $info0['worksheetName']);
+        self::assertSame('D', $info0['lastColumnLetter']);
+        self::assertSame(3, $info0['lastColumnIndex']);
+        self::assertSame(7, $info0['totalRows']);
+        self::assertSame(4, $info0['totalColumns']);
+        $names = $reader->listWorksheetNames($file);
+        self::assertCount(1, $names);
+        self::assertSame('Test Utf-8 characters voilà', $names[0]);
+
+        // Following ignored, just make sure it's executable.
+        $reader->setLoadSheetsOnly([$names[0]]);
+        $spreadsheet = $reader->load($file);
+        $sheet = $spreadsheet->getActiveSheet();
+        self::assertSame('✀✐✠', $sheet->getCell('D3')->getValue());
+        $spreadsheet->disconnectWorksheets();
     }
 }
