Skip to content

Commit 8b63122

Browse files
committed
WIP Xlsx Reader Vml File with Unclosed br Tags
Fix PHPOffice#3125 (rejected previous PR PHPOffice#1181 as too risky, issue was also reported as PHPOffice#170). Vml file should be valid Xml, but Excel (possibly just Excel 2013) can generate unclosed `<br>` tags, preventing Xlsx reader from reading file correctly. I believe a very narrowly targeted fix, changing `<br>` to `<br/>`, and only when reading vml files, probably mitigates the risk. I am going to leave this PR as a draft for a while to see if others disagree. Another reason to leave it as a draft is the absence of a decent test file. I am able to copy an existing test spreadsheet which contains a vml file, and add some text and the problematic tag, and I can confirm that it fails to load correctly without the fix but loads correctly with the fix. However, it would be better if I had a real file. I cannot figure out how to generate a file like this "naturally"; when I add a textbox to a spreadsheet, it is stored as a regular xml file, not as vml, in both Excel 365 and 2007. Also, the text that I added in the vml file doesn't show anywhere when I open the file in Excel, so I don't know whether my file is a normal case for this condition. And, incidentally, when I load the file in PhpSpreadsheet and save it, the textbox data is not in the resulting output file; this is clearly not a problem in the particular case I constructed, but I don't know if that's true in general. If not, that would be a different problem than the one I'm fixing, but I would prefer to resolve them both at the same time.
1 parent 88bbac9 commit 8b63122

File tree

3 files changed

+29
-2
lines changed

3 files changed

+29
-2
lines changed

src/PhpSpreadsheet/Reader/Xlsx.php

+5-2
Original file line numberDiff line numberDiff line change
@@ -121,9 +121,12 @@ public static function falseToArray($value): array
121121
return is_array($value) ? $value : [];
122122
}
123123

124-
private function loadZip(string $filename, string $ns = ''): SimpleXMLElement
124+
private function loadZip(string $filename, string $ns = '', bool $replaceUnclosedBr = false): SimpleXMLElement
125125
{
126126
$contents = $this->getFromZipArchive($this->zip, $filename);
127+
if ($replaceUnclosedBr) {
128+
$contents = str_replace('<br>', '<br/>', $contents);
129+
}
127130
$rels = simplexml_load_string(
128131
$this->securityScanner->scan($contents),
129132
'SimpleXMLElement',
@@ -1030,7 +1033,7 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
10301033

10311034
try {
10321035
// no namespace okay - processed with Xpath
1033-
$vmlCommentsFile = $this->loadZip($relPath, '');
1036+
$vmlCommentsFile = $this->loadZip($relPath, '', true);
10341037
$vmlCommentsFile->registerXPathNamespace('v', Namespaces::URN_VML);
10351038
} catch (Throwable $ex) {
10361039
//Ignore unparsable vmlDrawings. Later they will be moved from $unparsedVmlDrawings to $unparsedLoadedData

tests/PhpSpreadsheetTests/Reader/Xlsx/ColorTabTest.php

+24
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,30 @@ public function testIssue2316(): void
1212
$filename = 'tests/data/Reader/XLSX/colortabs.xlsx';
1313
$reader = new Xlsx();
1414
$spreadsheet = $reader->load($filename);
15+
$unparsed = $spreadsheet->getUnparsedLoadedData();
16+
self::assertArrayHasKey('sheets', $unparsed);
17+
self::assertArrayHasKey('Worksheet', $unparsed['sheets']);
18+
self::assertArrayNotHasKey('vmlDrawings', $unparsed['sheets']['Worksheet']);
19+
20+
// theme color
21+
self::assertSame('FF548135', $spreadsheet->getSheet(0)->getTabColor()->getArgb());
22+
// rgb color
23+
self::assertSame('FFFFC000', $spreadsheet->getSheet(1)->getTabColor()->getArgb());
24+
$spreadsheet->disconnectWorksheets();
25+
}
26+
27+
public function testIssue3125(): void
28+
{
29+
// Very artificial - no real sample file to go with issue.
30+
$filename = 'tests/data/Reader/XLSX/colortabs.badbr.xlsx';
31+
$reader = new Xlsx();
32+
$spreadsheet = $reader->load($filename);
33+
$unparsed = $spreadsheet->getUnparsedLoadedData();
34+
self::assertArrayHasKey('sheets', $unparsed);
35+
self::assertArrayHasKey('Worksheet', $unparsed['sheets']);
36+
// Before fix, vml in sample file was unparseable as xml (unclosed br tag),
37+
// so condition below would fail.
38+
self::assertArrayNotHasKey('vmlDrawings', $unparsed['sheets']['Worksheet']);
1539

1640
// theme color
1741
self::assertSame('FF548135', $spreadsheet->getSheet(0)->getTabColor()->getArgb());
14.9 KB
Binary file not shown.

0 commit comments

Comments
 (0)