Skip to content

v2.0.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 autoloader setting default_charset and mb_internal_encoding to "UTF-8" #4022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cmanley opened this issue May 9, 2024 · 2 comments

Comments

@cmanley
Copy link

cmanley commented May 9, 2024

Update, TL;DR;
I just found a recent unreleased commit that fixed the issue I'm having:
f7cf378
which lead me to it's ticket:
#3957
which lead me to another ticket having similar problems with the voku dependency:
#3954
which makes me wish composer had a package blacklist feature to prevent rogue packages from creeping in as dependencies.

This is:

- [* ] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

That no packages change global settings of any kind.

What is the current behavior?

That mb_internal_encoding is being set to UTF-8 by dependency voku/portable-utf8/src/voku/helper/UTF8.php

What are the steps to reproduce?

Set mb_internal_encoding() to anything but UTF-8, let the composer autoloader do it's thing, and then check mb_internal_encoding().
It should still be set to what you set it to, but you'll see that it has been changed to UTF-8.
You don't even have to load/use any PhpOffice code first.

<?php

mb_internal_encoding('cp1252');
error_log('before autoload: ' . mb_internal_encoding());
require __DIR__ . '/vendor/autoload.php';
error_log('after autoload: ' . mb_internal_encoding());

# what you actually see:
# before autoload: Windows-1252
# after autoload: UTF-8

# what you should see:
# before autoload: Windows-1252
# after autoload: Windows-1252

If this is an issue with reading a specific spreadsheet file, then it may be appropriate to provide a sample file that demonstrates the problem; but please keep it as small as possible, and sanitize any confidential information before uploading.

What features do you think are causing the issue

The composer autoloader calls voku/portable-utf8/bootstrap.php, which in turn calls \voku\helper\UTF8::checkForSupport(); which calls mb_internal_encoding('UTF-8'); Also \voku\helper\Bootstrap.php performs an ini_set('default_charset', 'UTF-8') call.

To be clear, one doesn't even have to call any PhpOffice or voku code to experience this issue. Having either phpoffice/phpspreadsheet or voku/portable-utf8 present in composer.json is sufficient.

There is only 1 line in the whole phpoffice/phpspreadsheet project that requires the large voku dependency for cleaning xss from text when writing a worksheet comment. Surely there is a better way to filter xss via one of the many other html / xss filtering packages. Or if it were up to me, just leave it up to the responsibility of the caller to clean xss if and how they want to. I think that's the more hassle-free approach.

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Yes and it affects much more than that too: the whole application and all data passing through it.

Which versions of PhpSpreadsheet and PHP are affected?

2.0.0

Background info

Recently after upgrading some packages, some database data corruption issues started popping up in an application that had been running perfectly fine for ages.
The application initializes various global settings at start up, such as the timezone, mb_internal_encoding(), the ini setting default_charset, etc. The database connection too sets the connection encoding and collation appropriately based on the mb_internal_encoding().
After spending time debugging I discovered that the database connection charset was being set to UTF-8, instead of latin1.
Then I found the culprits: voku/portable-utf8/src/voku/helper/UTF8.php and voku/portable-utf8/src/voku/helper/Boostrap.php
They blatantly call mb_internal_encoding('UTF-8') and ini_set('default_charset', 'UTF-8') during the autoloader phase.
That is really bad practice imho, so I'd recommend removing/replacing that dependency all together. I just can't trust a package like that. Who know's what other global settings it changes.

I have no choice now but to roll back all updates of phpoffice/phpspreadsheet 2.0 to the latest 1.* in every single application that I've updated.
I could alternatively call mb_internal_encoding() again after calling the autoloader, but that could lead to unpredictable functionality in both voku/portable-utf8 and phpoffice/phpspreadsheet.

@cmanley cmanley changed the title v2.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 which blatantly calls mb_internal_encoding("UTF-8"). v2.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 autoloader calling mb_internal_encoding("UTF-8"). May 9, 2024
@cmanley cmanley changed the title v2.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 autoloader calling mb_internal_encoding("UTF-8"). v2.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 autoloader setting default_charset and mb_internal_encoding to "UTF-8" May 9, 2024
@oleibman
Copy link
Collaborator

oleibman commented May 9, 2024

As you've noted, the dependency has been eliminated, for the reasons you describe. Is there more that you require?

@cmanley
Copy link
Author

cmanley commented May 9, 2024

@oleibman Thanks. Just a release with your commit :)

@cmanley cmanley closed this as completed May 9, 2024
@cmanley cmanley changed the title v2.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 autoloader setting default_charset and mb_internal_encoding to "UTF-8" v2.0.0 data corruption in non-UTF-8 apps as a result of dependency voku/portable-utf8 autoloader setting default_charset and mb_internal_encoding to "UTF-8" May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants