Reader\Xls default codepage causing character loss #1732

Closed
ingcharlie opened this issue Nov 25, 2020 · 1 comment

@ingcharlie

This is:

- [*] a bug report
- [*] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

The ability to set default codepage for Reader\Xls

Every hardcoded 'CP1252' string should be changed to a configurable default.
E.g.:
/**
 * Default Code Page
 *
 * @var string
 */
private $defaultCodePage = 'CP1252';

/**
 * @param string $defaultCodePage
 */
public function setDefaultCodePage($defaultCodePage)
{
    $this->defaultCodePage = $defaultCodePage;
}

'CP1252' -> $this->defaultCodePage;

in function load:
$this->codepage = $this->defaultCodePage;

in function readSummaryInformation:
$codePage = $this->defaultCodePage;

in function readDocumentSummaryInformation:
$codePage = $this->defaultCodePage;

usage:
$inputFileName = 'test.xls';
$reader = IOFactory::createReaderForFile($inputFileName);
$reader->setDefaultCodePage('CP1250');
$spreadsheet = $reader->load($inputFileName);
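
The idea behind the proposal can be sketched outside the library. The class below is a hypothetical stand-in: `CodePageDecoder`, `getDefaultCodePage()` and `decode()` are names invented for this example and are not part of PhpSpreadsheet. It assumes the iconv extension is available and only shows the fallback logic: use the codepage the file declares, otherwise the configurable default.

```php
<?php

// Hypothetical stand-in for the reader; not PhpSpreadsheet code.
final class CodePageDecoder
{
    /** @var string Codepage used when the file does not declare one */
    private $defaultCodePage = 'CP1252';

    public function setDefaultCodePage(string $defaultCodePage): self
    {
        $this->defaultCodePage = $defaultCodePage;

        return $this;
    }

    public function getDefaultCodePage(): string
    {
        return $this->defaultCodePage;
    }

    /**
     * Decode raw bytes to UTF-8.
     * $declared is the codepage found in the file, or null if there is none.
     */
    public function decode(string $bytes, ?string $declared = null): string
    {
        $from = $declared ?? $this->defaultCodePage;
        $out = iconv($from, 'UTF-8', $bytes);
        if ($out === false) {
            throw new RuntimeException("Cannot decode bytes as {$from}");
        }

        return $out;
    }
}

// "Hmotnosť" as CP1250 bytes: "ť" is the single byte 0x9D.
$decoder = (new CodePageDecoder())->setDefaultCodePage('CP1250');
echo $decoder->decode("Hmotnos\x9D"), PHP_EOL;
```

With the default switched to CP1250, the trailing "ť" should survive the conversion instead of being dropped.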

What is the current behavior?

The default charset is CP1252. If the source file is encoded in CP1250, iconv drops characters that exist in CP1250 but not in CP1252. For example, the CP1250 (https://cs.wikipedia.org/wiki/Windows-1250) character "ť", byte 0x9D (157), is not defined in CP1252 (https://cs.wikipedia.org/wiki/Windows-1252), so Xls->decodeCodepage('Hmotnosť') returns 'Hmotnos'.

What are the steps to reproduce?

Read a file encoded in CP1250 that contains the text 'Hmotnosť'. It is imported as 'Hmotnos'.
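
The underlying conversion can be observed without an Excel file. This is a rough sketch using plain `iconv()` rather than the reader itself, so it only approximates what happens inside `decodeCodepage`; the variable names are illustrative, and the `//IGNORE` suffix is an assumption about how unconvertible bytes end up being skipped.

```php
<?php

// "Hmotnosť" as stored in a CP1250 file: "ť" is the single byte 0x9D.
$raw = "Hmotnos\x9D";

// Decoding with the correct source codepage keeps the character (U+0165).
$asCp1250 = iconv('CP1250', 'UTF-8', $raw);

// Decoding the same bytes as CP1252, where 0x9D is unassigned.
// //IGNORE asks iconv to skip bytes it cannot convert; the exact result
// (a shortened string or false) depends on the iconv implementation.
$asCp1252 = @iconv('CP1252', 'UTF-8//IGNORE', $raw);

var_dump($asCp1250 === "Hmotnos\u{0165}");
var_dump($asCp1252);
```

Whatever the second call returns on a given platform, it cannot contain "ť", because CP1252 has no character at that byte.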

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:

<?php

require __DIR__ . '/vendor/autoload.php';

// Create new Spreadsheet object
$spreadsheet = new \PhpOffice\PhpSpreadsheet\Spreadsheet();

// add code that show the issue here...

Which versions of PhpSpreadsheet and PHP are affected?

All

@oleibman
Collaborator

This was resolved by PR #1484 over 2 years ago.
