Writing multibyte strings #3

Closed
middlebrain opened this issue Feb 27, 2018 · 4 comments


@middlebrain

The current implementation of writeString, and possibly also writeProperty and writeKey, does not write the length (s:<length>) of multibyte strings correctly. On the PHP side this results in unserialize() [function.unserialize]: Error at offset ...

Unfortunately, the length of a serialized string must be given in bytes, not characters, and the byte count depends on the character encoding.

When testing the serializer with a servlet I didn't notice this immediately, because the default output encoding is ISO-8859-1 (one character is always one byte). The problem only came up after switching to UTF-8 (response.setCharacterEncoding("UTF-8")) and using German umlauts.
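To illustrate the difference between character count and byte count described above, here is a minimal sketch. The class name ByteLengthDemo and the helper byteLength are made up for this example and are not part of the library; the sample string is "für", written with a Unicode escape.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of java-php-serializer.
class ByteLengthDemo {
    // Number of bytes the string occupies once encoded with the given charset.
    static int byteLength(String value, Charset charset) {
        return value.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "für", written with an escape to avoid source-encoding issues.
        String value = "f\u00fcr";

        System.out.println(value.length());                                 // 3 characters
        System.out.println(byteLength(value, StandardCharsets.ISO_8859_1)); // 3 bytes
        System.out.println(byteLength(value, StandardCharsets.UTF_8));      // 4 bytes
    }
}
```

With ISO-8859-1 the character count and byte count happen to match, which is why the wrong length goes unnoticed until the output is switched to UTF-8.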

My instant fix for UTF-8 looks like this:

Writer.java:

...
import static java.nio.charset.StandardCharsets.UTF_8;
...
    public void writeString(String value)
    {
        setState(state.value());

        buffer.append("s:");
        // PHP expects the length in bytes of the encoded string, not in characters.
        buffer.append(value.getBytes(UTF_8).length);
        buffer.append(":\"");
        buffer.append(value);
        buffer.append("\";");
    }
...

It would certainly be better if the desired character encoding could be given to the writer.
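One way this could look is sketched below. This is only an assumption about a possible design, not the library's actual API: CharsetAwareWriter, its constructor, and getResult are hypothetical names, and the state handling of the real Writer is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the library's Writer; names are illustrative only.
class CharsetAwareWriter {
    private final StringBuilder buffer = new StringBuilder();
    private final Charset charset;

    // The charset must match the encoding later used to send the output.
    CharsetAwareWriter(Charset charset) {
        this.charset = charset;
    }

    void writeString(String value) {
        buffer.append("s:");
        // Byte length in the configured charset, as PHP's unserialize() expects.
        buffer.append(value.getBytes(charset).length);
        buffer.append(":\"");
        buffer.append(value);
        buffer.append("\";");
    }

    String getResult() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        CharsetAwareWriter writer = new CharsetAwareWriter(StandardCharsets.UTF_8);
        writer.writeString("f\u00fcr"); // "für"
        System.out.println(writer.getResult());
    }
}
```

The written length is only correct if the resulting string is later encoded with the same charset, for example the one set on the servlet response.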

Translated with www.DeepL.com/Translator.

@marcospassos
Owner

marcospassos commented Feb 27, 2018

It's a bug indeed.

I'll change the builder to allow specifying a default charset when creating the serializer. I'll also add a new parameter to serializeString that allows defining a custom charset per value.

Please allow me a few days to release a new version with the fixes.

@marcospassos
Owner

marcospassos commented Feb 28, 2018

I just released version 0.6.0, which addresses this issue. Could you please check whether it covers your use case?

You can check the full release description here:
https://github.com/marcospassos/java-php-serializer/releases/tag/0.6.0

I've also updated the docs to reflect the new feature:
https://github.com/marcospassos/java-php-serializer

@middlebrain
Author

It works like a charm now.

Thank you very much for your quick support.

@marcospassos
Owner

Closing as the issue has been fixed.
