
ZipConstants.DefaultCodePage is set as UTF-8, but the default for FileEntry.IsUnicodeText is still false #251


Closed
tabhuang opened this issue Jul 16, 2018 · 8 comments

tabhuang commented Jul 16, 2018

When I zip a file with a Chinese file name, the entry name in the archive comes out garbled.
Environment: .NET 4.5, C#, SharpZipLib v1.0.0 RC1, ZipEntry
https://github.com/icsharpcode/SharpZipLib/releases

Test Chinese Filename
測試.txt

Zipping a file named "測試.txt" produces an entry named "皜祈岫.txt".

Version 0.86.0.518 handles this correctly, but 1.0.0 does not.
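The garbled name is consistent with the archive storing the name as UTF-8 bytes while the reader decodes those bytes with its local code page. A minimal C# sketch of that mismatch, assuming .NET Core 3.0+ or .NET 5+ (where Big5 = code page 950 and IBM866 = code page 866 come from `CodePagesEncodingProvider`); `NameEncodingDemo` and `Reinterpret` are illustrative names, not part of SharpZipLib:

```csharp
using System.Text;

public static class NameEncodingDemo
{
    static NameEncodingDemo()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages such as
        // Big5 (950) and IBM866 (866) must be registered once per process.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encode a file name the way the archive writer did, then decode the
    // stored bytes the way the archive reader does.
    public static string Reinterpret(string name, int writerCodePage, int readerCodePage)
    {
        byte[] stored = Encoding.GetEncoding(writerCodePage).GetBytes(name);
        return Encoding.GetEncoding(readerCodePage).GetString(stored);
    }
}
```

`Reinterpret("測試.txt", 65001, 950)` models a UTF-8 writer and a Big5 reader, the situation reported here. `Reinterpret("測試.txt", 65001, 866)` returns "ц╕мшйж.txt", the same string as in the unzip listing further down. `Reinterpret("測試.txt", 950, 950)` round-trips, which is why writing names as Big5 helps, but only for readers that also assume Big5.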

@tabhuang tabhuang reopened this Jul 16, 2018
@tabhuang tabhuang changed the title Zip Chinese Filename Zip Chinese Filename will error. Jul 16, 2018
@tabhuang tabhuang changed the title Zip Chinese Filename will error. Zip Chinese Filename Error. Jul 16, 2018
piksel (Member) commented Jul 16, 2018

Could you provide a sample file that does not work? You can upload it here by dragging it into the textbox.

@tabhuang tabhuang changed the title Zip Chinese Filename Error. Zip chinese filename, but I get filename with garbled text . Jul 16, 2018
cerasumat commented

I have the same problem. My file path contains Chinese characters, e.g. '/mnt/img/汉字/1.png', and inside the zip file the path becomes '/mnt/img/xxx(garbled)/1.png'.
Environment: CentOS 7.1, .NET Core 2.1

piksel (Member) commented Jul 18, 2018

Hm, I made a simple program to test this (https://gist.github.com/piksel/e4132290380d1744165e878a40d9c44f)

And it seems fine on .NET Core 2.1 on linux:

$ dotnet run
Downloading test file...
Zipping test dir...
Listing zip entries:
  Entry: 測試.txt

The file name is correct when the archive is opened in 7-Zip and Windows Explorer:
[Screenshots: 7-Zip and Windows Explorer]

Extracting it with unzip v6.0 on linux does not give the correct file name though:

$ unzip -l -v test.zip
Archive:  test.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     306  Defl:N      249  19% 2018-07-18 17:54 19dbc344  ц╕мшйж.txt
--------          -------  ---                            -------
     306              249  19%                            1 file

Generated file: test.zip

@piksel piksel self-assigned this Jul 18, 2018
piksel (Member) commented Jul 18, 2018

Update for @cerasumat:
It works if the correct encoding is passed to unzip with -O:

$ unzip -l -v -O UTF-8 test.zip
Archive:  test.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     306  Defl:N      249  19% 2018-07-18 17:54 19dbc344  測試.txt
--------          -------  ---                            -------
     306              249  19%                            1 file
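Whether a reader has to be told the encoding (as with `-O UTF-8` above) depends on general purpose bit 11 of each entry header, which the ZIP specification (APPNOTE.TXT) defines as the language encoding flag: when set, the name is UTF-8. A minimal C# sketch that reads only the first local file header and reports that bit; it assumes the archive starts with a local file header (no prepended data) and C# 8 or later, and `ZipNameInspector` / `ReadFirstEntryName` are hypothetical names:

```csharp
using System.IO;
using System.Text;

public static class ZipNameInspector
{
    const uint LocalFileHeaderSignature = 0x04034b50; // "PK\x03\x04"
    const ushort LanguageEncodingFlag = 1 << 11;      // bit 11: name is UTF-8

    // Returns the first entry's name and whether bit 11 was set.
    // Without the flag, the name is decoded with `fallback` (standing in for
    // the reader's local code page), which is where garbled names come from.
    public static (string Name, bool Utf8Flag) ReadFirstEntryName(Stream zip, Encoding fallback)
    {
        using var reader = new BinaryReader(zip, Encoding.ASCII, leaveOpen: true);

        if (reader.ReadUInt32() != LocalFileHeaderSignature)
            throw new InvalidDataException("Stream does not start with a local file header.");

        reader.ReadUInt16();                      // version needed to extract
        ushort flags = reader.ReadUInt16();       // general purpose bit flag
        reader.ReadBytes(18);                     // method, time, date, CRC-32, both sizes
        ushort nameLength = reader.ReadUInt16();  // file name length
        reader.ReadUInt16();                      // extra field length
        byte[] nameBytes = reader.ReadBytes(nameLength);

        bool utf8 = (flags & LanguageEncodingFlag) != 0;
        Encoding encoding = utf8 ? Encoding.UTF8 : fallback;
        return (encoding.GetString(nameBytes), utf8);
    }
}
```

For an archive whose names were written as UTF-8 bytes but without bit 11 (the combination described in this issue's title), this should report `Utf8Flag = false`, and the returned name depends entirely on the fallback encoding.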

shikelong commented

How can this be fixed?

tabhuang (Author) commented Jul 20, 2018

This works for me:

Step 1.
I edited the file [SharpZipLib-master\src\ICSharpCode.SharpZipLib\Zip\ZipConstants.cs]
(attached: ZipConstants.txt)

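Step 2.
In my program, before creating the zip, I added this code:
using System.Text;
using ICSharpCode.SharpZipLib.Zip;
Encoding big5 = Encoding.GetEncoding("Big5");
ZipConstants.DefaultCodePage = big5.CodePage; // write entry names as Big5 (code page 950)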

Result:
[Screenshot]

piksel (Member) commented Jul 20, 2018

@tabhuang Yes, you can just use

ZipConstants.DefaultCodePage = Encoding.GetEncoding("Big5").CodePage;

To use a custom code page for the file names.

To use Unicode (UTF-8) file names, set ZipEntry.IsUnicodeText to true.
If you are using FastZip you can just set it on the EntryFactory:

fastZip.EntryFactory = new ZipEntryFactory
{
    IsUnicodeText = true
};

@piksel piksel changed the title Zip chinese filename, but I get filename with garbled text . ZipConstants.DefaultCodePage is set as UTF-8, but the default for FileEntry.IsUnicodeText is still false Jul 20, 2018
@piksel piksel added bug and removed question labels Jul 20, 2018
piksel (Member) commented Jul 20, 2018

Turns out this is a bug after all.

The default settings for file name encoding produce the wrong output: ZipConstants.DefaultCodePage is UTF-8, but IsUnicodeText still defaults to false.

@piksel piksel closed this as completed in fb9efd0 Jul 21, 2018