
gh-130197: Test various encodings with pygettext #132244

Open
wants to merge 2 commits into base: main

Conversation

tomasr8
Member

@tomasr8 tomasr8 commented Apr 7, 2025

For context: #131902 (comment)

[...] We need this to test with and without the --escape option. In particular, to see what encoding is used in the POT file and in the header. In the file with non-UTF-8 encoding, also use characters not encodable in the source encoding (\uXXXX).

We currently set the charset of the POT file to the default system encoding (fp.encoding):

```python
def write_pot_file(messages, options, fp):
    timestamp = time.strftime('%Y-%m-%d %H:%M%z')
    encoding = fp.encoding if fp.encoding else 'UTF-8'
```

To have reproducible tests regardless of the OS they are running on, we set -X utf8 in the tests. As a consequence, the POT charset is always set to utf-8. I don't think there's an easy way to control that if we want to test other output encodings. At least with these tests we know that non-UTF-8 input files can be read correctly.
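As a quick illustration of why the charset is pinned (a minimal sketch, not part of the PR): with `-X utf8` (PEP 540 UTF-8 mode), Python uses UTF-8 for the standard streams regardless of the locale, so `fp.encoding` in `write_pot_file` is always `utf-8` in the tests.

```python
import subprocess
import sys

# Launch a child interpreter in UTF-8 mode and report its stdout encoding.
# Under -X utf8, this prints 'utf-8' even on systems whose locale encoding
# is e.g. cp1252 or latin-1.
result = subprocess.run(
    [sys.executable, '-X', 'utf8', '-c',
     'import sys; print(sys.stdout.encoding)'],
    capture_output=True, text=True,
)
print(result.stdout.strip())
```

(If `PYTHONIOENCODING` is set in the environment, it can still override the stream encoding, but the test harness does not set it.)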

cc @serhiy-storchaka Let me know if this is what you had in mind for the tests!

@tomasr8 tomasr8 requested a review from erlend-aasland as a code owner April 7, 2025 22:01
Member

@serhiy-storchaka serhiy-storchaka left a comment


I do not think that duplicating this test with multiple encodings is needed. It is enough to test with one encoding -- and it should not be Latin-1 or Windows-1252, which are often the default encodings. The CPU time can be spent on other tests.

Please also add non-ASCII comments.

Finally, we need to add tests for non-ASCII filenames on a non-UTF-8 locale. I'm afraid that i18n_data cannot be used for this -- we need to try several locales with different encodings and generate an input file with a corresponding name.

We also need to test the stderr output for files with a non-ASCII file name and non-ASCII source encoding on a non-UTF-8 locale. It contains a file name and may contain a fragment of the source text.
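A hypothetical sketch of the setup suggested above (the names `CANDIDATE_ENCODINGS` and `make_input_file` are illustrative, not from the PR): generate the input file at run time with a name that the chosen non-UTF-8 encoding can represent, skipping encodings that cannot encode it.

```python
import os
import tempfile

# Candidate non-UTF-8 encodings a test might probe for.
CANDIDATE_ENCODINGS = ['latin-1', 'cp1252', 'koi8-r']

def make_input_file(directory, stem, encoding):
    """Create a .py input file whose name is encodable in *encoding*,
    or return None if the encoding cannot represent the name."""
    try:
        stem.encode(encoding)
    except UnicodeEncodeError:
        return None  # this encoding cannot represent the file name
    path = os.path.join(directory, stem + '.py')
    with open(path, 'w', encoding=encoding) as fp:
        fp.write('_("héllo")\n')  # non-ASCII source text
    return path

# Record which encodings could host a non-ASCII file name like 'módulo'.
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in CANDIDATE_ENCODINGS:
        results[enc] = make_input_file(tmp, 'módulo', enc) is not None
```

In a real test, the harness would then run pygettext on the generated file under a locale matching the encoding and inspect stderr for the file name and source fragment.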
