
gh-130197: Test various encodings with pygettext #132244

Open
wants to merge 2 commits into base: main

Conversation

tomasr8
Member

@tomasr8 tomasr8 commented Apr 7, 2025

For context: #131902 (comment)

[...] We need this to test with and without the --escape option. In particular, to see what encoding is used in the POT file and in the header. In the file with non-UTF-8 encoding, also use characters not encodable in the source encoding (\uXXXX).

We currently set the charset of the POT file to the default system encoding (fp.encoding):

```python
def write_pot_file(messages, options, fp):
    timestamp = time.strftime('%Y-%m-%d %H:%M%z')
    encoding = fp.encoding if fp.encoding else 'UTF-8'
```

To have reproducible tests regardless of the OS they are running on, we set -X utf8 in the tests. As a consequence, the POT charset is always set to utf-8. I don't think there's an easy way to control that if we want to test other output encodings. At least with these tests we know that non-UTF-8 input files can be read correctly.
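As a quick illustration of why the charset is pinned (a minimal sketch, not part of the PR): with `-X utf8` (PEP 540 UTF-8 mode), Python uses UTF-8 for the standard streams regardless of the locale, so `fp.encoding` in `write_pot_file` is always `utf-8` in the tests.

```python
import subprocess
import sys

# Launch a child interpreter in UTF-8 mode and report its stdout encoding.
# Under -X utf8, this prints 'utf-8' even on systems whose locale encoding
# is e.g. cp1252 or latin-1.
result = subprocess.run(
    [sys.executable, '-X', 'utf8', '-c',
     'import sys; print(sys.stdout.encoding)'],
    capture_output=True, text=True,
)
print(result.stdout.strip())
```

(If `PYTHONIOENCODING` is set in the environment, it can still override the stream encoding, but the test harness does not set it.)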

cc @serhiy-storchaka Let me know if this is what you had in mind for the tests!

@tomasr8 tomasr8 requested a review from erlend-aasland as a code owner April 7, 2025 22:01
Member

@serhiy-storchaka serhiy-storchaka left a comment


I do not think that duplicating this test with multiple encodings is needed. It is enough to test with one encoding -- and it should not be Latin-1 or Windows-1252, which are often the default encodings. The CPU time can be spent on other tests.

Please also add non-ASCII comments.

Finally, we need to add tests for non-ASCII filenames on a non-UTF-8 locale. I'm afraid that i18n_data cannot be used for this -- we need to try several locales with different encodings and generate an input file with a corresponding name.

We also need to test the stderr output for files with a non-ASCII file name and non-ASCII source encoding on a non-UTF-8 locale. It contains a file name and may contain a fragment of the source text.
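A hypothetical sketch of the setup suggested above (the names `CANDIDATE_ENCODINGS` and `make_input_file` are illustrative, not from the PR): generate the input file at run time with a name that the chosen non-UTF-8 encoding can represent, skipping encodings that cannot encode it.

```python
import os
import tempfile

# Candidate non-UTF-8 encodings a test might probe for.
CANDIDATE_ENCODINGS = ['latin-1', 'cp1252', 'koi8-r']

def make_input_file(directory, stem, encoding):
    """Create a .py input file whose name is encodable in *encoding*,
    or return None if the encoding cannot represent the name."""
    try:
        stem.encode(encoding)
    except UnicodeEncodeError:
        return None  # this encoding cannot represent the file name
    path = os.path.join(directory, stem + '.py')
    with open(path, 'w', encoding=encoding) as fp:
        fp.write('_("héllo")\n')  # non-ASCII source text
    return path

# Record which encodings could host a non-ASCII file name like 'módulo'.
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in CANDIDATE_ENCODINGS:
        results[enc] = make_input_file(tmp, 'módulo', enc) is not None
```

In a real test, the harness would then run pygettext on the generated file under a locale matching the encoding and inspect stderr for the file name and source fragment.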
