Add alias for iso-8859-8-i which is the same as iso-8859-8 #62824

bitdancer · 2013-08-01T23:04:29Z

BPO	18624
Nosy	@malemburg, @warsaw, @ezio-melotti, @bitdancer, @ringof
PRs	bpo-25416: add aliases for cp874 and mac_cyrillic encodings #10237; gh-62824: add alias for iso-8859-8-i and -e to iso_8859_8 #32279
Files	adding_aliases.patch: adding aliases to the iso-8859-8; 8859-8_aliases_and_test.patch: added two aliases to 8859-8, commented out a missing tactis codec, added a test

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2013-08-01.23:04:29.016>
labels = ['easy', 'type-feature', 'expert-email', 'expert-unicode']
title = 'Add alias for iso-8859-8-i which is the same as iso-8859-8'
updated_at = <Date 2022-04-03.02:54:46.394>
user = 'https://github.com/bitdancer'

bugs.python.org fields:

activity = <Date 2022-04-03.02:54:46.394>
actor = 'dpg'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Unicode', 'email']
creation = <Date 2013-08-01.23:04:29.016>
creator = 'r.david.murray'
dependencies = []
files = ['34449', '35736']
hgrepos = []
issue_num = 18624
keywords = ['patch', 'easy', 'needs review']
message_count = 11.0
messages = ['194134', '194165', '194177', '194267', '194362', '194386', '213509', '213765', '213772', '213773', '221330']
nosy_count = 9.0
nosy_names = ['lemburg', 'barry', 'ezio.melotti', 'r.david.murray', 'das', 'kamie', 'mvolz', 'bensws', 'dpg']
pr_nums = ['10237', '32279']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue18624'
versions = ['Python 3.4']


bitdancer · 2013-08-01T23:04:29Z

Emails and web pages may specify a character set of iso-8859-8-i, which has exactly the same code points as iso-8859-8. The -i has to do with how bi-directional text is handled, but doesn't affect the encoding: http://lists.w3.org/Archives/Public/www-validator/2001Apr/0008.html

malemburg · 2013-08-02T08:37:18Z

Here's a usable reference:

http://www.w3.org/TR/html4/struct/dirlang.html#bidi88598

+1 on adding the alias.

Also see

http://lists.gnu.org/archive/html/lynx-dev/2012-02/msg00041.html

for how Lynx does this.

The URL also mentions "iso-8859-8-e", which should probably also be aliased to "iso-8859-8". Both names only apply to visual display characteristics of the text; the encoding is the same.

bitdancer · 2013-08-02T14:37:42Z

I got the impression from what I read that -e included additional control sequences, but perhaps I misunderstood and that only meant that the data stream was expected to *use* additional control sequences but the control codes themselves are part of the base codec?

I'm specifically thinking of this statement from the linked reference:

"Because HTML uses the Unicode bidirectionality algorithm, conforming documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". Explicit directional control is also possible with HTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used."

The "cannot be expressed" seems to imply there are differences in the codec.

malemburg · 2013-08-03T15:33:59Z

On 02.08.2013 16:37, R. David Murray wrote:

> I got the impression from what I read that -e included additional control sequences, but perhaps I misunderstood and that only meant that the data stream was expected to *use* additional control sequences but the control codes themselves are part of the base codec?
>
> I'm specifically thinking of this statement from the linked reference:
>
> "Because HTML uses the Unicode bidirectionality algorithm, conforming documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". Explicit directional control is also possible with HTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used."
>
> The "cannot be expressed" seems to imply there are differences in the codec.

No, not really. After some more research, I found that the -i and -e suffixes are defined in RFC 1556:

http://tools.ietf.org/html/rfc1556

At the codec level, these encodings are all the same. The suffixes define whether or not to interpret some of their control characters with respect to bidi text when visualizing the text.
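The point that the suffix only matters for display can be sketched in code. This is a minimal illustration and not the change that was eventually merged: `decode_hebrew` is a hypothetical helper that maps the two RFC 1556 labels onto the existing `iso-8859-8` codec before decoding, so it also runs on Python versions that lack the aliases.

```python
import codecs

# Hypothetical helper, for illustration only: treat the RFC 1556 labels
# "iso-8859-8-i" and "iso-8859-8-e" as plain "iso-8859-8" when decoding.
_BIDI_VARIANTS = {"iso-8859-8-i", "iso-8859-8-e"}


def decode_hebrew(data: bytes, charset: str) -> str:
    # Normalize the label the way a mail or HTML header might spell it.
    label = charset.strip().lower().replace("_", "-")
    if label in _BIDI_VARIANTS:
        label = "iso-8859-8"
    return codecs.decode(data, label)


# Four Hebrew letters (shin, lamed, vav, final mem) as ISO 8859-8 bytes.
raw = b"\xf9\xec\xe5\xed"

# The suffix does not change the decoded code points.
assert decode_hebrew(raw, "ISO-8859-8-I") == decode_hebrew(raw, "iso-8859-8")
```

How the decoded text is laid out on screen (visual, implicit, or explicit directionality) is left to whatever renders it; the helper does not attempt that.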

das · 2013-08-04T12:51:47Z

Is it satisfactory to just add the -i and -e variants to ALIASES in charset.py? Or don't they qualify as "Aliases for other commonly-used names for character sets"?

bitdancer · 2013-08-04T15:50:04Z

This issue is actually about adding the aliases to the codecs module. I'm not entirely sure at this point what the canonical character set name should be for email output (which is what the ALIASES table controls).
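The two tables being distinguished here can be compared side by side. The sketch below uses `latin-1` only because it is present in both places today; it makes no claim about which canonical name the Hebrew variants should get in email output.

```python
import codecs
import email.charset

# email.charset.ALIASES maps a label to the canonical charset name that
# the email package writes into generated messages.
email_name = email.charset.ALIASES.get("latin-1")

# The codecs registry maps a label to the codec that does the actual
# encoding and decoding work.
codec_name = codecs.lookup("latin-1").name

print("email canonical name:", email_name)
print("codec name:", codec_name)
```

The two names differ in spelling because they serve different purposes: one is a MIME charset label, the other identifies a Python codec.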

kamie · 2014-03-14T01:42:51Z

I'm not sure about how the aliases are represented. I found some examples:

http://web.mit.edu/Mozilla/src/mozilla/intl/uconv/src/charsetalias.properties

So I wrote the aliases like this:

'iso-8859-8-i' : 'iso8859_8_I',
'iso-8859-8-e' : 'iso8859_8_E',

But I'm not sure if I should write them as shown in the example above or if they should look like:

'iso-8859-8-i' : 'iso8859_8',
'iso-8859-8-e' : 'iso8859_8',

And how about the tests? I couldn't locate the tests for this module. Are they the tests inside the encoded_modules folder?

kamie · 2014-03-16T22:19:16Z

Adding aliases to the set of iso-8859-8.

bitdancer · 2014-03-16T23:17:18Z

From Python's point of view they are both aliases of iso-8859_8, as discussed in this issue. Python does not have iso-8859_8-e and -i codecs, which your change to the alias table implies it does (the target of an entry in the aliases table is the Python codec name, and there is only iso8859_8.py, not iso8859_8_E.py or _I.py).
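The shape of a correct entry can be sketched as follows. This is an illustration under assumptions, not the patch itself: it edits a copy of `encodings.aliases.aliases` rather than the real table, and the key spelling (lowercase with underscores) follows the existing entries in that table.

```python
import importlib.util
import encodings.aliases

# Work on a copy so the interpreter's real alias table is left untouched.
aliases = dict(encodings.aliases.aliases)

# Keys are alias spellings; values must name an existing module in the
# encodings package. Both variants point at the one Hebrew codec module.
aliases["iso_8859_8_i"] = "iso8859_8"
aliases["iso_8859_8_e"] = "iso8859_8"

# The target resolves to a real module, encodings/iso8859_8.py.
target = aliases["iso_8859_8_i"]
spec = importlib.util.find_spec("encodings." + target)
print(target, "found" if spec is not None else "missing")
```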

bitdancer · 2014-03-16T23:22:00Z

The tests are in test_encodings.py. It is interesting that the tests pass with your patch applied; that indicates that there is a missing test, since we should be testing that all of the values in the aliases table are the names of existing codecs, and apparently we aren't.
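A check of the kind described above might look like the sketch below. `find_broken_aliases` is a hypothetical helper, not the test that was later added to CPython; it uses `importlib.util.find_spec` so that codec modules are located without being executed. The two sample tables are illustrative and use normalized key spellings.

```python
import importlib.util


def find_broken_aliases(alias_table):
    """Return (alias, target) pairs whose target codec module cannot be found."""
    broken = []
    for alias, target in sorted(alias_table.items()):
        # A missing encodings.<target> module means the alias points nowhere.
        if importlib.util.find_spec("encodings." + target) is None:
            broken.append((alias, target))
    return broken


# Targets as written in the first patch: no such modules exist.
first_patch = {"iso_8859_8_i": "iso8859_8_I", "iso_8859_8_e": "iso8859_8_E"}

# Targets that point at the existing iso8859_8 module.
corrected = {"iso_8859_8_i": "iso8859_8", "iso_8859_8_e": "iso8859_8"}

print(find_broken_aliases(first_patch))
print(find_broken_aliases(corrected))
```

Passing `encodings.aliases.aliases` itself to the helper would audit the real table, with the result depending on the Python version and how it was installed.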

bensws · 2014-06-23T00:49:16Z

Added a patch with these two 8859-8 aliases and a corresponding test in test_codecs.py (couldn't find test_encodings.py mentioned in an earlier message). The test also found a missing 'tactis' codec (bpo-1251921), so I've commented it out in the aliases.py file. Please take a look.


…859-8 (pythongh-134306) (cherry picked from commit 5ab66a8) Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: David Goncalves <[email protected]> Co-authored-by: Oleg Iarygin <[email protected]>

…8859-8 (gh-134306) (gh-134330) (cherry picked from commit 5ab66a8) Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: David Goncalves <[email protected]> Co-authored-by: Oleg Iarygin <[email protected]>

…dec files

In Fedora, we install many codecs as .pyc files to save space. This test was failing when running from installed Python:

    ======================================================================
    FAIL: test_alias_modules_exist (test.test_codecs.TransformCodecTest.test_alias_modules_exist)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/lib64/python3.14/test/test_codecs.py", line 3115, in test_alias_modules_exist
        self.assertTrue(os.path.isfile(codec_file),
        ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                        "Codec file not found: " + codec_file)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AssertionError: False is not true : Codec file not found: /usr/lib64/python3.14/encodings/cp037.py
    ----------------------------------------------------------------------


…nstead of file checks (pythonGH-134777) (cherry picked from commit 8704d6b) Co-authored-by: Miro Hrončok <[email protected]>

…instead of file checks (GH-134777) (GH-134781) gh-62824: Adjust test_alias_modules_exist test to use imports instead of file checks (GH-134777) (cherry picked from commit 8704d6b) Co-authored-by: Miro Hrončok <[email protected]>

bitdancer added the topic-unicode, topic-email, easy, and type-feature labels Aug 1, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022

serhiy-storchaka added this to Codecs and encodings issues May 22, 2022

bedevere-bot mentioned this issue Feb 13, 2023

gh-62824: add alias for iso-8859-8-i and -e to iso_8859_8 #32279

Closed

iritkatriel added the stdlib label Nov 23, 2023

bedevere-app bot mentioned this issue May 20, 2025

gh-62824: Add alias for iso-8859-8-i which is the same as iso-8859-8 #134306

Merged

ambv pushed a commit that referenced this issue May 20, 2025

gh-62824: Add alias for iso-8859-8-i which is the same as iso-8859-8 (g…

5ab66a8

…h-134306) Co-authored-by: David Goncalves <[email protected]> Co-authored-by: Oleg Iarygin <[email protected]>

bedevere-app bot mentioned this issue May 20, 2025

[3.14] gh-62824: Add alias for iso-8859-8-i which is the same as iso-8859-8 (gh-134306) #134330

Merged

ambv closed this as completed May 20, 2025

github-project-automation bot moved this to Done in Codecs and encodings issues May 20, 2025

bedevere-app bot mentioned this issue May 27, 2025

gh-62824: Adjust test_alias_modules_exist test to allow .pyc codec files #134777

Merged

malemburg pushed a commit that referenced this issue May 27, 2025

gh-62824: Adjust test_alias_modules_exist test to use imports instead…

8704d6b

… of file checks (#134777)

bedevere-app bot mentioned this issue May 27, 2025

[3.14] gh-62824: Adjust test_alias_modules_exist test to use imports instead of file checks (GH-134777) #134781

Merged
