Commit 36fb4e7

Explain the relationship between windows-1252, Latin1, and ASCII
1 parent 82a4dfa commit 36fb4e7

2 files changed, +47 -2 lines changed

encoding.bs

+39 -1
@@ -10,6 +10,18 @@ Markup Shorthands: css off
 Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions,index section-index
 </pre>

+<pre class=biblio>
+{
+"ISO8859-1": {
+"href": "https://www.iso.org/standard/28245.html",
+"title": "Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1",
+"publisher": "International Organization for Standardization (ISO)",
+"status": "Published",
+"date": "April 1998"
+}
+}
+</pre>
+
 <link rel=stylesheet href=visualization-colors.css>


@@ -592,7 +604,10 @@ prescribes, as that is necessary to be compatible with deployed content.
 <tr><td>"<code>windows-1251</code>"
 <tr><td>"<code>x-cp1251</code>"
 <tr>
-<td rowspan=17><a>windows-1252</a>
+<td rowspan=17>
+<a>windows-1252</a>
+<p class=note>See <a href="#note-latin1-ascii">below</a> for the relationship to historical
+"Latin1" and "ASCII" concepts.
 <td>"<code>ansi_x3.4-1968</code>"
 <tr><td>"<code>ascii</code>"
 <tr><td>"<code>cp1252</code>"
@@ -756,6 +771,29 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
 and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no
 plans to remove these.</p>

+<div class=note id=note-latin1-ascii>
+<p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a>, such as
+"<code>latin1</code>", "<code>iso-8859-1</code>", and "<code>ascii</code>", which have historically
+been confusing for developers. On the web, and in any software that seeks to be web-compatible by
+implementing this standard, these are synonyms: "<code>latin1</code>" and "<code>ascii</code>" are
+just labels for <a>windows-1252</a>, and any software following this standard will, for example,
+decode 0x80 as U+20AC (€) when asked for the "Latin1" or "ASCII" decoding of that byte.
+
+<p>Software that does not follow this standard does not always give the same answers. The root of
+this is that the original document that specified Latin1 (ISO/IEC 8859-1) did not provide any
+mappings for bytes in the inclusive ranges 0x00 to 0x1F or 0x7F to 0x9F. Similarly, the original
+documents that specified ASCII (ISO/IEC 646, among others) did not provide any mappings for bytes
+in the inclusive range 0x80 to 0xFF. This means different software has chosen different code point
+mappings for those bytes when asked to use Latin1 or ASCII encodings. Web browsers and
+browser-compatible software have chosen to map those bytes according to <a>windows-1252</a>, which
+is a superset of both, and this choice was codified in this standard. Other software throws errors,
+or uses <a>isomorphic decoding</a>, or other mappings. [[ISO8859-1]] [[ISO646]]
+
+<p>As such, implementers and developers need to be careful whenever they are using libraries which
+expose APIs in terms of "Latin1" or "ASCII". It's very possible such libraries will not give
+answers in line with this standard, if they have chosen other behaviors for the bytes which were
+left undefined in the original specifications.
+</div>

 <h3 id=output-encodings>Output encodings</h3>

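Review note: the divergence that the new note describes is easy to observe outside a browser. The following is a minimal sketch, assuming CPython's built-in codecs, which are not an implementation of this standard. It decodes the single byte 0x80 under three codec names. The helper name `decode_or_error` is made up for illustration.

```python
# Byte 0x80 has no mapping in the original ISO/IEC 8859-1 or ASCII documents.
data = b"\x80"


def decode_or_error(raw: bytes, codec: str) -> str:
    """Decode raw with the named Python codec, or return "error" on failure."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return "error"


results = {
    "cp1252": decode_or_error(data, "cp1252"),    # U+20AC, as in windows-1252
    "latin-1": decode_or_error(data, "latin-1"),  # U+0080, same as isomorphic decoding
    "ascii": decode_or_error(data, "ascii"),      # "error": the codec rejects the byte
}

for name, value in results.items():
    print(name, ascii(value))
```

In CPython the three results correspond to the three behaviors the note lists: the windows-1252 mapping, isomorphic decoding, and throwing an error. Python's `cp1252` codec is itself not identical to this standard's windows-1252, since it leaves a few bytes such as 0x81 undefined, so it should not be treated as a drop-in equivalent.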
tools-label-table.py

+8 -1
@@ -14,7 +14,14 @@ def create_table():
     if label_len > 1:
         rowspan = " rowspan=" + str(label_len)

-    table += " <tr>\n <td" + rowspan + "><a>" + encoding["name"] + "</a>"
+    if encoding["name"] != "windows-1252":
+        table += " <tr>\n <td" + rowspan + "><a>" + encoding["name"] + "</a>"
+    else:
+        table += f""" <tr>
+<td{rowspan}>
+<a>{encoding["name"]}</a>
+<p class=note>See <a href="#note-latin1-ascii">below</a> for the relationship to historical
+"Latin1" and "ASCII" concepts."""
     i = 0
     for label in encoding["labels"]:
         if i > 0:

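Review note: to see what the new branch emits without running the whole tool, here is a self-contained sketch of the changed logic. The function name `first_cell`, the `rowspan = ""` default, and the exact whitespace inside the strings are assumptions, since the hunk shows neither the surrounding loop nor the original indentation.

```python
NOTE_ANCHOR = "#note-latin1-ascii"  # id of the note <div> added in encoding.bs


def first_cell(encoding: dict) -> str:
    """Return the opening <tr>/<td> markup for one encoding (hypothetical helper)."""
    label_len = len(encoding["labels"])
    # Assumed default: no rowspan attribute when there is only one label.
    rowspan = ""
    if label_len > 1:
        rowspan = " rowspan=" + str(label_len)

    name = encoding["name"]
    if name != "windows-1252":
        # Unchanged output for every other encoding.
        return f" <tr>\n <td{rowspan}><a>{name}</a>"
    # windows-1252 additionally gets a pointer to the Latin1/ASCII note.
    return (
        f" <tr>\n <td{rowspan}>\n <a>{name}</a>\n"
        f' <p class=note>See <a href="{NOTE_ANCHOR}">below</a> for the relationship to historical\n'
        ' "Latin1" and "ASCII" concepts.'
    )


plain = first_cell({"name": "utf-8", "labels": ["utf8"]})
special = first_cell({"name": "windows-1252", "labels": ["ascii", "cp1252", "latin1"]})
print(plain)
print(special)
```

Only the windows-1252 row gains the note paragraph, and every other encoding keeps the one-line cell. A single f-string per branch would also avoid mixing concatenation and f-strings as the committed code does, though that is a style preference rather than a correctness issue.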