@@ -10,6 +10,18 @@ Markup Shorthands: css off
Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions,index section-index
</pre>

+ <pre class=biblio>
+ {
+ "ISO8859-1": {
+ "href": "https://www.iso.org/standard/28245.html",
+ "title": "Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1",
+ "publisher": "International Organization for Standardization (ISO)",
+ "status": "Published",
+ "date": "April 1998"
+ }
+ }
+ </pre>
+

<link rel=stylesheet href=visualization-colors.css>

@@ -592,7 +604,10 @@ prescribes, as that is necessary to be compatible with deployed content.
<tr><td>"<code>windows-1251</code>"
<tr><td>"<code>x-cp1251</code>"
<tr>
- <td rowspan=17><a>windows-1252</a>
+ <td rowspan=17>
+ <a>windows-1252</a>
+ <p class=note>See <a href="#note-latin1-ascii">below</a> for the relationship to historical
+ "Latin1" and "ASCII" concepts.
<td>"<code>ansi_x3.4-1968</code>"
<tr><td>"<code>ascii</code>"
<tr><td>"<code>cp1252</code>"
@@ -756,6 +771,29 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no
plans to remove these.</p>

+ <div class=note id=note-latin1-ascii>
+ <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a>, such as
+ "<code>latin1</code>", "<code>iso-8859-1</code>", and "<code>ascii</code>", which have historically
+ been confusing for developers. On the web, and in any software that seeks to be web-compatible by
+ implementing this standard, these are synonyms: "<code>latin1</code>" and "<code>ascii</code>" are
+ just labels for <a>windows-1252</a>, and any software following this standard will, for example,
+ decode 0x80 as U+20AC (€) when asked for the "Latin1" or "ASCII" decoding of that byte.
+
+ <p>Software that does not follow this standard does not always give the same answers. The root
+ cause is that the original document that specified Latin1 (ISO/IEC 8859-1) did not provide any
+ mappings for bytes in the inclusive ranges 0x00 to 0x1F or 0x7F to 0x9F. Similarly, the original
+ documents that specified ASCII (ISO/IEC 646, among others) did not provide any mappings for bytes
+ in the inclusive range 0x80 to 0xFF. As a result, different software has chosen different code
+ point mappings for those bytes when asked to use Latin1 or ASCII encodings. Web browsers and
+ browser-compatible software have chosen to map those bytes according to <a>windows-1252</a>, which
+ is a superset of both, and this choice was codified in this standard. Other software throws errors,
+ uses <a>isomorphic decoding</a>, or applies other mappings. [[ISO8859-1]] [[ISO646]]
+
+ <p>As such, implementers and developers need to be careful whenever they use libraries that
+ expose APIs in terms of "Latin1" or "ASCII". Such libraries might not give answers in line with
+ this standard if they have chosen other behaviors for the bytes that were left undefined in the
+ original specifications.
+ </div>


<h3 id=output-encodings>Output encodings</h3>
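To illustrate the divergence the new note describes, here is a minimal Python sketch. It is not part of the proposed change. The helper names `web_style_decode` and `native_decode` are hypothetical, and the label set holds only the labels quoted in this diff, not the standard's full list. Python's `cp1252` codec stands in for <a>windows-1252</a> as an approximation: it raises an error for the bytes 0x81, 0x8D, 0x8F, 0x90, and 0x9D, which this standard maps to the corresponding C1 control code points.

```python
import string

# Labels quoted in the note and table rows of this diff. The standard's full
# label list for windows-1252 is longer, so this set is illustrative only.
WINDOWS_1252_LABELS = frozenset(
    {"ansi_x3.4-1968", "ascii", "cp1252", "iso-8859-1", "latin1", "windows-1252"}
)

ASCII_WHITESPACE = "\t\n\x0c\r "
# ASCII-only lowercasing, so non-ASCII look-alike letters are never folded.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def web_style_decode(label: str, data: bytes) -> str:
    """Hypothetical helper: treat the label as a synonym for windows-1252."""
    normalized = label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    if normalized not in WINDOWS_1252_LABELS:
        raise LookupError(f"label not covered by this sketch: {label!r}")
    # Assumption: Python's cp1252 codec approximates windows-1252 here.
    return data.decode("cp1252")


def native_decode(codec: str, data: bytes):
    """Decode with Python's own codec of that name; None signals an error."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    byte = b"\x80"
    for label in ("Latin1", "ASCII"):
        print(label, "web-style:", repr(web_style_decode(label, byte)))
    for codec in ("latin-1", "ascii", "cp1252"):
        print(codec, "native:", repr(native_decode(codec, byte)))
```

With the byte 0x80, the web-style helper returns U+20AC (€) for both "Latin1" and "ASCII". Python's own `latin-1` codec returns U+0080, which matches the isomorphic-decoding case in the note, and its `ascii` codec raises an error, which the sketch reports as `None`.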