Commit 633db6f

Merge pull request #2364 from vsemozhetbyt/unicode
Replace unicode with Unicode all over the book
2 parents e87f130 + 7c73f64 commit 633db6f

6 files changed

+23
-23
lines changed

1-js/05-data-types/03-string/3-truncate/solution.md

+1 -1
@@ -1,6 +1,6 @@
 The maximal length must be `maxlength`, so we need to cut it a little shorter, to give space for the ellipsis.
 
-Note that there is actually a single unicode character for an ellipsis. That's not three dots.
+Note that there is actually a single Unicode character for an ellipsis. That's not three dots.
 
 ```js run demo
 function truncate(str, maxlength) {

1-js/05-data-types/03-string/article.md

+12 -12
@@ -50,7 +50,7 @@ let guestList = "Guests: // Error: Unexpected token ILLEGAL
 
 Single and double quotes come from ancient times of language creation when the need for multiline strings was not taken into account. Backticks appeared much later and thus are more versatile.
 
-Backticks also allow us to specify a "template function" before the first backtick. The syntax is: <code>func&#96;string&#96;</code>. The function `func` is called automatically, receives the string and embedded expressions and can process them. This is called "tagged templates". This feature makes it easier to implement custom templating, but is rarely used in practice. You can read more about it in the [manual](mdn:/JavaScript/Reference/Template_literals#Tagged_templates).
+Backticks also allow us to specify a "template function" before the first backtick. The syntax is: <code>func&#96;string&#96;</code>. The function `func` is called automatically, receives the string and embedded expressions and can process them. This is called "tagged templates". This feature makes it easier to implement custom templating, but is rarely used in practice. You can read more about it in the [manual](mdn:/JavaScript/Reference/Template_literals#Tagged_templates).
 
 ## Special characters
 
@@ -86,16 +86,16 @@ Here's the full list:
 |`\\`|Backslash|
 |`\t`|Tab|
 |`\b`, `\f`, `\v`| Backspace, Form Feed, Vertical Tab -- kept for compatibility, not used nowadays. |
-|`\xXX`|Unicode character with the given hexadecimal unicode `XX`, e.g. `'\x7A'` is the same as `'z'`.|
-|`\uXXXX`|A unicode symbol with the hex code `XXXX` in UTF-16 encoding, for instance `\u00A9` -- is a unicode for the copyright symbol `©`. It must be exactly 4 hex digits. |
-|`\u{X…XXXXXX}` (1 to 6 hex characters)|A unicode symbol with the given UTF-32 encoding. Some rare characters are encoded with two unicode symbols, taking 4 bytes. This way we can insert long codes. |
+|`\xXX`|Unicode character with the given hexadecimal Unicode `XX`, e.g. `'\x7A'` is the same as `'z'`.|
+|`\uXXXX`|A Unicode symbol with the hex code `XXXX` in UTF-16 encoding, for instance `\u00A9` -- is a Unicode for the copyright symbol `©`. It must be exactly 4 hex digits. |
+|`\u{X…XXXXXX}` (1 to 6 hex characters)|A Unicode symbol with the given UTF-32 encoding. Some rare characters are encoded with two Unicode symbols, taking 4 bytes. This way we can insert long codes. |
 
-Examples with unicode:
+Examples with Unicode:
 
 ```js run
 alert( "\u00A9" ); // ©
-alert( "\u{20331}" ); // 佫, a rare Chinese hieroglyph (long unicode)
-alert( "\u{1F60D}" ); // 😍, a smiling face symbol (another long unicode)
+alert( "\u{20331}" ); // 佫, a rare Chinese hieroglyph (long Unicode)
+alert( "\u{1F60D}" ); // 😍, a smiling face symbol (another long Unicode)
 ```
 
 All special characters start with a backslash character `\`. It is also called an "escape character".
@@ -499,7 +499,7 @@ All strings are encoded using [UTF-16](https://en.wikipedia.org/wiki/UTF-16). Th
 alert( String.fromCodePoint(90) ); // Z
 ```
 
-We can also add unicode characters by their codes using `\u` followed by the hex code:
+We can also add Unicode characters by their codes using `\u` followed by the hex code:
 
 ```js run
 // 90 is 5a in hexadecimal system
@@ -608,7 +608,7 @@ In many languages there are symbols that are composed of the base character with
 
 For instance, the letter `a` can be the base character for: `àáâäãåā`. Most common "composite" character have their own code in the UTF-16 table. But not all of them, because there are too many possible combinations.
 
-To support arbitrary compositions, UTF-16 allows us to use several unicode characters: the base character followed by one or many "mark" characters that "decorate" it.
+To support arbitrary compositions, UTF-16 allows us to use several Unicode characters: the base character followed by one or many "mark" characters that "decorate" it.
 
 For instance, if we have `S` followed by the special "dot above" character (code `\u0307`), it is shown as Ṡ.
 
@@ -626,7 +626,7 @@ For example:
 alert( 'S\u0307\u0323' ); // Ṩ
 ```
 
-This provides great flexibility, but also an interesting problem: two characters may visually look the same, but be represented with different unicode compositions.
+This provides great flexibility, but also an interesting problem: two characters may visually look the same, but be represented with different Unicode compositions.
 
 For instance:
 
@@ -639,7 +639,7 @@ alert( `s1: ${s1}, s2: ${s2}` );
 alert( s1 == s2 ); // false though the characters look identical (?!)
 ```
 
-To solve this, there exists a "unicode normalization" algorithm that brings each string to the single "normal" form.
+To solve this, there exists a "Unicode normalization" algorithm that brings each string to the single "normal" form.
 
 It is implemented by [str.normalize()](mdn:js/String/normalize).
 
@@ -663,7 +663,7 @@ If you want to learn more about normalization rules and variants -- they are des
 
 - There are 3 types of quotes. Backticks allow a string to span multiple lines and embed expressions `${…}`.
 - Strings in JavaScript are encoded using UTF-16.
-- We can use special characters like `\n` and insert letters by their unicode using `\u...`.
+- We can use special characters like `\n` and insert letters by their Unicode using `\u...`.
 - To get a character, use: `[]`.
 - To get a substring, use: `slice` or `substring`.
 - To lowercase/uppercase a string, use: `toLowerCase/toUpperCase`.
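
Reviewer note: the escape forms and the normalization behaviour touched by the hunks above can be checked with a short sketch. It is not part of the commit; the variable names are illustrative, and `console.log` stands in for the tutorial's `alert` so that it also runs outside a browser.

```js
// The three escape forms from the table each produce an ordinary string value.
const z = "\x7A";            // two hex digits
const copyright = "\u00A9";  // exactly four hex digits
const smiley = "\u{1F60D}";  // code point form, 1 to 6 hex digits

// A character outside the 16-bit range occupies two UTF-16 code units.
const smileyLength = smiley.length;

// The same visible character, spelled with the combining marks in different order:
// S + dot above + dot below versus S + dot below + dot above.
const s1 = "S\u0307\u0323";
const s2 = "S\u0323\u0307";

const equalRaw = s1 === s2;                                // different code unit sequences
const equalNormalized = s1.normalize() === s2.normalize(); // same normal form

console.log(z, copyright, smiley, smileyLength, equalRaw, equalNormalized);
```

The comparison only succeeds after `normalize()` is applied to both sides, which is the point the changed paragraph about "Unicode normalization" makes.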

4-binary/02-text-decoder/article.md

+1 -1
@@ -12,7 +12,7 @@ let decoder = new TextDecoder([label], [options]);
 - **`label`** -- the encoding, `utf-8` by default, but `big5`, `windows-1251` and many other are also supported.
 - **`options`** -- optional object:
 - **`fatal`** -- boolean, if `true` then throw an exception for invalid (non-decodable) characters, otherwise (default) replace them with character `\uFFFD`.
-- **`ignoreBOM`** -- boolean, if `true` then ignore BOM (an optional byte-order unicode mark), rarely needed.
+- **`ignoreBOM`** -- boolean, if `true` then ignore BOM (an optional byte-order Unicode mark), rarely needed.
 
 ...And then decode:
 
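
Reviewer note: a minimal sketch of how the two options described in this hunk behave, assuming a runtime with a global `TextDecoder` (browsers, recent Node.js). The byte arrays are made up for illustration and are not part of the commit.

```js
// UTF-8 bytes for "Hi", preceded by the 3-byte UTF-8 BOM (EF BB BF).
const withBom = new Uint8Array([0xEF, 0xBB, 0xBF, 72, 105]);

// By default the BOM is consumed and does not appear in the result.
const stripped = new TextDecoder("utf-8").decode(withBom);

// With ignoreBOM: true the decoder does not treat it specially,
// so it stays in the output as the character \uFEFF.
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(withBom);

// The byte 0xFF never occurs in valid UTF-8.
const invalid = new Uint8Array([72, 0xFF]);

// Default mode: the bad byte becomes the replacement character \uFFFD.
const replaced = new TextDecoder().decode(invalid);

// fatal: true turns the same input into an exception.
let threw = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(invalid);
} catch (err) {
  threw = true;
}

console.log(JSON.stringify(stripped), JSON.stringify(kept), JSON.stringify(replaced), threw);
```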

9-regular-expressions/01-regexp-introduction/article.md

+1 -1
@@ -56,7 +56,7 @@ There are only 6 of them in JavaScript:
 : Enables "dotall" mode, that allows a dot `pattern:.` to match newline character `\n` (covered in the chapter <info:regexp-character-classes>).
 
 `pattern:u`
-: Enables full unicode support. The flag enables correct processing of surrogate pairs. More about that in the chapter <info:regexp-unicode>.
+: Enables full Unicode support. The flag enables correct processing of surrogate pairs. More about that in the chapter <info:regexp-unicode>.
 
 `pattern:y`
 : "Sticky" mode: searching at the exact position in the text (covered in the chapter <info:regexp-sticky>)
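
Reviewer note: the "correct processing of surrogate pairs" mentioned for `pattern:u` can be illustrated with a small sketch. The sample string is an assumption chosen for the demonstration, not text from the commit.

```js
// One code point stored as two UTF-16 code units (a surrogate pair).
const smile = "😄";

// Without the flag, a dot matches one code unit, i.e. half of the pair.
const withoutU = smile.match(/./)[0];

// With the flag, a dot matches the whole code point.
const withU = smile.match(/./u)[0];

console.log(withoutU.length, withU.length, withU === smile);
```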

9-regular-expressions/03-regexp-unicode/article.md

+4 -4
@@ -4,9 +4,9 @@ JavaScript uses [Unicode encoding](https://en.wikipedia.org/wiki/Unicode) for st
 
 That range is not big enough to encode all possible characters, that's why some rare characters are encoded with 4 bytes, for instance like `𝒳` (mathematical X) or `😄` (a smile), some hieroglyphs and so on.
 
-Here are the unicode values of some characters:
+Here are the Unicode values of some characters:
 
-| Character | Unicode | Bytes count in unicode |
+| Character | Unicode | Bytes count in Unicode |
 |------------|---------|--------|
 | a | `0x0061` | 2 |
 | ≈ | `0x2248` | 2 |
@@ -121,7 +121,7 @@ alert("number: xAF".match(regexp)); // xAF
 
 Let's look for Chinese hieroglyphs.
 
-There's a unicode property `Script` (a writing system), that may have a value: `Cyrillic`, `Greek`, `Arabic`, `Han` (Chinese) and so on, [here's the full list](https://en.wikipedia.org/wiki/Script_(Unicode)).
+There's a Unicode property `Script` (a writing system), that may have a value: `Cyrillic`, `Greek`, `Arabic`, `Han` (Chinese) and so on, [here's the full list](https://en.wikipedia.org/wiki/Script_(Unicode)).
 
 To look for characters in a given writing system we should use `pattern:Script=<value>`, e.g. for Cyrillic letters: `pattern:\p{sc=Cyrillic}`, for Chinese hieroglyphs: `pattern:\p{sc=Han}`, and so on:
 
@@ -135,7 +135,7 @@ alert( str.match(regexp) ); // 你,好
 
 ### Example: currency
 
-Characters that denote a currency, such as `$`, `€`, `¥`, have unicode property `pattern:\p{Currency_Symbol}`, the short alias: `pattern:\p{Sc}`.
+Characters that denote a currency, such as `$`, `€`, `¥`, have Unicode property `pattern:\p{Currency_Symbol}`, the short alias: `pattern:\p{Sc}`.
 
 Let's use it to look for prices in the format "currency, followed by a digit":
 
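
Reviewer note: the two property classes discussed in these hunks are easy to confuse because they differ only in letter case: `\p{sc=Han}` selects a script, `\p{Sc}` selects the currency-symbol category. A minimal sketch follows, assuming an engine with Unicode property escapes (they require the `u` flag); the sample strings are illustrative and not taken from the commit.

```js
// Script property: characters belonging to the Han writing system.
const hanRegexp = /\p{sc=Han}/gu;
const hanMatches = "Hello Привет 你好 123_456".match(hanRegexp);

// General category Currency_Symbol (short alias Sc), followed by one digit.
const priceRegexp = /\p{Sc}\d/gu;
const prices = "Prices: $2, €1, ¥9".match(priceRegexp);

console.log(hanMatches.join(","), prices.join(","));
```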

9-regular-expressions/08-regexp-character-sets-and-ranges/article.md

+4 -4
@@ -57,16 +57,16 @@ For instance:
 
 - **\d** -- is the same as `pattern:[0-9]`,
 - **\w** -- is the same as `pattern:[a-zA-Z0-9_]`,
-- **\s** -- is the same as `pattern:[\t\n\v\f\r ]`, plus few other rare unicode space characters.
+- **\s** -- is the same as `pattern:[\t\n\v\f\r ]`, plus few other rare Unicode space characters.
 ```
 
 ### Example: multi-language \w
 
 As the character class `pattern:\w` is a shorthand for `pattern:[a-zA-Z0-9_]`, it can't find Chinese hieroglyphs, Cyrillic letters, etc.
 
-We can write a more universal pattern, that looks for wordly characters in any language. That's easy with unicode properties: `pattern:[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]`.
+We can write a more universal pattern, that looks for wordly characters in any language. That's easy with Unicode properties: `pattern:[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]`.
 
-Let's decipher it. Similar to `pattern:\w`, we're making a set of our own that includes characters with following unicode properties:
+Let's decipher it. Similar to `pattern:\w`, we're making a set of our own that includes characters with following Unicode properties:
 
 - `Alphabetic` (`Alpha`) - for letters,
 - `Mark` (`M`) - for accents,
@@ -85,7 +85,7 @@ let str = `Hi 你好 12`;
 alert( str.match(regexp) ); // H,i,你,好,1,2
 ```
 
-Of course, we can edit this pattern: add unicode properties or remove them. Unicode properties are covered in more details in the article <info:regexp-unicode>.
+Of course, we can edit this pattern: add Unicode properties or remove them. Unicode properties are covered in more details in the article <info:regexp-unicode>.
 
 ```warn header="Unicode properties aren't supported in Edge and Firefox"
 Unicode properties `pattern:p{…}` are not yet implemented in Edge and Firefox. If we really need them, we can use library [XRegExp](http://xregexp.com/).
