Commit fa5f313

Merge pull request #1584 from chorman0773/spec-add-identifiers-input-format
Add spec identifier syntax to input-format.md
2 parents a928b00 + 2a7fbc9 commit fa5f313

1 file changed, +18 -1 lines changed
src/input-format.md

@@ -1,26 +1,41 @@
 # Input format
 
+r[input]
+
+r[input.intro]
 This chapter describes how a source file is interpreted as a sequence of tokens.
 
 See [Crates and source files] for a description of how programs are organised into files.
 
 ## Source encoding
 
+r[input.encoding]
+
+r[input.encoding.utf8]
 Each source file is interpreted as a sequence of Unicode characters encoded in UTF-8.
+
+r[input.encoding.invalid]
 It is an error if the file is not valid UTF-8.
 
 ## Byte order mark removal
 
+r[input.byte-order-mark]
+
 If the first character in the sequence is `U+FEFF` ([BYTE ORDER MARK]), it is removed.
 
 ## CRLF normalization
 
+r[input.crlf]
+
 Each pair of characters `U+000D` (CR) immediately followed by `U+000A` (LF) is replaced by a single `U+000A` (LF).
 
 Other occurrences of the character `U+000D` (CR) are left in place (they are treated as [whitespace]).
 
 ## Shebang removal
 
+r[input.shebang]
+
+r[input.shebang.intro]
 If the remaining sequence begins with the characters `#!`, the characters up to and including the first `U+000A` (LF) are removed from the sequence.
 
 For example, the first line of the following file would be ignored:
@@ -34,15 +49,17 @@ fn main() {
 }
 ```
 
+r[input.shebang.inner-attribute]
 As an exception, if the `#!` characters are followed (ignoring intervening [comments] or [whitespace]) by a `[` token, nothing is removed.
 This prevents an [inner attribute] at the start of a source file being removed.
 
 > **Note**: The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.
 
 ## Tokenization
 
-The resulting sequence of characters is then converted into tokens as described in the remainder of this chapter.
+r[input.tokenization]
 
+The resulting sequence of characters is then converted into tokens as described in the remainder of this chapter.
 
 [inner attribute]: attributes.md
 [BYTE ORDER MARK]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
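
For readers reviewing the new rule identifiers, the steps they label can be sketched as a small pipeline. This is only an illustration, not the compiler's implementation. The function names `preprocess`, `strip_shebang`, and `skip_trivia` are hypothetical. The sketch assumes that a shebang line with no LF removes the rest of the input, that every `//` and `/* */` comment counts as skippable, and that `str::trim_start` is a close enough stand-in for the language's whitespace set.

```rust
/// Skips leading whitespace and comments (hypothetical helper).
/// Assumption: all `//` and `/* */` comments are skipped; block comments nest.
fn skip_trivia(mut s: &str) -> &str {
    loop {
        let trimmed = s.trim_start();
        if let Some(rest) = trimmed.strip_prefix("//") {
            // Line comment: runs to the end of the line (or of the input).
            s = rest.find('\n').map_or("", |i| &rest[i + 1..]);
        } else if let Some(rest) = trimmed.strip_prefix("/*") {
            // Block comment: track nesting depth until it closes.
            let bytes = rest.as_bytes();
            let (mut depth, mut i) = (1usize, 0usize);
            while depth > 0 && i < bytes.len() {
                if bytes[i..].starts_with(b"/*") {
                    depth += 1;
                    i += 2;
                } else if bytes[i..].starts_with(b"*/") {
                    depth -= 1;
                    i += 2;
                } else {
                    i += 1;
                }
            }
            s = &rest[i..];
        } else {
            return trimmed;
        }
    }
}

/// input.shebang: drop a leading `#!` line unless it starts an inner attribute.
fn strip_shebang(text: &str) -> &str {
    let after = match text.strip_prefix("#!") {
        Some(after) => after,
        None => return text,
    };
    // input.shebang.inner-attribute: `#!` followed by `[` is left untouched.
    if skip_trivia(after).starts_with('[') {
        return text;
    }
    match text.find('\n') {
        Some(i) => &text[i + 1..],
        None => "", // assumption: no LF means the whole input is the shebang line
    }
}

/// Applies the steps in the order the chapter lists them.
fn preprocess(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    // input.encoding.invalid: a file that is not valid UTF-8 is an error.
    let text = std::str::from_utf8(bytes)?;
    // input.byte-order-mark: remove a leading U+FEFF.
    let text = text.strip_prefix('\u{FEFF}').unwrap_or(text);
    // input.crlf: each CR LF pair becomes LF; a lone CR is left in place.
    let text = text.replace("\r\n", "\n");
    Ok(strip_shebang(&text).to_owned())
}
```

Calling `preprocess` on the bytes of a file that begins with `#!/usr/bin/env rustx` would drop that first line, while a file that begins with `#![allow(unused)]` would be returned with the attribute intact.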
