Commit fde2301

Merge pull request swiftlang#160 from dabelknap/pretty-printer-doc
Documentation: Introduce printer algorithm and describe token types
2 parents ea6b2e8 + 9091dfd commit fde2301

1 file changed

+260
-0
lines changed

Documentation/PrettyPrinter.md

@@ -0,0 +1,260 @@
1+
# SwiftFormat Pretty Printer
2+
3+
## Introduction
4+
5+
The algorithm used in the SwiftFormat pretty printer is based on the "simple"
6+
version of the algorithm described by Derek Oppen in his paper [*Pretty
7+
Printing*](http://i.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf)
8+
(1979). It employs two functions: *scan* and *print*. The *scan* function
9+
accepts a stream of tokens and calculates the lengths of these tokens. It then
10+
passes the tokens and their computed lengths to *print*, which handles the
11+
actual printing of the tokens, automatically inserting line breaks and indents
12+
to obey a given maximum line length. We describe in detail how these functions
13+
have been implemented in SwiftFormat.
14+
15+
## Tokens
16+
17+
### Token Groups
18+
19+
It is often necessary to group a series of tokens together into logical groups
20+
that we want to avoid splitting with line breaks if possible. The algorithm tries
21+
to break as few groups as possible when printing. Groups begin with *open*
22+
tokens and end with *close* tokens. These tokens must always be paired.
23+
24+
### Token Types
25+
26+
The different types of tokens are represented as a `Token` enum within the code.
27+
The available cases are: `syntax`, `break`, `open`, `close`, `newlines`,
28+
`comment`, and `reset`. The behavior of each of them is described below with
29+
pseudocode examples.
30+
31+
See: [`Token.swift`](../Sources/SwiftFormatPrettyPrint/Token.swift)
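
As a rough orientation, the cases listed above could be modeled along the following lines. This is a simplified sketch and not the contents of `Token.swift`: the associated values, the helper types `BreakStyle` and `CommentKind`, and all parameter labels are assumptions chosen to mirror the descriptions in this document.

```swift
// Hypothetical, simplified model of the token cases described in this document.
// The real definitions live in Token.swift and may differ.
enum BreakStyle { case consistent, inconsistent }

enum CommentKind { case line, docLine, block, docBlock }

enum Token {
  case syntax(String)
  case `break`(size: Int, offset: Int)
  case open(BreakStyle, offset: Int)
  case close
  case newlines(count: Int, offset: Int)
  case comment(CommentKind, text: String)
  case reset
}

// A small stream built from these cases.
let stream: [Token] = [
  .syntax("one"), .break(size: 1, offset: 2),
  .syntax("two"), .break(size: 1, offset: 2),
  .syntax("three"),
]

// Count how many tokens carry printable text.
let syntaxCount = stream.filter { token in
  if case .syntax = token { return true } else { return false }
}.count
print(syntaxCount)
```

Because `break` is a Swift keyword, the case needs backticks where it is declared, but not where it is referenced after a leading dot.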
32+
33+
#### Syntax
34+
35+
The *syntax* tokens contain the segments of text that need to be printed (e.g.
36+
`}`, `func`, `23`, `while`, etc.). The length of a token is the number of
37+
columns needed to print it. For example, `func` would have a length of 4.
38+
39+
#### Break
40+
41+
The *break* tokens indicate where line breaks are allowed to occur. These
42+
frequently occur as the whitespace in between syntax tokens. The breaks contain
43+
two associated values that can be specified when creating the break token:
44+
*size* and *offset*. The size indicates how many columns of whitespace should
45+
be printed when the token is encountered. If a line break should occur at the
46+
break token, the offset indicates how many spaces should be used for indentation
47+
of the next token. The length of a break is its size plus the length of the
48+
token that immediately comes after it. If a break immediately precedes a group,
49+
its length will be its size plus the size of the group.
50+
51+
```
52+
# break(size, offset)
53+
Tokens = ["one", break(1, 2), "two", break(1, 2), "three"]
54+
Lengths = [3, 4, 3, 6, 5]
55+
56+
# Maximum line length of 10
57+
Output =
58+
"""
59+
one two
60+
  three
61+
"""
62+
```
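
The interplay of *size* and *offset* can be sketched with a small printer for flat token streams (no groups). This is an illustration under simplifying assumptions, not the SwiftFormat implementation: the `Token` type and the `render` function are hypothetical, and a break is taken to fit when the current column plus its length does not exceed the maximum line length.

```swift
// Hypothetical, simplified token model: syntax text and breaks only.
enum Token {
  case syntax(String)
  case `break`(size: Int, offset: Int)
}

func render(_ tokens: [Token], maxLineLength: Int) -> String {
  var output = ""
  var column = 0
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      output += text
      column += text.count
    case .break(let size, let offset):
      // Length of the break: its size plus the syntax token right after it.
      var length = size
      if index + 1 < tokens.count, case .syntax(let next) = tokens[index + 1] {
        length += next.count
      }
      if column + length > maxLineLength {
        // Does not fit: start a new line indented by the break's offset.
        output += "\n" + String(repeating: " ", count: offset)
        column = offset
      } else {
        // Fits: print `size` columns of whitespace.
        output += String(repeating: " ", count: size)
        column += size
      }
    }
  }
  return output
}

let example: [Token] = [
  .syntax("one"), .break(size: 1, offset: 2),
  .syntax("two"), .break(size: 1, offset: 2),
  .syntax("three"),
]
print(render(example, maxLineLength: 10))
```

With a maximum line length of 10, the second break has length 6 at column 7, so it produces a line break and `three` is indented by the break's offset of 2. With a maximum of 20, everything stays on one line.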
63+
64+
#### Open
65+
66+
An *open* token indicates the start of a group.
67+
68+
```
69+
# break(size=1, offset=0)
70+
Tokens = ["one", break, open, "two", break, "three", break, open, "four", break, "five", close, close]
71+
72+
# Maximum line length of 20
73+
Output =
74+
"""
75+
one
76+
two three four five
77+
"""
78+
79+
# Maximum line length of 10
80+
Output =
81+
"""
82+
one
83+
two three
84+
four five
85+
"""
86+
```
87+
88+
Open tokens have a *break style* and an *offset*. The break style is either
89+
*consistent* or *inconsistent*. If a group is too large to fit on the remaining
90+
space on a line, and it is labeled as *consistent*, then the break tokens it
91+
contains will all produce line breaks. (In the case of nested groups, the break
92+
style affects a group's immediate children.) The default behavior is
93+
*inconsistent*, in which case the break tokens only produce line breaks when
94+
their lengths exceed the remaining space on the line.
95+
96+
```
97+
# open(consistent/inconsistent), break(size, offset)
98+
Tokens = ["one", break(1, 0), open(C), "two", break(1, 0), "three", close]
99+
100+
# Maximum line length of 10 (consistent breaking)
101+
Output =
102+
"""
103+
one
104+
two
105+
three
106+
"""
107+
108+
# With inconsistent breaking
109+
Tokens = ["one", break(1, 0), open(I), "two", break(1, 0), "three", close]
110+
Output =
111+
"""
112+
one
113+
two three
114+
"""
115+
```
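
The difference between the two break styles can be sketched for the simplest case: a single, non-nested group at the start of a stream. Everything here is hypothetical (`Token`, `BreakStyle`, `groupSize`, `render`), offsets are left out, and the sketch deliberately avoids the question of how a break in front of a group interacts with it.

```swift
enum BreakStyle { case consistent, inconsistent }

// Hypothetical, simplified token model without offsets.
enum Token {
  case syntax(String)
  case `break`(size: Int)
  case open(BreakStyle)
  case close
}

// Size of the (non-nested) group that starts right after `openIndex`.
func groupSize(_ tokens: [Token], after openIndex: Int) -> Int {
  var size = 0
  for token in tokens[(openIndex + 1)...] {
    switch token {
    case .syntax(let text): size += text.count
    case .break(let breakSize): size += breakSize
    case .close: return size
    case .open: continue  // nesting is not handled in this sketch
    }
  }
  return size
}

func render(_ tokens: [Token], maxLineLength: Int) -> String {
  var output = ""
  var column = 0
  var forceBreaks = false  // true inside a consistent group that does not fit
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      output += text
      column += text.count
    case .open(let style):
      let fits = column + groupSize(tokens, after: index) <= maxLineLength
      forceBreaks = (style == .consistent) && !fits
    case .close:
      forceBreaks = false
    case .break(let size):
      var length = size
      if index + 1 < tokens.count, case .syntax(let next) = tokens[index + 1] {
        length += next.count
      }
      if forceBreaks || column + length > maxLineLength {
        output += "\n"
        column = 0
      } else {
        output += String(repeating: " ", count: size)
        column += size
      }
    }
  }
  return output
}

func sample(_ style: BreakStyle) -> [Token] {
  return [
    .open(style), .syntax("one"), .break(size: 1),
    .syntax("two"), .break(size: 1), .syntax("three"), .close,
  ]
}

print(render(sample(.consistent), maxLineLength: 10))
print(render(sample(.inconsistent), maxLineLength: 10))
```

The group is 13 columns wide, so it does not fit in 10. In the consistent case every break fires. In the inconsistent case only the break before `three` fires, because that is the only one whose length exceeds the remaining space.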
116+
117+
The open token's offset applies an offset to the breaks contained within the
118+
group. A break token's offset value is added to the offset of its group. In the
119+
case of nested groups, the group offsets add together. If an outer group has an
120+
offset of 2, and an inner group an offset of 3, any break tokens that produce line
121+
breaks in the inner group will be offset by 5 spaces (plus the break's own offset).
122+
Additionally, a break that produces a line break immediately before an open
123+
token will also increase the offset. For example, if a break has an offset of 2
124+
immediately before an open with an offset of 3, the breaks within the group will
125+
be offset by 5.
126+
127+
```
128+
# open(consistent/inconsistent, offset)
129+
Tokens = ["one", break, open(C, 2), "two", break, "three", close]
130+
131+
# Maximum line length of 10
132+
Output =
133+
"""
134+
one
135+
two
136+
  three
137+
"""
138+
139+
Tokens = ["one", break(offset=2), open(C, 0), "two", break, "three", close]
140+
141+
# Maximum line length of 10
142+
Output =
143+
"""
144+
one
145+
  two
146+
  three
147+
"""
148+
```
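
The offset arithmetic described above can be sketched as a stack of group offsets. The `IndentStack` type and its method names are hypothetical and only model the additions stated in the text: nested group offsets add together, a break's own offset is added on top, and a break that fires immediately before an open token contributes its offset to that group.

```swift
// Hypothetical helper that tracks the offsets of the currently open groups.
struct IndentStack {
  var groupOffsets: [Int] = []

  // Enter a group. If the break right before the open token produced a line
  // break, that break's offset is carried into the group as well.
  mutating func pushGroup(offset: Int, firedBreakOffset: Int = 0) {
    groupOffsets.append(offset + firedBreakOffset)
  }

  // Leave the innermost group.
  mutating func popGroup() {
    groupOffsets.removeLast()
  }

  // Indentation used when a break inside the current groups fires.
  func indentation(breakOffset: Int) -> Int {
    return groupOffsets.reduce(0, +) + breakOffset
  }
}

// Outer group offset 2, inner group offset 3.
var nested = IndentStack()
nested.pushGroup(offset: 2)
nested.pushGroup(offset: 3)
let innerIndent = nested.indentation(breakOffset: 0)
nested.popGroup()
let outerIndent = nested.indentation(breakOffset: 0)

// A break with offset 2 fires immediately before an open with offset 3.
var afterBreak = IndentStack()
afterBreak.pushGroup(offset: 3, firedBreakOffset: 2)
let carriedIndent = afterBreak.indentation(breakOffset: 0)

print(innerIndent, outerIndent, carriedIndent)
```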
149+
150+
The open token of a group is assigned the total size of the group as its length.
151+
Open tokens must always be paired with a *close* token.
152+
153+
```
154+
Tokens = ["one", break(1, 2), open(C, 2), "two", break(1, 2), "three", close]
155+
Lengths = [3, 10, 9, 3, 6, 5, 0]
156+
```
157+
158+
#### Close
159+
160+
The *close* tokens indicate the end of a group, and they have a length of zero.
161+
They must always be paired with an *open* token.
162+
163+
#### Newline
164+
165+
The *newline* tokens behave much the same way as *break* tokens, except that
166+
they always produce a line break. They can be assigned an offset, in the same
167+
way as a break. They can also be given an integer number of line breaks to
168+
produce.
169+
170+
These tokens are given a length equal to the maximum allowed line width. The
171+
reason for this is to indicate that any enclosing groups are too large to fit on
172+
a single line.
173+
174+
```
175+
# Assume maximum line length of 50
176+
# break(size)
177+
Tokens = ["one", break(1), "two", break(1), open, "three", newline, "four", close]
178+
Lengths = [3, 4, 3, 60, 59, 5, 50, 4, 0]
179+
```
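
The length rules from the sections above can be collected into one sketch of the length computation. It is a simplified illustration, not the actual *scan* function: the `Token` type, `ownWidth`, and `scanLengths` are hypothetical, offsets are omitted because they do not affect lengths, and a break followed by anything other than a syntax token or a group is assumed to have a length equal to its size.

```swift
// Hypothetical, simplified token model for length computation only.
enum Token {
  case syntax(String)
  case `break`(size: Int)
  case open
  case close
  case newline
}

// Width a token occupies on its own, ignoring what follows it.
func ownWidth(_ token: Token, maxLineLength: Int) -> Int {
  switch token {
  case .syntax(let text): return text.count
  case .break(let size): return size
  case .newline: return maxLineLength
  case .open, .close: return 0
  }
}

func scanLengths(_ tokens: [Token], maxLineLength: Int) -> [Int] {
  // Total size of the group whose open token sits at `start`.
  func groupSize(from start: Int) -> Int {
    var depth = 0
    var total = 0
    for index in start..<tokens.count {
      switch tokens[index] {
      case .open: depth += 1
      case .close: depth -= 1
      default: total += ownWidth(tokens[index], maxLineLength: maxLineLength)
      }
      if depth == 0 { return total }
    }
    return total
  }

  return tokens.indices.map { index -> Int in
    switch tokens[index] {
    case .open:
      // An open token carries the size of its whole group.
      return groupSize(from: index)
    case .break(let size):
      // A break's length includes the token or group right after it.
      let next = index + 1
      guard next < tokens.count else { return size }
      switch tokens[next] {
      case .syntax(let text): return size + text.count
      case .open: return size + groupSize(from: next)
      default: return size
      }
    default:
      return ownWidth(tokens[index], maxLineLength: maxLineLength)
    }
  }
}

let flat: [Token] = [
  .syntax("one"), .break(size: 1), .syntax("two"), .break(size: 1), .syntax("three"),
]
let withNewline: [Token] = [
  .syntax("one"), .break(size: 1), .syntax("two"), .break(size: 1),
  .open, .syntax("three"), .newline, .syntax("four"), .close,
]
print(scanLengths(flat, maxLineLength: 10))
print(scanLengths(withNewline, maxLineLength: 50))
```

For the newline stream, the newline's length of 50 pushes the group size to 59 and the preceding break to 60, which is what marks the enclosing group as too large for a single line.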
180+
181+
#### Reset
182+
183+
Reset tokens are used to reset the state created by break tokens if needed, and
184+
are rarely used. A primary use case is preventing an entire group from moving to
185+
a new line while still allowing the group to break internally. Reset tokens have
186+
a length of zero.
187+
188+
A reset token makes whatever follows it behave as if it were at the beginning of
189+
the line.
190+
191+
```
192+
Tokens = ["one", break(1), "two", reset]
193+
Lengths = [3, 4, 3, 0]
194+
195+
# Normal breaking behavior of a consistent group
196+
Tokens = ["one", break(1), open(C, 2), "two", break(1), "three", break(1), "four", close]
197+
Output =
198+
"""
199+
one
200+
two
201+
  three
201+
  four
203+
"""
204+
205+
# Breaking behavior of a consistent group with a reset token
206+
Tokens = ["one", break(1), reset, open(C, 2), "two", break(1), "three", break(1), "four", close]
207+
Output =
208+
"""
209+
one two
210+
  three
210+
  four
212+
"""
213+
```
214+
215+
#### Comment
216+
217+
Comment tokens represent Swift source comments, and they come in four types:
218+
`line`, `docLine`, `block`, and `docBlock`. Their length is equal to the number
219+
of characters needed to print them, including whitespace and delimiters. Line
220+
comments produce one comment token per line. If other comment types span
221+
multiple lines, their content is represented as a single comment token.
222+
223+
```
224+
# Line comment
225+
// comment 1
226+
// comment 2
227+
Tokens = [line(" comment 1"), newline, line(" comment 2")]
228+
229+
/// Doc comment 1
230+
/// Second line
231+
Tokens = [docLine(" Doc comment 1\n Second line")]
232+
233+
/* Block comment
234+
Second line */
235+
Tokens = [block(" Block comment\n Second line ")]
236+
237+
/** Doc Block comment
238+
* Second line **/
239+
Tokens = [docBlock(" Doc Block comment\n * Second line *")]
240+
```
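
The rule for line comments versus doc line comments can be sketched as follows. The `CommentToken` type and the `commentTokens(for:)` function are hypothetical and cover only `//` and `///` comments. The sketch assumes the input is a non-empty run of comment lines of a single kind with no leading indentation.

```swift
// Hypothetical token type covering only line and doc line comments.
enum CommentToken: Equatable {
  case line(String)
  case docLine(String)
  case newline
}

func commentTokens(for source: String) -> [CommentToken] {
  let lines = source.split(separator: "\n")
  if lines.allSatisfy({ $0.hasPrefix("///") }) {
    // Doc line comments: a single token, delimiters stripped, lines joined.
    let text = lines.map { String($0.dropFirst(3)) }.joined(separator: "\n")
    return [.docLine(text)]
  }
  // Plain line comments: one token per line, separated by newline tokens.
  var tokens: [CommentToken] = []
  for line in lines {
    if !tokens.isEmpty { tokens.append(.newline) }
    tokens.append(.line(String(line.dropFirst(2))))
  }
  return tokens
}

let lineTokens = commentTokens(for: "// comment 1\n// comment 2")
let docTokens = commentTokens(for: "/// Doc comment 1\n/// Second line")
print(lineTokens)
print(docTokens)
```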
241+
242+
### Token Generation
243+
244+
Token generation begins with the abstract syntax tree (AST) of the Swift source
245+
file, provided by the [SwiftSyntax](https://github.com/apple/swift-syntax)
246+
library. We have a `visit` method for each of the different syntax node types
247+
(e.g. `FunctionDeclSyntax`, `GenericWhereClauseSyntax`, etc.). Within each of these
248+
visit methods, we can attach pretty-printer `Token` objects before and after
249+
syntax tokens from the AST. For example, if we wanted a group after the opening
250+
brace of a function declaration, it might look like:
251+
252+
```
253+
# node: FunctionDeclSyntax
254+
after(node.body?.leftBrace, tokens: .break(size: 1, offset: 2), .open(.consistent, 0))
255+
```
256+
257+
All of the tokens are placed into an array, which is then passed on to the
258+
*scan* phase of the pretty printer.
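
One way to picture how attached tokens end up in a single array is the sketch below. It does not use SwiftSyntax and is not the code in `TokenStreamCreator.swift`: the `TokenStreamBuilder` type, its `before`/`after` methods, and the use of array indices in place of syntax nodes are all assumptions made to keep the example self-contained.

```swift
// Hypothetical, simplified token model.
enum Token: Equatable {
  case syntax(String)
  case `break`(size: Int, offset: Int)
  case open
  case close
}

// Collects printer tokens to emit before and after each syntax token, then
// flattens everything into one array for the scan phase.
struct TokenStreamBuilder {
  var beforeMap: [Int: [Token]] = [:]
  var afterMap: [Int: [Token]] = [:]

  // The index is optional, mirroring optional chaining on syntax nodes.
  mutating func before(_ index: Int?, tokens: Token...) {
    guard let index = index else { return }
    beforeMap[index, default: []] += tokens
  }

  mutating func after(_ index: Int?, tokens: Token...) {
    guard let index = index else { return }
    afterMap[index, default: []] += tokens
  }

  func build(from syntaxTokens: [String]) -> [Token] {
    var stream: [Token] = []
    for (index, text) in syntaxTokens.enumerated() {
      stream += beforeMap[index] ?? []
      stream.append(.syntax(text))
      stream += afterMap[index] ?? []
    }
    return stream
  }
}

let source = ["func", "f()", "{", "}"]
var builder = TokenStreamBuilder()
builder.after(source.firstIndex(of: "{"), tokens: .break(size: 1, offset: 2), .open)
builder.before(source.firstIndex(of: "}"), tokens: .close)
let stream = builder.build(from: source)
print(stream)
```

Keeping the attachments in maps and flattening them at the end means a visit method can attach tokens to any syntax token it can reach, in any order.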
259+
260+
See: [`TokenStreamCreator.swift`](../Sources/SwiftFormatPrettyPrint/TokenStreamCreator.swift)
