# SwiftFormat Pretty Printer

## Introduction

The algorithm used in the SwiftFormat pretty printer is based on the "simple"
version of the algorithm described by Derek Oppen in his paper [*Pretty
Printing*](http://i.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf)
(1979). It employs two functions: *scan* and *print*. The *scan* function
accepts a stream of tokens and calculates the lengths of these tokens. It then
passes the tokens and their computed lengths to *print*, which handles the
actual printing of the tokens, automatically inserting line breaks and indents
to obey a given maximum line length. We describe in detail how these functions
have been implemented in SwiftFormat.

## Tokens

### Token Groups

It is often necessary to group a series of tokens together into logical groups
that we want to avoid splitting with line breaks if possible. The algorithm
tries to break as few groups as possible when printing. Groups begin with
*open* tokens and end with *close* tokens. These tokens must always be paired.

### Token Types

The different types of tokens are represented as a `Token` enum within the
code. The available cases are: `syntax`, `break`, `open`, `close`, `newlines`,
`comment`, and `reset`. The behavior of each of them is described below with
pseudocode examples.

See: [`Token.swift`](../Sources/SwiftFormatPrettyPrint/Token.swift)
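
To make the list of cases concrete, here is a simplified Swift model of them. It is a hypothetical sketch, not the contents of `Token.swift`: the associated values follow the descriptions in this document, and the `isBalanced` helper (also hypothetical) checks the rule that every *open* token must be paired with a later *close* token.

```swift
/// Hypothetical, simplified model of the pretty printer's token cases.
/// Associated values follow this document, not necessarily `Token.swift`.
enum BreakStyle: Equatable {
  case consistent
  case inconsistent
}

enum CommentKind: Equatable {
  case line, docLine, block, docBlock
}

enum Token: Equatable {
  case syntax(String)
  case `break`(size: Int, offset: Int)
  case open(BreakStyle, offset: Int)
  case close
  case newlines(count: Int, offset: Int)
  case comment(CommentKind, String)
  case reset
}

/// Returns true if every `.open` is matched by a `.close` that follows it.
func isBalanced(_ tokens: [Token]) -> Bool {
  var depth = 0
  for token in tokens {
    switch token {
    case .open:
      depth += 1
    case .close:
      depth -= 1
      // A close with no pending open can never be paired.
      if depth < 0 { return false }
    default:
      continue
    }
  }
  // Any open left on the stack was never closed.
  return depth == 0
}

let tokens: [Token] = [
  .syntax("one"), .break(size: 1, offset: 0),
  .open(.consistent, offset: 2), .syntax("two"), .close,
]
print(isBalanced(tokens))  // prints "true"
print(isBalanced([.close, .open(.inconsistent, offset: 0)]))  // prints "false"
```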

#### Syntax

The *syntax* tokens contain the segments of text that need to be printed (e.g.
`}`, `func`, `23`, `while`, etc.). The length of a token is the number of
columns needed to print it. For example, `func` would have a length of 4.

#### Break

The *break* tokens indicate where line breaks are allowed to occur. These
frequently occur as the whitespace in between syntax tokens. The breaks contain
two associated values that can be specified when creating the break token:
*size* and *offset*. The size indicates how many columns of whitespace should
be printed when the break does not produce a line break. If a line break
should occur at the break token, the offset indicates how many spaces should be
used for indentation of the next token. The length of a break is its size plus
the length of the token that immediately comes after it. If a break
immediately precedes a group, its length is its size plus the size of the
group.

```
# break(size, offset)
Tokens = ["one", break(1, 2), "two", break(1, 2), "three"]
Lengths = [3, 4, 3, 6, 5]

# Maximum line length of 10
Output =
"""
one two
  three
"""
```
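
The scan and print steps for this example can be sketched in Swift. This is a minimal, hypothetical model that supports only syntax and break tokens; it computes every length before printing anything, and the `scan` and `prettyPrint` names are illustrative rather than taken from the SwiftFormat sources.

```swift
/// Hypothetical flat token model: syntax text and breaks only, no groups.
enum Token {
  case syntax(String)
  case `break`(size: Int, offset: Int)
}

/// Scan: a syntax token's length is its column count; a break's length is
/// its size plus the length of the token immediately after it.
func scan(_ tokens: [Token]) -> [Int] {
  var lengths: [Int] = []
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      lengths.append(text.count)
    case .break(let size, _):
      var next = 0
      if index + 1 < tokens.count, case .syntax(let text) = tokens[index + 1] {
        next = text.count
      }
      lengths.append(size + next)
    }
  }
  return lengths
}

/// Print: a break produces a line break (indented by its offset) only if its
/// length exceeds the space remaining on the current line.
func prettyPrint(_ tokens: [Token], maxLineLength: Int) -> String {
  let lengths = scan(tokens)
  var output = ""
  var spaceRemaining = maxLineLength
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      output += text
      spaceRemaining -= text.count
    case .break(let size, let offset):
      if lengths[index] > spaceRemaining {
        output += "\n" + String(repeating: " ", count: offset)
        spaceRemaining = maxLineLength - offset
      } else {
        output += String(repeating: " ", count: size)
        spaceRemaining -= size
      }
    }
  }
  return output
}

let example: [Token] = [
  .syntax("one"), .break(size: 1, offset: 2),
  .syntax("two"), .break(size: 1, offset: 2), .syntax("three"),
]
print(scan(example))  // [3, 4, 3, 6, 5]
print(prettyPrint(example, maxLineLength: 10))
```

After `one two`, only 3 columns remain, so the second break (length 6) produces a line break and indents `three` by its offset of 2.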

#### Open

An *open* token indicates the start of a group.

```
# break(size=1, offset=0)
Tokens = ["one", break, open, "two", break, "three", break, open, "four", break, "five", close, close]

# Maximum line length of 20
Output =
"""
one
two three four five
"""

# Maximum line length of 10
Output =
"""
one
two three
four five
"""
```
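
Because this example uses the default inconsistent style with zero offsets, groups affect the output only through the lengths they give to the breaks in front of them. The hypothetical Swift sketch below adds `open` and `close` to a flat model, computes each group's length with a stack of open positions, and reproduces both outputs. It illustrates the idea under these simplifying assumptions and is not SwiftFormat's implementation.

```swift
/// Hypothetical model: inconsistent groups only, all offsets zero.
enum Token {
  case syntax(String)
  case `break`(size: Int)
  case open
  case close
}

/// Scan: a group's length is the width of its contents on one line; a
/// break's length is its size plus the length of the next token (the whole
/// group if the next token is an open). Assumes balanced open/close tokens.
func scan(_ tokens: [Token]) -> [Int] {
  var lengths = Array(repeating: 0, count: tokens.count)
  var width = 0  // running one-line width of the stream so far
  var openStack: [(index: Int, startWidth: Int)] = []
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      lengths[index] = text.count
      width += text.count
    case .break(let size):
      width += size
    case .open:
      openStack.append((index: index, startWidth: width))
    case .close:
      let group = openStack.removeLast()
      lengths[group.index] = width - group.startWidth
    }
  }
  // Second pass: each break looks at the token that follows it.
  for (index, token) in tokens.enumerated() {
    guard case .break(let size) = token else { continue }
    var next = 0
    if index + 1 < tokens.count {
      switch tokens[index + 1] {
      case .syntax, .open: next = lengths[index + 1]
      case .break, .close: next = 0
      }
    }
    lengths[index] = size + next
  }
  return lengths
}

func prettyPrint(_ tokens: [Token], maxLineLength: Int) -> String {
  let lengths = scan(tokens)
  var output = ""
  var spaceRemaining = maxLineLength
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      output += text
      spaceRemaining -= text.count
    case .break(let size):
      // Inconsistent breaking: break only if what follows does not fit.
      if lengths[index] > spaceRemaining {
        output += "\n"
        spaceRemaining = maxLineLength
      } else {
        output += String(repeating: " ", count: size)
        spaceRemaining -= size
      }
    case .open, .close:
      continue
    }
  }
  return output
}

let example: [Token] = [
  .syntax("one"), .break(size: 1), .open, .syntax("two"), .break(size: 1),
  .syntax("three"), .break(size: 1), .open, .syntax("four"), .break(size: 1),
  .syntax("five"), .close, .close,
]
print(prettyPrint(example, maxLineLength: 20))
print(prettyPrint(example, maxLineLength: 10))
```

The break after `one` has length 20 (its size plus the 19-column outer group), which exceeds the 17 columns left on the first line at either width, so `two` always starts a new line.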

Open tokens have a *break style* and an *offset*. The break style is either
*consistent* or *inconsistent*. If a group is too large to fit in the remaining
space on a line and it is labeled as *consistent*, then the break tokens it
contains will all produce line breaks. (In the case of nested groups, the break
style affects only a group's immediate children.) The default behavior is
*inconsistent*, in which case the break tokens only produce line breaks when
their lengths exceed the remaining space on the line.

```
# open(consistent/inconsistent), break(size, offset)
Tokens = ["one", break(1, 0), open(C), "two", break(1, 0), "three", close]

# Maximum line length of 10 (consistent breaking)
Output =
"""
one
two
three
"""

# With inconsistent breaking
Tokens = ["one", break(1, 0), open(I), "two", break(1, 0), "three", close]
Output =
"""
one
two three
"""
```
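
The difference between the two styles comes down to which measurement each break consults. The hypothetical helper below states the rule as described above. It assumes that, for a consistent group, the comparison is against the space that remained on the line where the group began (7 columns after `one` in the example), which is the reading that yields the consistent output shown; treat that as an assumption rather than a description of the implementation.

```swift
enum BreakStyle {
  case consistent
  case inconsistent
}

/// Hypothetical helper: decides whether a break that is an immediate child
/// of a group produces a line break.
/// - groupLength: the length of the group's open token.
/// - spaceAtGroupStart: space left on the line where the group began (assumed).
/// - breakLength: the break's own length (its size plus what follows it).
/// - spaceRemaining: space left on the current line when the break is reached.
func producesLineBreak(
  style: BreakStyle,
  groupLength: Int,
  spaceAtGroupStart: Int,
  breakLength: Int,
  spaceRemaining: Int
) -> Bool {
  switch style {
  case .consistent:
    // All or nothing: if the group did not fit, every break in it fires.
    return groupLength > spaceAtGroupStart
  case .inconsistent:
    // Each break decides on its own.
    return breakLength > spaceRemaining
  }
}

// The break between "two" and "three": group length 9, break length 6.
print(producesLineBreak(style: .consistent, groupLength: 9, spaceAtGroupStart: 7,
                        breakLength: 6, spaceRemaining: 7))  // true
print(producesLineBreak(style: .inconsistent, groupLength: 9, spaceAtGroupStart: 7,
                        breakLength: 6, spaceRemaining: 7))  // false
```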

The open token's offset applies to the breaks contained within the group: a
break token's offset value is added to the offset of its group. In the case of
nested groups, the group offsets add together. If an outer group has an offset
of 2 and an inner group has an offset of 3, any break tokens that produce line
breaks in the inner group will be offset by 5 spaces (plus the break's own
offset). Additionally, a break that produces a line break immediately before an
open token will also increase the offset. For example, if a break with an
offset of 2 immediately precedes an open with an offset of 3, the breaks within
the group will be offset by 5.

```
# open(consistent/inconsistent, offset)
Tokens = ["one", break, open(C, 2), "two", break, "three", close]

# Maximum line length of 10
Output =
"""
one
two
  three
"""

Tokens = ["one", break(offset=2), open(C, 0), "two", break, "three", close]

# Maximum line length of 10
Output =
"""
one
  two
  three
"""
```
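
The offset rules amount to keeping a stack of group offsets. The hypothetical `IndentationStack` type below sums the offsets of all enclosing groups, folds the offset of a line-breaking break that directly precedes an open token into that group's entry, and adds the firing break's own offset. It sketches only the arithmetic; the type and method names are illustrative.

```swift
/// Hypothetical bookkeeping for the offset rules described above.
struct IndentationStack {
  private var groupOffsets: [Int] = []

  init() {}

  /// Enter a group. `precedingBreakOffset` is the offset of a break that
  /// produced a line break immediately before this open token (0 otherwise).
  mutating func enterGroup(offset: Int, precedingBreakOffset: Int = 0) {
    groupOffsets.append(offset + precedingBreakOffset)
  }

  mutating func exitGroup() {
    groupOffsets.removeLast()
  }

  /// Indentation for a line break produced by a break with `breakOffset`:
  /// the sum of all enclosing group offsets plus the break's own offset.
  func indentation(forBreakOffset breakOffset: Int) -> Int {
    groupOffsets.reduce(0, +) + breakOffset
  }
}

// Outer group offset 2, inner group offset 3.
var nested = IndentationStack()
nested.enterGroup(offset: 2)
nested.enterGroup(offset: 3)
print(nested.indentation(forBreakOffset: 0))  // 5

// A break with offset 2 fires right before an open with offset 3.
var afterBreak = IndentationStack()
afterBreak.enterGroup(offset: 3, precedingBreakOffset: 2)
print(afterBreak.indentation(forBreakOffset: 0))  // 5
```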

The open token of a group is assigned the total size of the group as its length.
Open tokens must always be paired with a *close* token.

```
Tokens = ["one", break(1, 2), open(C, 2), "two", break(1, 2), "three", close]
Lengths = [3, 10, 9, 3, 6, 5, 0]
```

#### Close

The *close* tokens indicate the end of a group, and they have a length of zero.
They must always be paired with an *open* token.

#### Newline

The *newline* tokens behave much the same way as *break* tokens, except that
they always produce a line break. They can be assigned an offset, in the same
way as a break. They can also be given an integer number of line breaks to
produce.

These tokens are given a length equal to the maximum allowed line width. This
ensures that any enclosing group is treated as too large to fit on a single
line.

```
# Assume maximum line length of 50
# break(size)
Tokens = ["one", break(1), "two", break(1), open, "three", newline, "four", close]
Lengths = [3, 4, 3, 60, 59, 5, 50, 4, 0]
```
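
The effect of the newline length can be checked with a small scan sketch. This hypothetical Swift model omits the newline's offset and line-break count, and treats each newline as contributing the full line width to every enclosing group.

```swift
/// Hypothetical model: syntax, breaks, groups, and newlines.
enum Token {
  case syntax(String)
  case `break`(size: Int)
  case open
  case close
  case newline
}

/// Scan in which a newline's length is the full line width, so any group
/// containing one can never fit on a single line. Assumes balanced groups.
func scan(_ tokens: [Token], maxLineLength: Int) -> [Int] {
  var lengths = Array(repeating: 0, count: tokens.count)
  var width = 0
  var openStack: [(index: Int, startWidth: Int)] = []
  for (index, token) in tokens.enumerated() {
    switch token {
    case .syntax(let text):
      lengths[index] = text.count
      width += text.count
    case .newline:
      lengths[index] = maxLineLength
      width += maxLineLength
    case .break(let size):
      width += size
    case .open:
      openStack.append((index: index, startWidth: width))
    case .close:
      let group = openStack.removeLast()
      lengths[group.index] = width - group.startWidth
    }
  }
  // A break's length is its size plus the length of the following token.
  for (index, token) in tokens.enumerated() {
    guard case .break(let size) = token else { continue }
    var next = 0
    if index + 1 < tokens.count {
      switch tokens[index + 1] {
      case .syntax, .open, .newline: next = lengths[index + 1]
      case .break, .close: next = 0
      }
    }
    lengths[index] = size + next
  }
  return lengths
}

let example: [Token] = [
  .syntax("one"), .break(size: 1), .syntax("two"), .break(size: 1),
  .open, .syntax("three"), .newline, .syntax("four"), .close,
]
print(scan(example, maxLineLength: 50))  // [3, 4, 3, 60, 59, 5, 50, 4, 0]
```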

#### Reset

Reset tokens are used to reset the state created by break tokens if needed, and
are rarely used. A primary use case is keeping an entire group from moving to a
new line while still allowing the group to break internally. Reset tokens have
a length of zero.

A reset token makes whatever follows it behave as if it were at the beginning
of the line.

```
Tokens = ["one", break(1), "two", reset]
Lengths = [3, 4, 3, 0]

# Normal breaking behavior of a consistent group
Tokens = ["one", break(1), open(C, 2), "two", break(1), "three", break(1), "four", close]
Output =
"""
one
two
  three
  four
"""

# Breaking behavior of a consistent group with a reset token
Tokens = ["one", break(1), reset, open(C, 2), "two", break(1), "three", break(1), "four", close]
Output =
"""
one two
  three
  four
"""
```

#### Comment

Comment tokens represent Swift source comments, and they come in four types:
`line`, `docLine`, `block`, and `docBlock`. Their length is equal to the number
of characters needed to print them, including whitespace and delimiters. Line
comments produce one comment token per line. If other comment types span
multiple lines, their content is represented as a single comment token.

```
# Line comment
// comment 1
// comment 2
Tokens = [line(" comment 1"), newline, line(" comment 2")]

# Doc line comment
/// Doc comment 1
/// Second line
Tokens = [docLine(" Doc comment 1\n Second line")]

# Block comment
/* Block comment
 Second line */
Tokens = [block(" Block comment\n Second line ")]

# Doc block comment
/** Doc Block comment
 * Second line **/
Tokens = [docBlock(" Doc Block comment\n * Second line *")]
```
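
The line-comment rules can be sketched as a conversion from source lines to tokens. In this hypothetical model only the two line-based kinds are represented, input lines are assumed to be already trimmed of indentation, and `printedLength(ofLineComment:)` applies the stated length rule (delimiter included) to a single `//` comment. None of these names come from SwiftFormat.

```swift
/// Hypothetical comment tokens; only the line-based kinds are modeled.
enum CommentToken: Equatable {
  case line(String)
  case docLine(String)
  case newline
}

/// Turns trimmed source lines into tokens: one `line` token per `//` line,
/// and a single `docLine` token for each run of `///` lines.
/// Lines that are not comments are ignored in this sketch.
func commentTokens(for sourceLines: [String]) -> [CommentToken] {
  var tokens: [CommentToken] = []
  var docLines: [String] = []

  // Appends a token, separating it from any previous token with a newline.
  func emit(_ token: CommentToken) {
    if !tokens.isEmpty { tokens.append(.newline) }
    tokens.append(token)
  }

  // A run of `///` lines becomes one docLine token joined by "\n".
  func flushDocLines() {
    guard !docLines.isEmpty else { return }
    emit(.docLine(docLines.joined(separator: "\n")))
    docLines.removeAll()
  }

  for line in sourceLines {
    if line.hasPrefix("///") {  // check the longer prefix first
      docLines.append(String(line.dropFirst(3)))
    } else if line.hasPrefix("//") {
      flushDocLines()
      emit(.line(String(line.dropFirst(2))))
    }
  }
  flushDocLines()
  return tokens
}

/// Printed length of a single-line `//` comment, delimiter included.
func printedLength(ofLineComment text: String) -> Int {
  "//".count + text.count
}

let lineTokens = commentTokens(for: ["// comment 1", "// comment 2"])
let docTokens = commentTokens(for: ["/// Doc comment 1", "/// Second line"])
print(lineTokens == [.line(" comment 1"), .newline, .line(" comment 2")])  // true
print(docTokens == [.docLine(" Doc comment 1\n Second line")])  // true
print(printedLength(ofLineComment: " comment 1"))  // 12
```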

### Token Generation

Token generation begins with the abstract syntax tree (AST) of the Swift source
file, provided by the [SwiftSyntax](https://github.com/apple/swift-syntax)
library. We have a `visit` method for each of the different syntax node types
(e.g. `FunctionDeclSyntax`, `GenericWhereClauseSyntax`, etc.). Within each of
these visit methods, we can attach pretty-printer `Token` values before and
after syntax tokens from the AST. For example, if we wanted a group after the
opening brace of a function declaration, it might look like:

```
# node: FunctionDeclSyntax
after(node.body?.leftBrace, tokens: .break(size: 1, offset: 2), .open(.consistent, 0))
```

All of the tokens are placed into an array, which is then passed on to the
*scan* phase of the pretty printer.

See: [`TokenStreamCreator.swift`](../Sources/SwiftFormatPrettyPrint/TokenStreamCreator.swift)
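
SwiftSyntax is not needed to show the idea, so the sketch below replaces syntax nodes with a plain array of syntax token strings. The hypothetical `TokenStreamBuilder` records tokens to emit before or after the syntax token at a given position, loosely mirroring the `after(...)` call above, and then flattens everything into the single array that scan consumes. None of these names are SwiftFormat APIs.

```swift
/// Hypothetical pretty-printer tokens (a small subset).
enum Token: Equatable {
  case syntax(String)
  case `break`(size: Int, offset: Int)
  case open
  case close
}

/// Hypothetical stand-in for the token-stream creator: it records extra
/// tokens to place around syntax tokens, then flattens them in order.
struct TokenStreamBuilder {
  let syntaxTokens: [String]
  private var beforeTokens: [Int: [Token]] = [:]
  private var afterTokens: [Int: [Token]] = [:]

  init(syntaxTokens: [String]) {
    self.syntaxTokens = syntaxTokens
  }

  /// Attach tokens before the syntax token at `position`.
  mutating func before(_ position: Int, tokens: Token...) {
    beforeTokens[position, default: []].append(contentsOf: tokens)
  }

  /// Attach tokens after the syntax token at `position`.
  mutating func after(_ position: Int, tokens: Token...) {
    afterTokens[position, default: []].append(contentsOf: tokens)
  }

  /// The flat array handed to the scan phase.
  func makeStream() -> [Token] {
    var stream: [Token] = []
    for (position, text) in syntaxTokens.enumerated() {
      stream += beforeTokens[position] ?? []
      stream.append(.syntax(text))
      stream += afterTokens[position] ?? []
    }
    return stream
  }
}

// Open a group after "{" and close it before "}".
var builder = TokenStreamBuilder(syntaxTokens: ["{", "return", "}"])
builder.after(0, tokens: .break(size: 1, offset: 2), .open)
builder.before(2, tokens: .close, .break(size: 1, offset: 0))
let stream = builder.makeStream()
print(stream.count)  // 7
```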