Commit 5209653

[Syntax] Fix a few typos and language in Readme.md
1 parent ab0e817 commit 5209653

1 file changed

+79
-53
lines changed

lib/Syntax/README.md

@@ -8,16 +8,16 @@ striving to be safe, correct, and intuitive to use. The
88
library emphasizes immutable, thread-safe data structures, full-fidelity
99
representation of source, and facilities for *structured editing*.
1010

11-
What is structured editing? It's an editing strategy that is keenly aware
12-
of the *structure* of source code, not necessarily its *representation* (i.e.
13-
characters or bytes). This can be achieved at different granularities:
14-
replacing an identifier, changing a call to global function to a method call, or
15-
indenting and formatting an entire source file based on declarative rules. These
16-
kinds of diverse operations are critical to the Swift Migrator, which is the immediate
11+
What is structured editing? It's an editing strategy that is keenly aware of the
12+
*structure* of source code, not necessarily its *representation* (i.e.
13+
characters or bytes). This can be achieved at different granularities: replacing
14+
an identifier, changing a call to global function to a method call, or indenting
15+
and formatting an entire source file based on declarative rules. These kinds of
16+
diverse operations are critical to the Swift Migrator, which is the immediate
1717
client for this library, now developed in the open. Along with that, the library
1818
will also provide infrastructure for a first-class `swift-format` tool.
1919

20-
Eventually, the goal of this library is to represent Swift syntax to all of the
20+
Eventually, the goal of this library is to represent Swift syntax in all of the
2121
compiler. Currently, lib/AST structures don't make a very clear distinction
2222
between syntactic and semantic information. Long term, we hope to achieve the
2323
following based on work here:
@@ -28,8 +28,8 @@ following based on work here:
2828
- Lower high-water memory use due to reference counting without the need for
2929
leak-forever memory contexts
3030
- Incremental re-parsing
31-
- Incremental, lazier re-type-checking (helped by separating syntactic
32-
information)
31+
- Incremental, lazier re-type-checking, helped by separating syntactic
32+
information
3333

3434
This library is a work in progress and should be expected to be in a molten
3535
state for some time. Don't integrate this into other areas of the compiler or
@@ -68,9 +68,9 @@ points for this library:
6868
- For each grammar production, as many combinations as possible, especially
6969
with respect to optional terms and expected but missing terms
7070
1. All public APIs must have documentation comments.
71-
1. Represent Swift grammar and use naming conventions in accordance with The Swift
72-
Programming Language book as much as possible, so people know what to look
73-
for.
71+
1. Represent Swift grammar and use naming conventions in accordance with The
72+
Swift Programming Language book as much as possible, so people know what to
73+
look for.
7474
1. Accommodate "bad syntax" - humans are imperfect and source code is constantly
7575
in a state of flux in an editor. Unfortunately, we still live in a
7676
character-centric world - the library shouldn't fall over on bad syntax just
@@ -80,8 +80,8 @@ points for this library:
8080

8181
### Make APIs
8282

83-
*Make APIs* are for creating new syntax nodes in a single call. Although you need
84-
to provide all of the pieces of syntax to these APIs, you are free to use
83+
*Make APIs* are for creating new syntax nodes in a single call. Although you
84+
need to provide all of the pieces of syntax to these APIs, you are free to use
8585
"missing" placeholders as substructure. Make APIs return freestanding syntax
8686
nodes and do not establish parental relationships.
8787

@@ -90,9 +90,9 @@ nodes and do not establish parental relationships.
9090
The `SyntaxFactory` embodies the Make APIs and is the one-stop shop for creating
9191
new syntax nodes and tokens in a single call. There are two main Make APIs
9292
exposed for each Syntax node: making the node with all of the pieces, or making
93-
a blank node with all of the pieces marked as *missing*. For example, a
94-
`StructDeclSyntax` node has a `makeStructDeclSyntax` and
95-
`makeBlankStructDeclSyntax` on `SyntaxFactory` for those two cases respectively.
93+
a blank node with all of the pieces marked as *missing*. For example,
94+
`SyntaxFactory` has `makeStructDeclSyntax` and `makeBlankStructDeclSyntax` that
95+
both return a `StructDeclSyntax`.
9696

9797
Instead of constructors on each syntax node's class, static creation methods are
9898
all supplied here in the `SyntaxFactory` for better code completion - you don't
@@ -159,10 +159,13 @@ struct YourStruct {}
159159

160160
### Builder APIs
161161

162-
*Builder APIs* are provided for building up syntax incrementally as it appears. At
163-
any point in the building process, you can call `build()` and get a reasonably
164-
formed Syntax node (i.e. with no raw `nullptr`s) using what you've provided to
165-
the builder so far. Anything that you haven't supplied is marked as *missing*.
162+
*Builder APIs* are provided for building up syntax incrementally as it appears.
163+
At any point in the building process, you can call `build()` and get a
164+
reasonably formed Syntax node (i.e. with no raw `nullptr`s) using what you've
165+
provided to the builder so far. Anything that you haven't supplied is marked as
166+
*missing*. This is essentially what the parser does so, looking forward to
167+
future adoption, the builders are designed with the parser in mind, with the
168+
hope that we can better specify recovery behavior and incremental (re-)parsing.
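
The "build at any point" behavior described above can be sketched as follows. This is a minimal illustration, not libSyntax's builder: `Piece`, `StructDeclSketch`, and `StructDeclSketchBuilder` are hypothetical names, and children are modeled as plain strings rather than syntax nodes.

```c++
#include <optional>
#include <string>
#include <utility>

// Hypothetical child slot: either real text or a "missing" placeholder.
struct Piece {
  std::string Text;
  bool IsMissing;
};

// Hypothetical immutable result of a build; not the real StructDeclSyntax.
struct StructDeclSketch {
  Piece StructKeyword, Identifier, LeftBrace, RightBrace;

  // Prints only the pieces that are present; missing ones print nothing.
  std::string print() const {
    std::string Out;
    auto Append = [&Out](const Piece &P) {
      if (!P.IsMissing)
        Out += P.Text;
    };
    Append(StructKeyword);
    Append(Identifier);
    Append(LeftBrace);
    Append(RightBrace);
    return Out;
  }
};

// The builder owns mutable state; every build() result is an independent value.
class StructDeclSketchBuilder {
  std::optional<std::string> StructKeyword, Identifier, LeftBrace, RightBrace;

  // Anything not supplied so far becomes a "missing" piece, never a null.
  static Piece realize(const std::optional<std::string> &Slot) {
    return Slot ? Piece{*Slot, false} : Piece{"", true};
  }

public:
  StructDeclSketchBuilder &useStructKeyword(std::string T) {
    StructKeyword = std::move(T);
    return *this;
  }
  StructDeclSketchBuilder &useIdentifier(std::string T) {
    Identifier = std::move(T);
    return *this;
  }
  StructDeclSketchBuilder &useLeftBrace(std::string T) {
    LeftBrace = std::move(T);
    return *this;
  }
  StructDeclSketchBuilder &useRightBrace(std::string T) {
    RightBrace = std::move(T);
    return *this;
  }

  StructDeclSketch build() const {
    return {realize(StructKeyword), realize(Identifier), realize(LeftBrace),
            realize(RightBrace)};
  }
};
```

Calling `build()` after only `useStructKeyword(...)` yields a value whose other three pieces are flagged missing; later `use...` calls affect only subsequent builds.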
166169

167170
**Example**
168171

@@ -206,6 +209,10 @@ struct MyStruct {}
206209

207210
Much better!
208211

212+
Note that syntax builders own and mutate the data they will eventually use to
213+
build a syntax node. They themselves should not be shared between threads.
214+
However, anything the builder builds and returns to you is safe and immutable.
215+
209216
### Syntax Rewriters
210217

211218
`TODO`.
@@ -219,7 +226,7 @@ they store a kind, whether they were missing in the source, and the *layout*,
219226
which is a list of children and represents the recursive substructure. Although
220227
these are tree-like in nature, *they maintain no parental relationships* because
221228
they can be shared among many nodes. Eventually, `RawSyntax` bottoms out in
222-
tokens, the terminals, which are represented by the `TokenSyntax` class.
229+
tokens, represented by the `TokenSyntax` class.
223230

224231
#### RawSyntax summary
225232

@@ -232,9 +239,9 @@ tokens, the terminals, which are represented by the `TokenSyntax` class.
232239
### TokenSyntax
233240

234241
These are special cases of `RawSyntax` and represent all terminals in the
235-
grammar. Aside from the token kind, these have two very important pieces of
236-
information for full-fidelity source: leading and trailing source *trivia*
237-
surrounding the token.
242+
grammar. Aside from the token kind and the text, they have two very important
243+
pieces of information for full-fidelity source: leading and trailing source
244+
*trivia* surrounding the token.
238245

239246
#### TokenSyntax summary
240247

@@ -251,7 +258,9 @@ surrounding the token.
251258
You've already seen some uses of `Trivia` in the examples above. These are
252259
pieces of syntax that aren't really relevant to the semantics of the program,
253260
such as whitespace and comments. These are modeled as collections and, with the
254-
exception of comments, are sort of "run-length" encoded.
261+
exception of comments, are sort of "run-length" encoded. For example, a sequence
262+
of four spaces is represented by `{ Kind: TriviaKind::Space, Count: 4 }`, not
263+
the literal text `" "`.
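
To make the "run-length" idea concrete, here is a small self-contained sketch. The names `TriviaKindSketch`, `TriviaPieceSketch`, `encodeWhitespace`, and `decodeWhitespace` are hypothetical and are not part of the library; the sketch also handles only spaces, tabs, and newlines, and skips any other character.

```c++
#include <string>
#include <vector>

// Hypothetical stand-ins for the library's trivia kind and piece types.
enum class TriviaKindSketch { Space, Tab, Newline };

struct TriviaPieceSketch {
  TriviaKindSketch Kind;
  unsigned Count;
};

// Collapse each run of identical whitespace characters into one
// {Kind, Count} piece. Non-whitespace characters are skipped in this sketch.
std::vector<TriviaPieceSketch> encodeWhitespace(const std::string &Text) {
  std::vector<TriviaPieceSketch> Pieces;
  for (char C : Text) {
    TriviaKindSketch Kind;
    switch (C) {
    case ' ':
      Kind = TriviaKindSketch::Space;
      break;
    case '\t':
      Kind = TriviaKindSketch::Tab;
      break;
    case '\n':
      Kind = TriviaKindSketch::Newline;
      break;
    default:
      continue;
    }
    if (!Pieces.empty() && Pieces.back().Kind == Kind)
      ++Pieces.back().Count; // extend the current run
    else
      Pieces.push_back({Kind, 1}); // start a new run
  }
  return Pieces;
}

// Expand the pieces back into literal text.
std::string decodeWhitespace(const std::vector<TriviaPieceSketch> &Pieces) {
  std::string Out;
  for (const TriviaPieceSketch &P : Pieces) {
    char C = P.Kind == TriviaKindSketch::Space ? ' '
             : P.Kind == TriviaKindSketch::Tab ? '\t'
                                               : '\n';
    Out.append(P.Count, C);
  }
  return Out;
}
```

Four spaces encode to a single piece with `Count == 4`, and a newline followed by two spaces encodes to two pieces.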
255264

256265
Some examples of the "atoms" of `Trivia`:
257266

@@ -289,18 +298,18 @@ Breaking this down token by token:
289298
- `func`
290299
- Leading trivia: none.
291300
- Trailing trivia: Takes up the space after (Rule 1).
301+
292302
```c++
293303
// Equivalent to:
294304
Trivia::spaces(1)
295305
```
296306
297307
- `foo`
298-
- Leading trivia: none. The previous `func` at the space before.
308+
- Leading trivia: none. The previous `func` ate the space before.
299309
- Trailing trivia: none.
300-
'('.
310+
301311
- `(`
302312
- Leading trivia: none.
303-
identifier.
304313
- Trailing trivia: none.
305314
306315
- `)`
@@ -314,6 +323,7 @@ Breaking this down token by token:
314323
315324
- `var`
316325
- Leading trivia: One newline followed by two spaces because of Rule 2.
326+
317327
```c++
318328
// Equivalent to:
319329
Trivia::newlines(1) + Trivia::spaces(2)
@@ -325,11 +335,11 @@ Breaking this down token by token:
325335
- Trailing trivia: Takes up the space after (Rule 1).
326336
327337
- `=`
328-
- Leading trivia: none. The previous `x` at the space before.
338+
- Leading trivia: none. The previous `x` ate the space before.
329339
- Trailing trivia: Takes up the space after (Rule 1).
330340
331341
- `2`
332-
- Leading trivia: none. The previous `=` at the space before.
342+
- Leading trivia: none. The previous `=` ate the space before.
333343
- Trailing trivia: none: Because of Rule 1, it doesn't take the following
334344
newline.
335345
@@ -341,6 +351,12 @@ Breaking this down token by token:
341351
- Leading trivia: none.
342352
- Trailing trivia: none.
343353
354+
A couple of remarks about the `EOF` token:
355+
356+
- Starting with the first newline after the last non-EOF token, `EOF` takes
357+
all remaining trivia in the source file as its leading trivia.
358+
- Because of this, `EOF` never has trailing trivia.
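
The ownership rules walked through above can be sketched with a toy lexer. It assumes Rule 1 means "a token takes trailing whitespace up to, but not including, the next newline" and Rule 2 means "a token takes all preceding unclaimed whitespace, starting from the first newline, as leading trivia", which is how the breakdown reads. `TokenSketch` and `lexWithTrivia` are hypothetical names; tokens here are just runs of non-whitespace, and comments are ignored.

```c++
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: text plus the trivia it owns on each side.
struct TokenSketch {
  std::string Leading, Text, Trailing;
  bool IsEOF;
};

static bool isWhitespace(char C) { return C == ' ' || C == '\t' || C == '\n'; }

std::vector<TokenSketch> lexWithTrivia(const std::string &Source) {
  std::vector<TokenSketch> Tokens;
  std::size_t Pos = 0;

  // Consume characters while Pred holds and return the consumed text.
  auto takeWhile = [&](auto Pred) {
    std::size_t Start = Pos;
    while (Pos < Source.size() && Pred(Source[Pos]))
      ++Pos;
    return Source.substr(Start, Pos - Start);
  };

  while (true) {
    // Rule 2 (as assumed): whatever whitespace is still unclaimed, beginning
    // at a newline, becomes the leading trivia of the next token.
    std::string Leading = takeWhile(isWhitespace);
    if (Pos == Source.size()) {
      // EOF takes all remaining trivia as leading and has no trailing trivia.
      Tokens.push_back({Leading, "", "", true});
      return Tokens;
    }
    std::string Text = takeWhile([](char C) { return !isWhitespace(C); });
    // Rule 1 (as assumed): trailing trivia stops before the next newline.
    std::string Trailing =
        takeWhile([](char C) { return C == ' ' || C == '\t'; });
    Tokens.push_back({Leading, Text, Trailing, false});
  }
}
```

Concatenating `Leading + Text + Trailing` over all tokens reproduces the input exactly, which is the full-fidelity property the trivia model is meant to preserve.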
359+
344360
#### Summary of Trivia
345361
346362
- `Trivia` represent *source trivia*, the whitespace and comments in a Swift
@@ -351,24 +367,27 @@ Breaking this down token by token:
351367
### SyntaxData
352368
353369
`SyntaxData` nodes wrap `RawSyntax` nodes with a few important pieces of
354-
information: a pointer to a parent, the position in which the node occurs in its
355-
parent, and cached children. For example, if we have a `StructDeclSyntaxData`,
356-
wrapping a `RawSyntax` for a struct declaration, we might ask for the generic
357-
parameter clause. At first, this is only represented in the raw syntax. On first
358-
ask, we thaw those out by creating a new `GenericParameterClauseSyntaxData`,
359-
cache it as our child, set its parent to `this`, and send it back to the caller.
370+
additional information: a pointer to a parent, the position in which the node
371+
occurs in its parent, and cached children.
372+
373+
For example, if we have a `StructDeclSyntaxData`, wrapping a `RawSyntax` for a
374+
struct declaration, we might ask for the generic parameter clause. At first,
375+
this is only represented in the raw syntax. On first ask, we thaw those out by
376+
creating a new `GenericParameterClauseSyntaxData`, cache it as our child, set
377+
its parent to `this`, and send it back to the caller. These cached children
378+
are strong references, keeping the syntax tree alive in memory.
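
The "thaw on first ask" step can be illustrated with a small sketch. It is not the library's implementation: `RawSketch` and `DataSketch` are hypothetical types, a `std::mutex` stands in for whatever synchronization the real code uses, and the parent link is modeled as a plain non-owning pointer.

```c++
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical raw node: a kind plus a shareable layout, with no parent link.
struct RawSketch {
  std::string Kind;
  std::vector<std::shared_ptr<const RawSketch>> Layout;
};

// Hypothetical "realized" node: wraps a raw node and adds a parent pointer,
// an index in that parent, and lazily created, cached children.
class DataSketch {
  std::shared_ptr<const RawSketch> Raw;
  const DataSketch *Parent; // non-owning: strong references only point down
  std::size_t IndexInParent;
  mutable std::mutex CacheLock;
  mutable std::vector<std::shared_ptr<const DataSketch>> CachedChildren;

public:
  explicit DataSketch(std::shared_ptr<const RawSketch> R,
                      const DataSketch *P = nullptr, std::size_t Index = 0)
      : Raw(std::move(R)), Parent(P), IndexInParent(Index),
        CachedChildren(Raw->Layout.size()) {}

  // The first call creates the child and caches it; later calls return the
  // same object, so children have stable identity.
  std::shared_ptr<const DataSketch> getChild(std::size_t I) const {
    std::lock_guard<std::mutex> Guard(CacheLock);
    if (!CachedChildren.at(I))
      CachedChildren[I] = std::make_shared<DataSketch>(Raw->Layout.at(I), this, I);
    return CachedChildren[I];
  }

  const DataSketch *getParent() const { return Parent; }
  std::size_t getIndexInParent() const { return IndexInParent; }
  const std::string &getKind() const { return Raw->Kind; }
};
```

Asking the same parent for the same child twice returns the same pointer, and the child's parent pointer refers back to the node that created it.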
360379
361380
You can think of `SyntaxData` as "concrete" or "realized" syntax nodes. They
362381
represent a specific piece of source code, have an absolute location, line and
363-
column number, etc. `RawSyntax` are more like the integer 1 - existing in theory
364-
everywhere it occurs.
382+
column number, etc. `RawSyntax` are more like the integer 1 - a single
383+
theoretical entity that exists, but manifesting everywhere it occurs identically
384+
in Swift source code.
365385
366386
Beyond this, `SyntaxData` nodes have *no significant public API*.
367387
368388
- `SyntaxData` are immutable.
369389
However, they may mutate themselves in order to implement lazy instantiation
370-
of children and caching. This should be transparent and safe to any internal
371-
implementation.
390+
of children and caching. That caching operation is transparent and thread-safe.
372391
- `SyntaxData` have identity, i.e. they can be compared with "pointer equality".
373392
- `SyntaxData` are an implementation detail and have no public API.
374393
@@ -383,14 +402,14 @@ public interface: the *With APIs*, getters, etc. Anyone working with the
383402
Internally, they are actually packaged as a strong reference to the root of the
384403
tree in which that node resides, and a weak reference to the `SyntaxData`
385404
representing that node. Why a weak reference to the data? We do this to prevent
386-
retain cycles: all strong references point down in the tree, starting at the
387-
root.
405+
retain cycles and minimize retain/release traffic: **all strong references point
406+
down in the tree, starting at the root**.
388407
389-
Although it's important for the entire library to be easy to use and maintain,
390-
it's especially important that the APIs in `Syntax` nodes remain intuitive and
391-
do what you expect with no weird side effects, necessary contexts to maintain,
392-
etc. If you have a handle on a `Syntax` node, you're safe to query anything
393-
about it without other processes pulling out the rug from under you.
408+
Although it's important for the entire library to be easy to use and maintain in
409+
general, it's especially important that the APIs in `Syntax` nodes remain
410+
intuitive and do what you expect with no weird side effects, necessary contexts
411+
to maintain, etc. If you have a handle on a `Syntax` node, you're safe to query
412+
anything about it without other processes pulling out the rug from under you.
394413
395414
### Example Object Diagram: `{ return 1 }`
396415
@@ -431,7 +450,8 @@ auto Block = SyntaxFactory::makeBlankCodeBlockStmt()
431450
auto MyReturn = Block.getStatement(0).castTo<ReturnStmt>;
432451
```
433452

434-
And here's what the object diagram would look like starting with `MyReturn`.
453+
Here's what the corresponding object diagram would look like starting with
454+
`MyReturn`.
435455

436456
![Syntax Example](.doc/SyntaxExample.png)
437457

@@ -459,17 +479,22 @@ Here's a handy checklist when implementing a production in the grammar.
459479
- Check that the corresponding `lib/AST` node has `SourceLocs` for all terms. If
460480
it doesn't, [file a Swift bug][NewSwiftBug] and fix that first.
461481
- **Add the `Syntax` bug label!**
462-
- Check if it's not already being worked on, and then [file a Swift bug][NewSwiftBug], noting which grammar productions are affected.
482+
- Check if it's not already being worked on, and then
483+
[file a Swift bug][NewSwiftBug], noting which grammar productions
484+
are affected.
463485
- **Add the `Syntax` bug label!**
464486
- Add a *kind* to include/swift/Syntax/SyntaxKinds.def
465487
- Create the `${KIND}SyntaxData` class.
466488
- Cached children members as `RC<${CHILDKIND}SyntaxData>`
467489
- Create the `${KIND}Syntax` class.
468490
Be sure to implement the following:
469-
- Define the `Cursor` enum for the syntax node. This specifies all of the terms of the production, including optional terms. For example, a same-type generic requirement is:
491+
- Define the `Cursor` enum for the syntax node. This specifies all of the
492+
terms of the production, including optional terms. For example, a same-type
493+
generic requirement is:
470494
`same-type-requirement -> type-identifier '==' type`
471495

472-
That's three terms in the production, and you can see this reflected in the `StructDeclSyntaxData` class:
496+
That's three terms in the production, and you can see this reflected in the
497+
`StructDeclSyntaxData` class:
473498

474499
```c++
475500
enum Cursor : CursorIndex {
@@ -499,7 +524,8 @@ Here's a handy checklist when implementing a production in the grammar.
499524
- `makeBlank${KIND}Syntax()`
500525
- Add a C++ unit test.
501526
- If applicable, create a `${KIND}SyntaxBuilder`.
502-
- `use____(...)` methods for each layout element - takes a `${KIND}Syntax` for that child type.
527+
- `use____(...)` methods for each layout element - takes a `${KIND}Syntax` for
528+
that child type.
503529
- `${KIND}Syntax build() const`
504530
- Add a C++ unit test.
505531
- `build()` at all stages of building, followed by `print()`.
