[Syntax] Fix a few typos and language in Readme.md

bitjammer · bitjammer · commit 52096534a650 · 2017-02-22T18:45:29.000-08:00
diff --git a/lib/Syntax/README.md b/lib/Syntax/README.md
@@ -8,16 +8,16 @@ striving to be safe, correct, and intuitive to use. The
 library emphasizes immutable, thread-safe data structures, full-fidelity
 representation of source, and facilities for *structured editing*.
 
-What is structured editing? It's an editing strategy that is keenly aware
-of the *structure* of source code, not necessarily its *representation* (i.e.
-characters or bytes). This can be achieved at different granularities:
-replacing an identifier, changing a call to global function to a method call, or
-indenting and formatting an entire source file based on declarative rules. These
-kinds of diverse operations are critical to the Swift Migrator, which is the immediate
+What is structured editing? It's an editing strategy that is keenly aware of the
+*structure* of source code, not necessarily its *representation* (i.e.
+characters or bytes). This can be achieved at different granularities: replacing
+an identifier, changing a global function call to a method call, or indenting
+and formatting an entire source file based on declarative rules. These kinds of
+diverse operations are critical to the Swift Migrator, which is the immediate
 client for this library, now developed in the open. Along with that, the library
 will also provide infrastructure for a first-class `swift-format` tool.
 
-Eventually, the goal of this library is to represent Swift syntax to all of the
+Eventually, the goal of this library is to represent Swift syntax in all of the
 compiler. Currently, lib/AST structures don't make a very clear distinction
 between syntactic and semantic information. Long term, we hope to achieve the
 following based on work here:
@@ -28,8 +28,8 @@ following based on work here:
 - Lower high-water memory use due to reference counting without the need for
   leak-forever memory contexts
 - Incremental re-parsing
-- Incremental, lazier re-type-checking (helped by separating syntactic
-  information)
+- Incremental, lazier re-type-checking, helped by separating syntactic
+  information
 
 This library is a work in progress and should be expected to be in a molten
 state for some time. Don't integrate this into other areas of the compiler or
@@ -68,9 +68,9 @@ points for this library:
    - For each grammar production, as many combinations as possible, especially
      with respect to optional terms and expected by missing terms
 1. All public APIs must have documentation comments.
-1. Represent Swift grammar and use naming conventions in accordance with The Swift
-   Programming Language book as much as possible, so people know what to look
-   for.
+1. Represent Swift grammar and use naming conventions in accordance with The
+   Swift Programming Language book as much as possible, so people know what to
+   look for.
 1. Accomodate "bad syntax" - humans are imperfect and source code is constantly
    in a state of flux in an editor. Unforunately, we still live in a
    character-centric world - the library shouldn't fall over on bad syntax just
@@ -80,8 +80,8 @@ points for this library:
 
 ### Make APIs
 
-*Make APIs* are for creating new syntax nodes in a single call. Although you need
-to provide all of the pieces of syntax to these APIs, you are free to use
+*Make APIs* are for creating new syntax nodes in a single call. Although you
+need to provide all of the pieces of syntax to these APIs, you are free to use
 "missing" placeholders as substructure. Make APIs return freestanding syntax
 nodes and do not establish parental relationships.
 
@@ -90,9 +90,9 @@ nodes and do not establish parental relationships.
 The `SyntaxFactory` embodies the Make APIs and is the one-stop shop for creating
 new syntax nodes and tokens in a single call. There are two main Make APIs
 exposed for each Syntax node: making the node with all of the pieces, or making
-a blank node with all of the pieces marked as *missing*. For example, a
-`StructDeclSyntax` node has a `makeStructDeclSyntax` and
-`makeBlankStructDeclSyntax` on `SyntaxFactory` for those two cases respectively.
+a blank node with all of the pieces marked as *missing*. For example,
+`SyntaxFactory` has `makeStructDeclSyntax` and `makeBlankStructDeclSyntax` that
+both return a `StructDeclSyntax`.
 
 Instead of constructors on each syntax node's class, static creation methods are
 all supplied here in the `SyntaxFactory` for better code completion - you don't
@@ -159,10 +159,13 @@ struct YourStruct {}
 
 ### Builder APIs
 
-*Builder APIs* are provided for building up syntax incrementally as it appears. At
-any point in the building process, you can call `build()` and get a reasonably
-formed Syntax node (i.e. with no raw `nullptr`s) using what you've provided to
-the builder so far. Anything that you haven't supplied is marked as *missing*.
+*Builder APIs* are provided for building up syntax incrementally as it appears.
+At any point in the building process, you can call `build()` and get a
+reasonably formed Syntax node (i.e. with no raw `nullptr`s) using what you've
+provided to the builder so far. Anything that you haven't supplied is marked as
+*missing*. This is essentially what the parser does, so the builders are
+designed with the parser in mind, looking forward to future adoption and in the
+hope that we can better specify recovery behavior and incremental (re-)parsing.
 
 **Example**
 
@@ -206,6 +209,10 @@ struct MyStruct {}
 
 Much better!
 
+Note that syntax builders own and mutate the data they will eventually use to
+build a syntax node. They themselves should not be shared between threads.
+However, anything the builder builds and returns to you is safe and immutable.
+
 ### Syntax Rewriters
 
 `TODO`.
@@ -219,7 +226,7 @@ they store a kind, whether they were missing in the source, and the *layout*,
 which is a list of children and represents the recursive substructure. Although
 these are tree-like in nature, *they maintain no parental relationships* because
 they can be shared among many nodes. Eventually, `RawSyntax` bottoms out in
-tokens, the terminals, which are represented by the `TokenSyntax` class.
+tokens, represented by the `TokenSyntax` class.
 
 #### RawSyntax summary
 
@@ -232,9 +239,9 @@ tokens, the terminals, which are represented by the `TokenSyntax` class.
 ### TokenSyntax
 
 These are special cases of `RawSyntax` and represent all terminals in the
-grammar. Aside from the token kind, these have two very important pieces of
-information for full-fidelity source: leading and trailing source *trivia*
-surrounding the token.
+grammar. Aside from the token kind and the text, they have two very important
+pieces of information for full-fidelity source: leading and trailing source
+*trivia* surrounding the token.
 
 #### TokenSyntax summary
 
@@ -251,7 +258,9 @@ surrounding the token.
 You've already seen some uses of `Trivia` in the examples above. These are
 pieces of syntax that aren't really relevant to the semantics of the program,
 such as whitespace and comments. These are modeled as collections and, with the
-exception of comments, are sort of "run-length" encoded.
+exception of comments, are sort of "run-length" encoded. For example, a sequence
+of four spaces is represented by `{ Kind: TriviaKind::Space, Count: 4 }`, not
+the literal text `"    "`.
 
 Some examples of the "atoms" of `Trivia`:
 
@@ -289,18 +298,18 @@ Breaking this down token by token:
 - `func`
   - Leading trivia: none.
   - Trailing trivia: Takes up the space after (Rule 1).
+
     ```c++
     // Equivalent to:
     Trivia::spaces(1)
     ```
 
 - `foo`
-  - Leading trivia: none. The previous `func` at the space before.
+  - Leading trivia: none. The previous `func` ate the space before.
   - Trailing trivia: none.
-    '('.
+
 - `(`
   - Leading trivia: none.
-    identifier.
   - Trailing trivia: none.
 
 - `)`
@@ -314,6 +323,7 @@ Breaking this down token by token:
 
 - `var`
   - Leading trivia: One newline followed by two spaces because of Rule 2.
+
     ```c++
     // Equivalent to:
     Trivia::newlines(1) + Trivia::spaces(2)
@@ -325,11 +335,11 @@ Breaking this down token by token:
   - Trailing trivia: Takes up the space after (Rule 1).
 
 - `=`
-  - Leading trivia: none. The previous `x` at the space before.
+  - Leading trivia: none. The previous `x` ate the space before.
   - Trailing trivia: Takes up the space after (Rule 1).
 
 - `2`
-  - Leading trivia: none. The previous `=` at the space before.
+  - Leading trivia: none. The previous `=` ate the space before.
   - Trailing trivia: none: Because of Rule 1, it doesn't take the following
     newline.
 
@@ -341,6 +351,12 @@ Breaking this down token by token:
   - Leading trivia: none.
   - Trailing trivia: none.
 
+A couple of remarks about the `EOF` token:
+
+- Starting with the first newline after the last non-EOF token, `EOF` takes
+  all remaining trivia in the source file as its leading trivia.
+- Because of this, `EOF` never has trailing trivia.
+
 #### Summary of Trivia
 
 - `Trivia` represent *source trivia*, the whitespace and comments in a Swift
@@ -351,24 +367,27 @@ Breaking this down token by token:
 ### SyntaxData
 
 `SyntaxData` nodes wrap `RawSyntax` nodes with a few important pieces of
-information: a pointer to a parent, the position in which the node occurs in its
-parent, and cached children. For example, if we have a `StructDeclSyntaxData`,
-wrapping a `RawSyntax` for a struct declaration, we might ask for the generic
-parameter clause. At first, this is only represented in the raw syntax. On first
-ask, we thaw those out by creating a new `GenericParameterClauseSyntaxData`,
-cache it as our child, set its parent to `this`, and send it back to the caller.
+additional information: a pointer to a parent, the position in which the node
+occurs in its parent, and cached children.
+
+For example, if we have a `StructDeclSyntaxData`, wrapping a `RawSyntax` for a
+struct declaration, we might ask for the generic parameter clause. At first,
+this is only represented in the raw syntax. On first ask, we thaw those out by
+creating a new `GenericParameterClauseSyntaxData`, cache it as our child, set
+its parent to `this`, and send it back to the caller. These cached children
+are strong references, keeping the syntax tree alive in memory.
 
 You can think of `SyntaxData` as "concrete" or "realized" syntax nodes. They
 represent a specific piece of source code, have an absolute location, line and
-column number, etc. `RawSyntax` are more like the integer 1 - existing in theory
-everywhere it occurs.
+column number, etc. `RawSyntax` are more like the integer 1 - a single
+theoretical entity that manifests identically everywhere it occurs in Swift
+source code.
 
 Beyond this, `SyntaxData` nodes have *no signficant public API*.
 
 - `SyntaxData` are immutable.
    However, they may mutate themselves in order to implement lazy instantiation
-   of children and caching. This should be transparent and safe to any internal
-   implementation.
+   of children and caching. That caching is transparent and thread-safe.
 - `SyntaxData` have identity, i.e. they can be compared with "pointer equality".
 - `SyntaxData` are implementation detail have no public API.
 
@@ -383,14 +402,14 @@ public interface: the *With APIs*, getters, etc. Anyone working with the
 Internally, they are actually packaged as a strong reference to the root of the
 tree in which that node resides, and a weak reference to the `SyntaxData`
 representing that node. Why a weak reference to the data? We do this to prevent
-retain cycles: all strong references point down in the tree, starting at the
-root.
+retain cycles and minimize retain/release traffic: **all strong references point
+down in the tree, starting at the root**.
 
-Although it's important for the entire library to be easy to use and maintain,
-it's especially important that the APIs in `Syntax` nodes remain intuitive and
-do what you expect with no weird side effects, necessary contexts to maintain,
-etc. If you have a handle on a `Syntax` node, you're safe to query anything
-about it without other processes pulling out the rug from under you.
+Although it's important for the entire library to be easy to use and maintain in
+general, it's especially important that the APIs in `Syntax` nodes remain
+intuitive and do what you expect with no weird side effects, necessary contexts
+to maintain, etc. If you have a handle on a `Syntax` node, you're safe to query
+anything about it without other processes pulling out the rug from under you.
 
 ### Example Object Diagram: `{ return 1 }`
 
@@ -431,7 +450,8 @@ auto Block = SyntaxFactory::makeBlankCodeBlockStmt()
 auto MyReturn = Block.getStatement(0).castTo<ReturnStmt>;
 ```
 
-And here's what the object diagram would look like starting with `MyReturn`.
+Here's what the corresponding object diagram would look like starting with
+`MyReturn`.
 
 ![Syntax Example](.doc/SyntaxExample.png)
 
@@ -459,17 +479,22 @@ Here's a handy checklist when implementing a production in the grammar.
 - Check that the corresponding `lib/AST` node has `SourceLocs` for all terms. If
   it doesn't, [file a Swift bug][NewSwiftBug] and fix that first.
   - **Add the `Syntax` bug label!**
-- Check if it's not already being worked on, and then [file a Swift bug][NewSwiftBug], noting which grammar productions are affected.
+- Check that it's not already being worked on, and then
+  [file a Swift bug][NewSwiftBug], noting which grammar productions
+  are affected.
   - **Add the `Syntax` bug label!**
 - Add a *kind* to include/swift/Syntax/SyntaxKinds.def
 - Create the `${KIND}SyntaxData` class.  
   - Cached children members as `RC<${CHILDKIND}SyntaxData>`
 - Create the `${KIND}Syntax` class.  
   Be sure to implement the following:
-  - Define the `Cursor` enum for the syntax node. This specifies all of the terms of the production, including optional terms. For example, a same-type generic requirement is:  
+  - Define the `Cursor` enum for the syntax node. This specifies all of the
+    terms of the production, including optional terms. For example, a same-type
+    generic requirement is:  
     `same-type-requirement -> type-identifier '==' type`
 
-    That's three terms in the production, and you can see this reflected in the `StructDeclSyntaxData` class:
+    That's three terms in the production, and you can see this reflected in the
+    `StructDeclSyntaxData` class:
 
     ```c++
     enum Cursor : CursorIndex {
@@ -499,7 +524,8 @@ Here's a handy checklist when implementing a production in the grammar.
   - `makeBlank${KIND}Syntax()`
     - Add a C++ unit test.
 - If applicable, create a `${KIND}SyntaxBuilder`.
-  - `use____(...)` methods for each layout element - takes a `${KIND}Syntax` for that child type.
+  - `use____(...)` methods for each layout element - takes a `${KIND}Syntax` for
+    that child type.
   - `${KIND}Syntax build() const`
     - Add a C++ unit test.
       - `build()` at all stages of building, followed by `print()`.
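
As an aside to the patch above: the "run-length" encoding of `Trivia` that the README text describes (four spaces stored as a kind plus a count, not as literal text) can be sketched in a few lines of C++. This is only an illustrative sketch under stated assumptions: `TriviaKind`, `TriviaPiece`, and `encodeWhitespace` below are hypothetical stand-ins written for this note, not the library's actual types or API, and comments are deliberately left out because the README says they are not run-length encoded.

```c++
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the library's real types.
enum class TriviaKind { Space, Tab, Newline };

struct TriviaPiece {
  TriviaKind Kind;
  unsigned Count;
};

// Collapse a leading run of whitespace into (kind, count) pieces.
// Scanning stops at the first character that is not ' ', '\t', or '\n'.
std::vector<TriviaPiece> encodeWhitespace(const std::string &Text) {
  std::vector<TriviaPiece> Pieces;
  for (char C : Text) {
    TriviaKind Kind;
    if (C == ' ')
      Kind = TriviaKind::Space;
    else if (C == '\t')
      Kind = TriviaKind::Tab;
    else if (C == '\n')
      Kind = TriviaKind::Newline;
    else
      break;

    // Extend the current run if the kind repeats; otherwise start a new run.
    if (!Pieces.empty() && Pieces.back().Kind == Kind)
      ++Pieces.back().Count;
    else
      Pieces.push_back({Kind, 1});
  }
  return Pieces;
}
```

With this sketch, `encodeWhitespace("\n  ")` yields a newline piece with count 1 followed by a space piece with count 2, which mirrors the leading trivia of `var` in the token-by-token walkthrough.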