Merge pull request rust-lang#28 from nikomatsakis/master

nikomatsakis · web-flow · commit 688d1b098d6a · 2018-01-29T10:27:18.000-05:00
add query + incremental section and restructure a bit
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -5,16 +5,19 @@
 - [Using the compiler testing framework](./running-tests.md)
 - [Walkthrough: a typical contribution](./walkthrough.md)
 - [High-level overview of the compiler source](./high-level-overview.md)
+- [Queries: demand-driven compilation](./query.md)
+    - [Incremental compilation](./incremental-compilation.md)
 - [The parser](./the-parser.md)
 - [Macro expansion](./macro-expansion.md)
 - [Name resolution](./name-resolution.md)
-- [HIR lowering](./hir-lowering.md)
+- [The HIR (High-level IR)](./hir.md)
 - [The `ty` module: representing types](./ty.md)
 - [Type inference](./type-inference.md)
 - [Trait resolution](./trait-resolution.md)
 - [Type checking](./type-checking.md)
-- [MIR construction](./mir-construction.md)
-- [MIR borrowck](./mir-borrowck.md)
-- [MIR optimizations](./mir-optimizations.md)
+- [The MIR (Mid-level IR)](./mir.md)
+    - [MIR construction](./mir-construction.md)
+    - [MIR borrowck](./mir-borrowck.md)
+    - [MIR optimizations](./mir-optimizations.md)
 - [trans: generating LLVM IR](./trans.md)
 - [Glossary](./glossary.md)
diff --git a/src/glossary.md b/src/glossary.md
@@ -9,23 +9,24 @@ AST                     |  the abstract syntax tree produced by the syntax crate
 codegen unit            |  when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
 cx                      |  we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
 DefId                   |  an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
-HIR                     |  the High-level IR, created by lowering and desugaring the AST. See `librustc/hir`.
+HIR                     |  the High-level IR, created by lowering and desugaring the AST ([see more](hir.html))
 HirId                   |  identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
-'gcx                    |  the lifetime of the global arena (see `librustc/ty`).
+'gcx                    |  the lifetime of the global arena ([see more](ty.html))
 generics                |  the set of generic type parameters defined on a type or item
 ICE                     |  internal compiler error. When the compiler crashes.
 infcx                   |  the inference context (see `librustc/infer`)
-MIR                     |  the Mid-level IR that is created after type-checking for use by borrowck and trans. Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is found in `src/librustc_mir`.
-obligation              |  something that must be proven by the trait system; see `librustc/traits`.
+MIR                     |  the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
+obligation              |  something that must be proven by the trait system ([see more](trait-resolution.html))
 local crate             |  the crate currently being compiled.
 node-id or NodeId       |  an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
-query                   |  perhaps some sub-computation during compilation; see `librustc/maps`.
-provider                |  the function that executes a query; see `librustc/maps`.
+query                   |  perhaps some sub-computation during compilation ([see more](query.html))
+provider                |  the function that executes a query ([see more](query.html))
 sess                    |  the compiler session, which stores global data used throughout compilation
 side tables             |  because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
 span                    |  a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
 substs                  |  the substitutions for a given generic type or item (e.g., the `i32`, `u32` in `HashMap<i32, u32>`)
-tcx                     |  the "typing context", main data structure of the compiler (see `librustc/ty`).
+tcx                     |  the "typing context", main data structure of the compiler ([see more](ty.html))
+'tcx                    |  the lifetime of the currently active inference context ([see more](ty.html))
 trans                   |  the code to translate MIR into LLVM IR.
-trait reference         |  a trait and values for its type parameters (see `librustc/ty`).
-ty                      |  the internal representation of a type (see `librustc/ty`).
+trait reference         |  a trait and values for its type parameters ([see more](ty.html)).
+ty                      |  the internal representation of a type ([see more](ty.html)).
diff --git a/src/hir.md b/src/hir.md
@@ -1,4 +1,4 @@
-# HIR lowering
+# The HIR
 
 The HIR -- "High-level IR" -- is the primary IR used in most of
 rustc. It is a desugared version of the "abstract syntax tree" (AST)
@@ -116,4 +116,4 @@ associated with an **owner**, which is typically some kind of item
 (e.g., a `fn()` or `const`), but could also be a closure expression
 (e.g., `|x, y| x + y`). You can use the HIR map to find the body
 associated with a given def-id (`maybe_body_owned_by()`) or to find
-the owner of a body (`body_owner_def_id()`).
+the owner of a body (`body_owner_def_id()`).
diff --git a/src/incremental-compilation.md b/src/incremental-compilation.md
@@ -0,0 +1,139 @@
+# Incremental compilation
+
+The incremental compilation scheme is, in essence, a surprisingly
+simple extension to the overall query system. We'll start by describing
+a slightly simplified variant of the real thing, the "basic algorithm", and then describe
+some possible improvements.
+
+## The basic algorithm
+
+The basic algorithm is
+called the **red-green** algorithm[^salsa]. The high-level idea is
+that, after each run of the compiler, we will save the results of all
+the queries that we do, as well as the **query DAG**. The
+**query DAG** is a [DAG] that indicates which queries executed which
+other queries. So for example there would be an edge from a query Q1
+to another query Q2 if computing Q1 required computing Q2 (note that
+because queries cannot depend on themselves, this results in a DAG and
+not a general graph).
+
+[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
+
+On the next run of the compiler, then, we can sometimes reuse these
+query results to avoid re-executing a query. We do this by assigning
+every query a **color**:
+
+- If a query is colored **red**, that means that its result during
+  this compilation has **changed** from the previous compilation.
+- If a query is colored **green**, that means that its result is
+  the **same** as the previous compilation.
+
+There are two key insights here:
+
+- First, if all the inputs to query Q are colored green, then the
+  query Q **must** result in the same value as last time and hence
+  need not be re-executed (or else the compiler is not deterministic).
+- Second, even if some inputs to a query change, it may be that it
+  **still** produces the same result as the previous compilation. In
+  particular, the query may only use part of its input.
+  - Therefore, after executing a query, we always check whether it
+    produced the same result as the previous time. **If it did,** we
+    can still mark the query as green, and hence avoid re-executing
+    dependent queries.
+    
+### The try-mark-green algorithm
+
+The core of the incremental compilation is an algorithm called
+"try-mark-green". It has the job of determining the color of a given
+query Q (which must not yet have been executed). In cases where Q has
+red inputs, determining Q's color may involve re-executing Q so that
+we can compare its output; but if all of Q's inputs are green, then we
+can determine that Q must be green without re-executing it or inspecting
+its value whatsoever. In the compiler, this allows us to avoid
+deserializing the result from disk when we don't need it, and -- in
+fact -- enables us to sometimes skip *serializing* the result as well
+(see the refinements section below).
+
+Try-mark-green works as follows:
+
+- First check if the query Q was executed during the previous
+  compilation.
+  - If not, we can just re-execute the query as normal, and assign it the
+    color of red.
+- If yes, then we load the `reads(Q)` vector from the
+  query DAG. The "reads" is the set of queries that Q executed during
+  its execution.
+  - For each query R in `reads(Q)`, we recursively demand the color
+    of R using try-mark-green.
+    - Note: it is important that we visit each node in `reads(Q)` in the same order
+      as they occurred in the original compilation. See [the section on the query DAG below](#dag).
+    - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
+      - We re-execute Q and compare the hash of its result to the hash of the result
+        from the previous compilation.
+      - If the hash has not changed, we can mark Q as **green** and return.
+        Otherwise, Q is colored **red**.
+    - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
+      we can color Q as **green** and return.
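The try-mark-green steps above can be sketched in a few lines of Rust. This is a toy model under stated assumptions, not the code in `src/librustc/dep_graph`: `Session`, `Saved`, the query names, and the numeric "hashes" are all hypothetical, and the handling of input nodes (queries with no reads are always re-hashed) is an assumption the text does not spell out.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red,
    Green,
}

// Hypothetical record of what the previous compilation saved for one query.
struct Saved {
    reads: Vec<&'static str>, // queries read, in their original order
    hash: u64,                // hash of the previous result
}

// Hypothetical toy session; `current` maps each query to the hash it
// would produce if it were executed in this compilation.
struct Session {
    prev: HashMap<&'static str, Saved>,
    current: HashMap<&'static str, u64>,
    colors: HashMap<&'static str, Color>,
    executed: Vec<&'static str>,
}

impl Session {
    // "Executes" a query: logs it and returns the hash of its new result.
    fn execute(&mut self, q: &'static str) -> u64 {
        self.executed.push(q);
        self.current[q]
    }

    fn try_mark_green(&mut self, q: &'static str) -> Color {
        if let Some(&c) = self.colors.get(q) {
            return c;
        }
        let saved = self.prev.get(q).map(|s| (s.reads.clone(), s.hash));
        let color = match saved {
            // Not executed last time: run it and color it red.
            None => {
                self.execute(q);
                Color::Red
            }
            Some((reads, old_hash)) => {
                // Assumption: a query with no reads is an input and is
                // always re-hashed.
                let mut must_run = reads.is_empty();
                // Visit the reads in their original order; stop at the first red one.
                for r in reads {
                    if self.try_mark_green(r) == Color::Red {
                        must_run = true;
                        break;
                    }
                }
                if !must_run || self.execute(q) == old_hash {
                    Color::Green
                } else {
                    Color::Red
                }
            }
        };
        self.colors.insert(q, color);
        color
    }
}

// The input changed (hash 1 became 2), but "parse" still produces the same hash.
fn demo() -> (Color, Vec<&'static str>) {
    let mut s = Session {
        prev: HashMap::new(),
        current: HashMap::new(),
        colors: HashMap::new(),
        executed: Vec::new(),
    };
    s.prev.insert("input", Saved { reads: vec![], hash: 1 });
    s.prev.insert("parse", Saved { reads: vec!["input"], hash: 10 });
    s.prev.insert("typeck", Saved { reads: vec!["parse"], hash: 100 });
    s.current.insert("input", 2);
    s.current.insert("parse", 10);
    s.current.insert("typeck", 100);
    let color = s.try_mark_green("typeck");
    (color, s.executed)
}
```

In `demo()`, the changed input forces "parse" to re-run, but because its hash is unchanged it is colored green, so "typeck" is marked green without being executed at all.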
+
+<a name="dag"></a>
+
+### The query DAG
+
+The query DAG code is stored in
+[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
+by instrumenting the query execution. 
+
+One key point is that the query DAG also tracks ordering; that is, for
+each query Q, we not only track the queries that Q reads, we track the
+**order** in which they were read.  This allows try-mark-green to walk
+those queries back in the same order. This is important because once a subquery comes back as red,
+we can no longer be sure that Q will continue along the same path as before.
+That is, imagine a query like this:
+
+```rust,ignore
+fn main_query(tcx) {
+    if tcx.subquery1() {
+        tcx.subquery2()
+    } else {
+        tcx.subquery3()
+    }
+}
+```
+
+Now imagine that in the first compilation, `main_query` starts by
+executing `subquery1`, and this returns true. In that case, the next
+query `main_query` executes will be `subquery2`, and `subquery3` will
+not be executed at all.
+
+But now imagine that in the **next** compilation, the input has
+changed such that `subquery1` returns **false**. In this case, `subquery2` would never
+execute. If try-mark-green were to visit `reads(main_query)` out of order,
+however, it might have visited `subquery2` before `subquery1`, and hence executed it.
+This can lead to ICEs and other problems in the compiler.
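To make the ordering point concrete, here is a small runnable variant of the `main_query` example that logs its reads as they happen. It is only an illustration: the `Tcx` struct, its `flag` field, and the return values are hypothetical stand-ins, not rustc's real context type or its instrumentation.

```rust
use std::cell::RefCell;

// Hypothetical context that logs every subquery it is asked for, in order.
struct Tcx {
    flag: bool,
    reads: RefCell<Vec<&'static str>>,
}

impl Tcx {
    fn subquery1(&self) -> bool {
        self.reads.borrow_mut().push("subquery1");
        self.flag
    }

    fn subquery2(&self) -> u32 {
        self.reads.borrow_mut().push("subquery2");
        2
    }

    fn subquery3(&self) -> u32 {
        self.reads.borrow_mut().push("subquery3");
        3
    }
}

fn main_query(tcx: &Tcx) -> u32 {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}

// Runs `main_query` once and returns the ordered list of reads it performed.
fn reads_of_main_query(flag: bool) -> Vec<&'static str> {
    let tcx = Tcx {
        flag,
        reads: RefCell::new(Vec::new()),
    };
    main_query(&tcx);
    tcx.reads.into_inner()
}
```

Calling `reads_of_main_query(true)` records `subquery1` followed by `subquery2`, while `reads_of_main_query(false)` records `subquery1` followed by `subquery3`. Walking the saved list front to back therefore reaches `subquery1` first, which is the property try-mark-green relies on.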
+
+[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
+
+## Improvements to the basic algorithm
+
+In the description of the basic algorithm, we said that at the end of
+compilation we would save the results of all the queries that were
+performed.  In practice, this can be quite wasteful -- many of those
+results are very cheap to recompute, and serializing + deserializing
+them is not a particular win. In practice, what we would do is to save
+**the hashes** of all the subqueries that we performed. Then, in select cases,
+we **also** save the results.
+
+This is why the incremental algorithm separates computing the
+**color** of a node, which often does not require its value, from
+computing the **result** of a node. Computing the result is done via a simple algorithm
+like so:
+
+- Check if a saved result for Q is available. If so, compute the color of Q.
+  If Q is green, deserialize and return the saved result.
+- Otherwise, execute Q.
+  - We can then compare the hash of the result and color Q as green if
+    it did not change.
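The result computation above might look roughly like this. It is a sketch only: `Previous`, `result_of`, and `hash_of` are hypothetical names, results are modeled as plain strings, and the two closures stand in for try-mark-green and for actually running the query.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red,
    Green,
}

// Hypothetical record from the previous compilation: the hash is always
// saved, the result itself only in select cases.
struct Previous {
    hash: u64,
    saved_result: Option<String>,
}

fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// `try_mark_green` and `execute` are stand-ins supplied by the caller.
fn result_of(
    prev: Option<&Previous>,
    try_mark_green: impl FnOnce() -> Color,
    execute: impl FnOnce() -> String,
) -> (String, Color) {
    if let Some(p) = prev {
        if let Some(saved) = &p.saved_result {
            // A saved result exists, so it is worth computing the color.
            if try_mark_green() == Color::Green {
                return (saved.clone(), Color::Green);
            }
        }
    }
    // Otherwise execute Q, then compare hashes to pick the color.
    let value = execute();
    let color = match prev {
        Some(p) if p.hash == hash_of(&value) => Color::Green,
        _ => Color::Red,
    };
    (value, color)
}
```

Note that when only the hash was saved, the query is executed even if its inputs are unchanged; that is the trade-off the text describes for results that are cheap to recompute.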
+
+# Footnotes
+
+[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
diff --git a/src/mir.md b/src/mir.md
@@ -0,0 +1,6 @@
+# The MIR (Mid-level IR)
+
+TODO
+
+Defined in the `src/librustc/mir/` module, but much of the code that
+manipulates it is found in `src/librustc_mir`.
diff --git a/src/query.md b/src/query.md