Commit 688d1b0

Merge pull request rust-lang#28 from nikomatsakis/master
add query + incremental section and restructure a bit
2 parents 3b4fab4 + bf77592 commit 688d1b0

File tree

6 files changed

+478
-15
lines changed

src/SUMMARY.md

+7-4
Original file line numberDiff line numberDiff line change
@@ -5,16 +5,19 @@
55
- [Using the compiler testing framework](./running-tests.md)
66
- [Walkthrough: a typical contribution](./walkthrough.md)
77
- [High-level overview of the compiler source](./high-level-overview.md)
8+
- [Queries: demand-driven compilation](./query.md)
9+
- [Incremental compilation](./incremental-compilation.md)
810
- [The parser](./the-parser.md)
911
- [Macro expansion](./macro-expansion.md)
1012
- [Name resolution](./name-resolution.md)
11-
- [HIR lowering](./hir-lowering.md)
13+
- [The HIR (High-level IR)](./hir.md)
1214
- [The `ty` module: representing types](./ty.md)
1315
- [Type inference](./type-inference.md)
1416
- [Trait resolution](./trait-resolution.md)
1517
- [Type checking](./type-checking.md)
16-
- [MIR construction](./mir-construction.md)
17-
- [MIR borrowck](./mir-borrowck.md)
18-
- [MIR optimizations](./mir-optimizations.md)
18+
- [The MIR (Mid-level IR)](./mir.md)
19+
- [MIR construction](./mir-construction.md)
20+
- [MIR borrowck](./mir-borrowck.md)
21+
- [MIR optimizations](./mir-optimizations.md)
1922
- [trans: generating LLVM IR](./trans.md)
2023
- [Glossary](./glossary.md)

src/glossary.md

+10-9
@@ -9,23 +9,24 @@ AST | the abstract syntax tree produced by the syntax crate
99
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
1010
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
1111
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
12-
HIR | the High-level IR, created by lowering and desugaring the AST. See `librustc/hir`.
12+
HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html))
1313
HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
14-
'gcx | the lifetime of the global arena (see `librustc/ty`).
14+
'gcx | the lifetime of the global arena ([see more](ty.html))
1515
generics | the set of generic type parameters defined on a type or item
1616
ICE | internal compiler error. When the compiler crashes.
1717
infcx | the inference context (see `librustc/infer`)
18-
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans. Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is found in `src/librustc_mir`.
19-
obligation | something that must be proven by the trait system; see `librustc/traits`.
18+
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
19+
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
2020
local crate | the crate currently being compiled.
2121
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
22-
query | perhaps some sub-computation during compilation; see `librustc/maps`.
23-
provider | the function that executes a query; see `librustc/maps`.
22+
query | perhaps some sub-computation during compilation ([see more](query.html))
23+
provider | the function that executes a query ([see more](query.html))
2424
sess | the compiler session, which stores global data used throughout compilation
2525
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
2626
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
2727
substs | the substitutions for a given generic type or item (e.g., the `i32`, `u32` in `HashMap<i32, u32>`)
28-
tcx | the "typing context", main data structure of the compiler (see `librustc/ty`).
28+
tcx | the "typing context", main data structure of the compiler ([see more](ty.html))
29+
'tcx | the lifetime of the currently active inference context ([see more](ty.html))
2930
trans | the code to translate MIR into LLVM IR.
30-
trait reference | a trait and values for its type parameters (see `librustc/ty`).
31-
ty | the internal representation of a type (see `librustc/ty`).
31+
trait reference | a trait and values for its type parameters ([see more](ty.html)).
32+
ty | the internal representation of a type ([see more](ty.html)).

src/hir-lowering.md renamed to src/hir.md

+2-2
@@ -1,4 +1,4 @@
1-
# HIR lowering
1+
# The HIR
22

33
The HIR -- "High-level IR" -- is the primary IR used in most of
44
rustc. It is a desugared version of the "abstract syntax tree" (AST)
@@ -116,4 +116,4 @@ associated with an **owner**, which is typically some kind of item
116116
(e.g., a `fn()` or `const`), but could also be a closure expression
117117
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
118118
associated with a given def-id (`maybe_body_owned_by()`) or to find
119-
the owner of a body (`body_owner_def_id()`).
119+
the owner of a body (`body_owner_def_id()`).

src/incremental-compilation.md

+139
@@ -0,0 +1,139 @@
1+
# Incremental compilation
2+
3+
The incremental compilation scheme is, in essence, a surprisingly
4+
simple extension to the overall query system. We'll start by describing
5+
a slightly simplified variant of the real thing, the "basic algorithm", and then describe
6+
some possible improvements.
7+
8+
## The basic algorithm
9+
10+
The basic algorithm is
11+
called the **red-green** algorithm[^salsa]. The high-level idea is
12+
that, after each run of the compiler, we will save the results of all
13+
the queries that we do, as well as the **query DAG**. The
14+
**query DAG** is a [DAG] that indicates which queries executed which
15+
other queries. So for example there would be an edge from a query Q1
16+
to another query Q2 if computing Q1 required computing Q2 (note that
17+
because queries cannot depend on themselves, this results in a DAG and
18+
not a general graph).
19+
20+
[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
21+
22+
On the next run of the compiler, then, we can sometimes reuse these
23+
query results to avoid re-executing a query. We do this by assigning
24+
every query a **color**:
25+
26+
- If a query is colored **red**, that means that its result during
27+
this compilation has **changed** from the previous compilation.
28+
- If a query is colored **green**, that means that its result is
29+
the **same** as the previous compilation.
30+
31+
There are two key insights here:
32+
33+
- First, if all the inputs to query Q are colored green, then the
34+
query Q **must** result in the same value as last time and hence
35+
need not be re-executed (or else the compiler is not deterministic).
36+
- Second, even if some inputs to a query change, it may be that it
37+
**still** produces the same result as the previous compilation. In
38+
particular, the query may only use part of its input.
39+
- Therefore, after executing a query, we always check whether it
40+
produced the same result as the previous time. **If it did,** we
41+
can still mark the query as green, and hence avoid re-executing
42+
dependent queries.
43+
44+
### The try-mark-green algorithm
45+
46+
The core of the incremental compilation is an algorithm called
47+
"try-mark-green". It has the job of determining the color of a given
48+
query Q (which must not yet have been executed). In cases where Q has
49+
red inputs, determining Q's color may involve re-executing Q so that
50+
we can compare its output; but if all of Q's inputs are green, then we
51+
can determine that Q must be green without re-executing it or inspecting
52+
its value whatsoever. In the compiler, this allows us to avoid
53+
deserializing the result from disk when we don't need it, and -- in
54+
fact -- enables us to sometimes skip *serializing* the result as well
55+
(see the refinements section below).
56+
57+
Try-mark-green works as follows:
58+
59+
- First check if the query Q was executed during the previous
60+
compilation.
61+
- If not, we can just re-execute the query as normal, and assign it the
62+
color of red.
63+
- If yes, then load the 'dependent queries' that Q read during that compilation:
64+
- If there is a saved result, then we load the `reads(Q)` vector from the
65+
query DAG. The "reads" is the set of queries that Q executed during
66+
its execution.
67+
- For each query R in `reads(Q)`, we recursively demand the color
68+
of R using try-mark-green.
69+
- Note: it is important that we visit each node in `reads(Q)` in the same order
70+
as they occurred in the original compilation. See [the section on the query DAG below](#dag).
71+
- If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
72+
- We re-execute Q and compare the hash of its result to the hash of the result
73+
from the previous compilation.
74+
- If the hash has not changed, we can mark Q as **green** and return.
75+
- Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
76+
we can color Q as **green** and return.
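
The steps above can be sketched roughly as follows. This is a toy model, not the compiler's code: `Session`, `PrevEntry`, and the string query keys are hypothetical names, "executing" a query is simulated by looking up a precomputed hash, and the sketch assumes the caller seeds `colors` with the colors of the input queries (the leaves of the DAG), since the description above does not say how inputs are colored.

```rust
use std::collections::HashMap;

/// Color of a query in the current compilation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color {
    Red,
    Green,
}

/// What was saved for one query by the previous compilation (hypothetical layout).
pub struct PrevEntry {
    /// `reads(Q)`: the queries Q executed, in the order it executed them.
    pub reads: Vec<&'static str>,
    /// Hash of the result Q produced last time.
    pub result_hash: u64,
}

/// Hypothetical stand-in for the compiler state used by try-mark-green.
#[derive(Default)]
pub struct Session {
    pub prev: HashMap<&'static str, PrevEntry>,
    /// Hash each query would produce if executed now; this replaces
    /// actually running a provider and hashing its output.
    pub current_hash: HashMap<&'static str, u64>,
    /// Colors known so far; assumed to be pre-seeded for input queries.
    pub colors: HashMap<&'static str, Color>,
    /// Queries that were re-executed, in order.
    pub executed: Vec<&'static str>,
}

impl Session {
    pub fn try_mark_green(&mut self, q: &'static str) -> Color {
        if let Some(&known) = self.colors.get(q) {
            return known;
        }
        let saved = self.prev.get(q).map(|e| (e.reads.clone(), e.result_hash));
        let color = match saved {
            // Q did not run last time: execute it and color it red.
            None => {
                self.execute(q);
                Color::Red
            }
            Some((reads, old_hash)) => {
                // Visit reads(Q) in the recorded order, stopping at the first red one.
                let mut any_red = false;
                for r in reads {
                    if self.try_mark_green(r) == Color::Red {
                        any_red = true;
                        break;
                    }
                }
                if !any_red {
                    // All inputs are green, so Q is green without re-execution.
                    Color::Green
                } else if self.execute(q) == old_hash {
                    // Q is dirty, but re-executing it gave the same result.
                    Color::Green
                } else {
                    Color::Red
                }
            }
        };
        self.colors.insert(q, color);
        color
    }

    /// Simulated execution: record the query and return the hash of its result.
    fn execute(&mut self, q: &'static str) -> u64 {
        self.executed.push(q);
        self.current_hash[q]
    }
}
```

In this model, a chain `report -> len -> input` with a red `input` re-executes only `len`; if `len` hashes to the same value as before, `report` is marked green without running.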
77+
78+
<a name="dag"></a>
79+
80+
### The query DAG
81+
82+
The query DAG code is stored in
83+
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
84+
by instrumenting the query execution.
85+
86+
One key point is that the query DAG also tracks ordering; that is, for
87+
each query Q, we not only track the queries that Q reads, we track the
88+
**order** in which they were read. This allows try-mark-green to walk
89+
those queries back in the same order. This is important because once a subquery comes back as red,
90+
we can no longer be sure that Q will continue along the same path as before.
91+
That is, imagine a query like this:
92+
93+
```rust,ignore
94+
fn main_query(tcx) {
95+
    if tcx.subquery1() {
96+
        tcx.subquery2()
97+
    } else {
98+
        tcx.subquery3()
99+
    }
100+
}
101+
```
102+
103+
Now imagine that in the first compilation, `main_query` starts by
104+
executing `subquery1`, and this returns true. In that case, the next
105+
query `main_query` executes will be `subquery2`, and `subquery3` will
106+
not be executed at all.
107+
108+
But now imagine that in the **next** compilation, the input has
109+
changed such that `subquery1` returns **false**. In this case, `subquery2` would never
110+
execute. If try-mark-green were to visit `reads(main_query)` out of order,
111+
however, it might have visited `subquery2` before `subquery1`, and hence executed it.
112+
This can lead to ICEs and other problems in the compiler.
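
To make the ordering point concrete, here is a small runnable sketch of instrumented query execution. The `Tcx` struct, its `flag` field, and the `u32` return values are hypothetical stand-ins for illustration; the real `tcx` and dep-graph code look quite different.

```rust
/// Hypothetical context that records every subquery read, in order.
pub struct Tcx {
    /// Stands in for the input that decides what `subquery1` returns.
    pub flag: bool,
    /// The recorded `reads(main_query)`, in execution order.
    pub reads: Vec<&'static str>,
}

impl Tcx {
    pub fn subquery1(&mut self) -> bool {
        self.reads.push("subquery1");
        self.flag
    }

    pub fn subquery2(&mut self) -> u32 {
        self.reads.push("subquery2");
        2
    }

    pub fn subquery3(&mut self) -> u32 {
        self.reads.push("subquery3");
        3
    }
}

/// Same shape as the `main_query` example above.
pub fn main_query(tcx: &mut Tcx) -> u32 {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}
```

With `flag` set to true the recorded reads are `subquery1` then `subquery2`; with `flag` set to false they are `subquery1` then `subquery3`. Walking the saved list in order means `subquery1` is colored first, so a red `subquery1` is noticed before anything tries to color `subquery2`.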
113+
114+
[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
115+
116+
## Improvements to the basic algorithm
117+
118+
In the description of the basic algorithm, we said that at the end of
119+
compilation we would save the results of all the queries that were
120+
performed. In practice, this can be quite wasteful -- many of those
121+
results are very cheap to recompute, and serializing + deserializing
122+
them is not a particular win. In practice, what we would do is to save
123+
**the hashes** of all the subqueries that we performed. Then, in select cases,
124+
we **also** save the results.
125+
126+
This is why the incremental algorithm separates computing the
127+
**color** of a node, which often does not require its value, from
128+
computing the **result** of a node. Computing the result is done via a simple algorithm
129+
like so:
130+
131+
- Check if a saved result for Q is available. If so, compute the color of Q.
132+
If Q is green, deserialize and return the saved result.
133+
- Otherwise, execute Q.
134+
- We can then compare the hash of the result and color Q as green if
135+
it did not change.
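
A minimal sketch of that result-computing step, under stated assumptions: `Saved`, `force`, and `hash_of` are hypothetical names, results are plain `String`s, and the color computation and query execution are passed in as closures rather than being wired to a real query system.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color {
    Red,
    Green,
}

/// What the previous compilation saved for Q (hypothetical layout):
/// the hash is always saved, the result only in select cases.
pub struct Saved {
    pub hash: u64,
    pub result: Option<String>,
}

/// Illustrative hash function for results.
pub fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Compute the result of Q along with its color.
pub fn force(
    saved: Option<&Saved>,
    compute_color: impl FnOnce() -> Color,
    execute: impl FnOnce() -> String,
) -> (String, Color) {
    // Only bother computing the color if a saved result could be reused.
    if let Some(Saved { result: Some(result), .. }) = saved {
        if compute_color() == Color::Green {
            return (result.clone(), Color::Green);
        }
    }
    // Otherwise execute Q and compare against the saved hash, if any.
    let fresh = execute();
    let color = match saved {
        Some(s) if s.hash == hash_of(&fresh) => Color::Green,
        _ => Color::Red,
    };
    (fresh, color)
}
```

The point of the split is visible in the hash-only case: even when no result was serialized, re-executing Q and matching the saved hash still lets Q be colored green.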
136+
137+
# Footnotes
138+
139+
[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis

src/mir.md

+6
@@ -0,0 +1,6 @@
1+
# The MIR (Mid-level IR)
2+
3+
TODO
4+
5+
Defined in the `src/librustc/mir/` module, but much of the code that
6+
manipulates it is found in `src/librustc_mir`.
