# Incremental compilation

The incremental compilation scheme is, in essence, a surprisingly
simple extension to the overall query system. We'll start by describing
a slightly simplified variant of the real thing, the "basic algorithm", and then
describe some possible improvements.

## The basic algorithm

The basic algorithm is called the **red-green** algorithm[^salsa]. The
high-level idea is that, after each run of the compiler, we save the
results of all the queries that we do, as well as the **query DAG**. The
**query DAG** is a [DAG] that records which queries executed which
other queries. For example, there is an edge from a query Q1 to another
query Q2 if computing Q1 required computing Q2 (note that because queries
cannot depend on themselves, this results in a DAG and not a general
graph).

[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
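
As a rough illustration, the query DAG can be pictured as a map from each query to the list of queries it read. The sketch below assumes that queries are identified by plain strings (`QueryId`) and calls the structure `QueryDag`. Both names are hypothetical and do not reflect the real types in the compiler's `dep_graph` module. Each `reads(Q)` is stored as a `Vec`, so the order in which Q read its subqueries is preserved.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier, e.g. "typeck(foo)".
type QueryId = &'static str;

/// A sketch of the query DAG: for each query Q, the queries Q read,
/// kept in the order Q read them.
#[derive(Default)]
struct QueryDag {
    reads: HashMap<QueryId, Vec<QueryId>>,
}

impl QueryDag {
    /// Record an edge Q -> R: computing `q` required computing `r`.
    fn add_edge(&mut self, q: QueryId, r: QueryId) {
        self.reads.entry(q).or_default().push(r);
    }

    /// `reads(Q)` in execution order (empty if Q read nothing).
    fn reads(&self, q: QueryId) -> &[QueryId] {
        self.reads.get(q).map(Vec::as_slice).unwrap_or(&[])
    }
}
```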

On the next run of the compiler, then, we can sometimes reuse these
query results to avoid re-executing a query. We do this by assigning
every query a **color**:

- If a query is colored **red**, that means that its result during
  this compilation has **changed** from the previous compilation.
- If a query is colored **green**, that means that its result is
  the **same** as in the previous compilation.

There are two key insights here:

- First, if all the inputs to query Q are colored green, then the
  query Q **must** produce the same value as last time and hence
  need not be re-executed (otherwise the compiler would not be deterministic).
- Second, even if some inputs to a query change, it may be that it
  **still** produces the same result as in the previous compilation. In
  particular, the query may only use part of its input.
  - Therefore, after executing a query, we always check whether it
    produced the same result as the previous time. **If it did,** we
    can still mark the query as green, and hence avoid re-executing
    dependent queries.
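
Here is a minimal sketch of how the two insights combine into a coloring rule. It assumes that every query result can be fingerprinted with a hash. Rust's standard `DefaultHasher` stands in for whatever stable hash the compiler really uses. `Color`, `fingerprint`, `color_of`, and the example query `first_line_word_count` are all hypothetical names. For simplicity, the input colors are passed in as a precomputed slice; the try-mark-green algorithm described below determines them one at a time.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Color {
    Red,   // result changed since the previous compilation
    Green, // result is the same as in the previous compilation
}

/// Hash a query result. `DefaultHasher` is only a stand-in for a stable hash.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical query that only uses part of its input:
/// the number of words on the first line.
fn first_line_word_count(input: &str) -> usize {
    input.lines().next().map_or(0, |line| line.split_whitespace().count())
}

/// Color a query, given its inputs' colors and its result hash from last time.
fn color_of<T: Hash>(input_colors: &[Color], old_hash: u64, recompute: impl FnOnce() -> T) -> Color {
    // Insight 1: all inputs green means the same result, so skip re-execution.
    if input_colors.iter().all(|&c| c == Color::Green) {
        return Color::Green;
    }
    // Insight 2: re-execute and compare; an unchanged hash still means green.
    if fingerprint(&recompute()) == old_hash {
        Color::Green
    } else {
        Color::Red
    }
}
```

Because `first_line_word_count` only looks at the first line of its input, an edit to a later line leaves its result (and hash) unchanged. The query stays green even though its input is red.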

### The try-mark-green algorithm

The core of incremental compilation is an algorithm called
"try-mark-green". Its job is to determine the color of a given
query Q (which must not yet have been executed in this compilation). When Q has
red inputs, determining Q's color may involve re-executing Q so that
we can compare its output; but if all of Q's inputs are green, then we
can conclude that Q must be green without re-executing it or inspecting
its value whatsoever. In the compiler, this allows us to avoid
deserializing the result from disk when we don't need it and, in
fact, sometimes lets us skip *serializing* the result as well
(see the section on improvements to the basic algorithm below).

Try-mark-green works as follows:

- First, check whether the query Q was executed during the previous
  compilation.
  - If not, we simply execute the query as normal and color it
    **red**.
- If it was, we load the `reads(Q)` vector from the query DAG. The
  "reads" are the queries that Q executed during its execution.
  - For each query R in `reads(Q)`, we recursively determine the color
    of R using try-mark-green.
    - Note: it is important that we visit the nodes in `reads(Q)` in the same order
      as they occurred in the original compilation. See [the section on the query DAG below](#dag).
  - If **any** of the nodes in `reads(Q)` winds up colored **red**, then Q is dirty.
    - We re-execute Q and compare the hash of its result to the hash of the result
      from the previous compilation.
    - If the hash has not changed, we can mark Q as **green** and return.
      Otherwise, we mark Q as **red**.
  - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
    we can color Q as **green** and return.
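
The steps above can be sketched in Rust. This is a simplified model that rests on several assumptions:

- Queries are named by strings.
- The previous compilation's `reads(Q)` lists and result hashes are kept as plain `HashMap`s in a hypothetical `PrevSession`.
- Executing a query is modeled by an opaque `execute` callback that returns the hash of the query's result.
- A query with no recorded reads is treated as an input, such as a source file, and is colored by re-executing it and comparing hashes. The text above does not say how inputs are handled.

None of these names correspond to the compiler's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier, e.g. "typeck" or "parse_a".
type QueryId = &'static str;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// What the previous compilation saved (hypothetical layout).
struct PrevSession {
    /// reads(Q): the queries Q executed, in the order it executed them.
    reads: HashMap<QueryId, Vec<QueryId>>,
    /// The hash of each query's result.
    hashes: HashMap<QueryId, u64>,
}

/// State for the current compilation.
struct Session<'a> {
    prev: &'a PrevSession,
    /// Colors assigned so far in this compilation.
    colors: HashMap<QueryId, Color>,
    /// Stand-in for running a query now; returns the hash of its result.
    execute: &'a dyn Fn(QueryId) -> u64,
    /// Every query that was actually executed, in order (for illustration).
    executed: Vec<QueryId>,
}

impl<'a> Session<'a> {
    fn try_mark_green(&mut self, q: QueryId) -> Color {
        if let Some(&color) = self.colors.get(q) {
            return color; // Already colored during this compilation.
        }
        let prev = self.prev; // Copy the `&'a` so `self` can still be borrowed mutably.
        let color = match prev.reads.get(q) {
            // Q did not run last time: execute it as normal and color it red.
            None => {
                self.executed.push(q);
                let _ = (self.execute)(q);
                Color::Red
            }
            // Assumption: a query with no recorded reads is an input, so we
            // re-execute it and compare hashes.
            Some(reads) if reads.is_empty() => self.reexecute_and_compare(q),
            Some(reads) => {
                // Visit reads(Q) in the recorded order; `any` stops at the first red one.
                let saw_red = reads.iter().any(|&r| self.try_mark_green(r) == Color::Red);
                if saw_red {
                    // Q is dirty: re-execute it and compare its result hash.
                    self.reexecute_and_compare(q)
                } else {
                    // All reads are green, so Q is green without being executed.
                    Color::Green
                }
            }
        };
        self.colors.insert(q, color);
        color
    }

    fn reexecute_and_compare(&mut self, q: QueryId) -> Color {
        self.executed.push(q);
        let new_hash = (self.execute)(q);
        if self.prev.hashes.get(q) == Some(&new_hash) {
            Color::Green
        } else {
            Color::Red
        }
    }
}
```

Because `Iterator::any` short-circuits, the walk over `reads(Q)` stops at the first red read, in recorded order. In the compiler, re-executing Q goes back through the query system, which colors or executes whatever Q reads this time. The sketch collapses that into a single callback.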

<a name="dag"></a>

### The query DAG

The query DAG code is stored in
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
by instrumenting the query execution.
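
One way to picture "instrumenting the query execution" is a stack with one frame per query that is currently running. When a query starts, it is appended to the reads of the query on top of the stack. When it finishes, the reads it collected are stored as its edges. The sketch below assumes this design. `DagBuilder` and `with_query` are hypothetical names, and unlike the real system the sketch does not cache query results, so every call runs the computation.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier.
type QueryId = &'static str;

/// Builds the query DAG by instrumenting query execution (a sketch).
#[derive(Default)]
struct DagBuilder {
    /// One frame per query currently executing, collecting that query's reads.
    stack: Vec<(QueryId, Vec<QueryId>)>,
    /// Finished edges: reads(Q), in the order Q performed them.
    reads: HashMap<QueryId, Vec<QueryId>>,
}

impl DagBuilder {
    /// Run `compute` as query `q`. `q` is recorded as a read of the query that
    /// is currently executing (if any), and every query `compute` runs is
    /// recorded as a read of `q`.
    fn with_query<R>(&mut self, q: QueryId, compute: impl FnOnce(&mut Self) -> R) -> R {
        if let Some((_, parent_reads)) = self.stack.last_mut() {
            parent_reads.push(q);
        }
        self.stack.push((q, Vec::new()));
        let result = compute(&mut *self);
        let (finished, reads) = self.stack.pop().expect("frame pushed above");
        self.reads.insert(finished, reads);
        result
    }
}
```

For example, running `main_query` from the next section through `with_query` in a compilation where `subquery1` returns true records `reads(main_query)` as `subquery1` followed by `subquery2`, and no entry for `subquery3`.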

One key point is that the query DAG also tracks ordering; that is, for
each query Q, we track not only the queries that Q reads but also the
**order** in which they were read. This allows try-mark-green to walk
those queries back in the same order. This is important because once a subquery comes back as red,
we can no longer be sure that Q will continue along the same path as before.
For example, imagine a query like this:

```rust,ignore
fn main_query(tcx) {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}
```

Now imagine that in the first compilation, `main_query` starts by
executing `subquery1`, and this returns true. In that case, the next
query `main_query` executes will be `subquery2`, and `subquery3` will
not be executed at all.

But now imagine that in the **next** compilation, the input has
changed such that `subquery1` returns **false**. In this case, `subquery2` would never
execute. If try-mark-green were to visit `reads(main_query)` out of order,
however, it might visit `subquery2` before `subquery1`, and hence execute it.
This can lead to ICEs and other problems in the compiler.
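
The ordering argument can be made concrete with a small hypothetical helper, `walk_reads`. Like try-mark-green, it colors the recorded reads one at a time and stops at the first red one. The colors are hard-coded to match the scenario above, in which `subquery1` has turned red. Visiting the reads in recorded order never touches `subquery2`. Visiting them in another order reaches `subquery2` first, even though `main_query` no longer calls it.

```rust
/// Hypothetical query identifier.
type QueryId = &'static str;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// Visit `reads` in the given order, coloring each with `color_of`, and
/// stop at the first red one. Returns the queries that were visited.
fn walk_reads(reads: &[QueryId], mut color_of: impl FnMut(QueryId) -> Color) -> Vec<QueryId> {
    let mut visited = Vec::new();
    for &r in reads {
        visited.push(r);
        if color_of(r) == Color::Red {
            // Q is dirty; only re-executing Q can tell us what it reads next.
            break;
        }
    }
    visited
}
```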

[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph

## Improvements to the basic algorithm

In the description of the basic algorithm, we said that at the end of
compilation we would save the results of all the queries that were
performed. In practice, this can be quite wasteful: many of those
results are very cheap to recompute, and serializing and deserializing
them is not much of a win. What we would do instead is save
**the hashes** of all the queries that we performed. Then, in select cases,
we **also** save the results.

This is why the incremental algorithm separates computing the
**color** of a node, which often does not require its value, from
computing the **result** of a node. Computing the result is done via a simple algorithm
like so:

- Check if a saved result for Q is available. If so, compute the color of Q.
  If Q is green, deserialize and return the saved result.
- Otherwise, execute Q.
  - We can then compare the hash of the result with the saved hash and color Q
    as green if it did not change.
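
The result computation above might look like the following sketch. It assumes a hypothetical `Saved` store that holds a hash for every query but serialized bytes only for some. The coloring step (`try_mark_green`) and query execution (`execute`) are passed in as callbacks, and cloning the saved bytes stands in for deserialization. In this simplified version, a query whose color comes back red is executed again here; a real implementation would presumably reuse the result computed while coloring it.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier.
type QueryId = &'static str;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// What the previous compilation left behind (hypothetical layout):
/// a hash for every query, but serialized results only for some.
struct Saved {
    hashes: HashMap<QueryId, u64>,
    results: HashMap<QueryId, Vec<u8>>,
}

/// Compute the result of query `q`. `try_mark_green` colors `q`, and
/// `execute` runs it, returning the result and the hash of that result.
fn compute_result(
    q: QueryId,
    saved: &Saved,
    try_mark_green: impl FnOnce(QueryId) -> Color,
    execute: impl FnOnce(QueryId) -> (Vec<u8>, u64),
) -> (Vec<u8>, Color) {
    // A saved result is only worth loading if Q turns out to be green.
    if let Some(bytes) = saved.results.get(q) {
        if try_mark_green(q) == Color::Green {
            // Cloning the bytes stands in for deserializing the saved result.
            return (bytes.clone(), Color::Green);
        }
    }
    // No usable saved result: execute Q, then color it by comparing hashes.
    let (result, hash) = execute(q);
    let color = if saved.hashes.get(q) == Some(&hash) {
        Color::Green
    } else {
        Color::Red
    };
    (result, color)
}
```

In this setup a cheap query never has its result saved and is always recomputed. Its saved hash still lets it come out green when the recomputed result is unchanged, so queries that depend on it need not be re-executed.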

# Footnotes

[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis