|
| 1 | +# Profiling Queries |
| 2 | + |
| 3 | +In an effort to support _incremental compilation_, the latest design of |
| 4 | +the Rust compiler consists of a _query-based_ model. |
| 5 | + |
| 6 | +The details of this model are (currently) outside the scope of this |
| 7 | +document, however, we explain |
| 8 | +[some background of this model](#background), in an effort to explain |
| 9 | +how we profile its performance. We intend this profiling effort to address |
| 10 | +[issue 42678](https://github.com/rust-lang/rust/issues/42678). |
| 11 | + |
| 12 | +## Quick Start |
| 13 | + |
| 14 | +### 1. Compile `rustc` |
| 15 | +Compile the compiler, as usual: |
| 16 | + |
| 17 | +``` |
| 18 | +python x.py --stage 1 |
| 19 | +``` |
| 20 | + |
| 21 | +### 2. Run `rustc`, with flags |
| 22 | +Run the compiler on a source file, supplying two additional debugging flags with `-Z`: |
| 23 | + |
| 24 | +``` |
| 25 | +rustc -Z profile-queries -Z dump-dep-graph foo.rs |
| 26 | +``` |
| 27 | + |
| 28 | +Regarding the two additional parameters: |
| 29 | + |
| 30 | +- `-Z profile-queries` tells the compiler to run a separate thread |
| 31 | + that profiles the queries made by the main compiler thread(s). |
| 32 | +- `-Z dump-dep-graph` tells the compiler to "dump" various files that |
| 33 | + describe the compilation dependencies. |
| 34 | + |
| 35 | +This command will generate the following files: |
| 36 | + |
| 37 | +- `profile_queries.html` consists of an HTML-based representation of |
| 38 | + the [trace of queries](#trace-of-queries). |
| 39 | +- `profile_queries.counts.txt` consists of a histogram, where each histogram "bucket" is a query provider. |
| 40 | + |
| 41 | +### 3. Inspect the output |
| 42 | +3(a). Open the HTML file (`profile_queries.html`) with a browser. See [this section](#interpret-the-html-output) for an explanation of this file. |
| 43 | + |
| 44 | +3(b). Open the data file (`profile_queries.counts.txt`) with a text editor, or spreadsheet. See [this section](#interpret-the-data-output) for an explanation of this file. |
| 45 | + |
| 46 | +3(c). Older stuff, also generated as output (you can _ignore these files_; we won't discuss them further here): |
| 47 | + |
| 48 | +- `dep_graph.dot` consists of old stuff: a representation of dependencies that are _outside_ the newer query model. |
| 49 | +- `dep_graph.txt` consists of old stuff: a representation of dependencies that are _outside_ the newer query model. |
| 50 | + |
| 51 | + |
| 52 | +## Interpret the HTML Output |
| 53 | + |
| 54 | +### Example 0 |
| 55 | + |
| 56 | +The following image gives some example output, from tracing the |
| 57 | +queries of `hello_world.rs` (a single `main` function, that prints |
| 58 | +`"hello world"` via the macro `println!`). This image only shows a |
| 59 | +short prefix of the total output; the _actual_ output is much longer. |
| 60 | + |
| 61 | + |
| 62 | + |
| 63 | +[Full HTML output](profile-queries/example0/profile_queries.html) |
| 64 | + |
| 65 | +### Example 0 explanation |
| 66 | + |
| 67 | +The trace of the queries has a formal structure; see |
| 68 | +[Trace of Queries](#trace-of-queries) for details. |
| 69 | + |
| 70 | +We style this formal structure as follows: |
| 71 | + |
| 72 | +- Blue dots represent query hits. They consist of leaves in the |
| 73 | + trace's tree. (CSS class: `hit`). |
| 74 | +- Red boxes represent query misses. They consist of internal nodes in |
| 75 | + the trace's tree. (CSS class: `miss`). |
| 76 | +- Many red boxes contain _nested boxes and dots_. This nesting structure |
| 77 | + reflects that some providers _depend on_ results from other |
| 78 | + providers, which consist of their nested children. |
| 79 | + - For example, the red box labeled as `typeck_tables_of` depends |
| 80 | + on the one labeled `adt_dtorck_constraint`, which itself |
| 81 | + depends on one labeled `coherent_trait`. |
| 82 | +- Some red boxes are _labeled_ with text, and have highlighted borders |
| 83 | + (light red, and bolded). (See [heuristics](#heuristics) for |
| 84 | + details). Where they are present, the labels give the following |
| 85 | + information: |
| 86 | + - The [query's _provider_](#queries), sans its _key_ and its _result_, |
| 87 | + which are often too long to include in these labels. |
| 88 | + - The _duration_ of the provider, as a fraction of the total time |
| 89 | + (for the entire trace). This fraction includes the query's |
| 90 | + entire extent (that is, the sum total of all of its |
| 91 | + sub-queries). |
| 92 | + |
| 93 | + |
| 94 | +## Heuristics |
| 95 | + |
| 96 | +Heuristics-based CSS Classes: |
| 97 | + |
| 98 | +- `important` -- Trace nodes are `important` if they have an extent of |
| 99 | + 6 (or more), _or_ they have a duration fraction of one percent (or |
| 100 | + more). These numbers are simple hueristics (currently hard-coded, |
| 101 | + but easy to modify). Important nodes are styled with textual |
| 102 | + labels, and highlighted borders (light red, and bolded). |
| 103 | + |
| 104 | +## Interpret the Data Output |
| 105 | + |
| 106 | +The file `profile_queries.counts.txt` contains a table of information |
| 107 | +about the queries, organized around their providers. |
| 108 | + |
| 109 | +For each provider, we produce: |
| 110 | + |
| 111 | +- A total **count** --- the total number of times this provider was |
| 112 | + queried |
| 113 | + |
| 114 | +- A total **duration** --- the total number of seconds spent running |
| 115 | + this provider, _including_ all providers it may depend on. To get a |
| 116 | + sense of this dependency structure, and inspect a more fine-grained |
| 117 | + view of these durations, see |
| 118 | + [this section](#interpret-the-html-output). |
| 119 | + |
| 120 | +These rows are **sorted by total duration**, in descending order. |
| 121 | + |
| 122 | +### Counts: Example 0 |
| 123 | + |
| 124 | +The following example `profile_queries.counts.txt` file results from |
| 125 | +running on a hello world program (a single main function that uses |
| 126 | +`println` to print `"hellow world"). |
| 127 | + |
| 128 | +As explained above, the columns consist of `provider`, `count`, `duration`: |
| 129 | + |
| 130 | +``` |
| 131 | +symbol_name,2441,0.362 |
| 132 | +def_symbol_name,2414,0.129 |
| 133 | +item_attrs,5300,0.060 |
| 134 | +type_of,4841,0.059 |
| 135 | +generics_of,7216,0.049 |
| 136 | +impl_trait_ref,2898,0.037 |
| 137 | +def_span,20381,0.030 |
| 138 | +adt_def,1142,0.028 |
| 139 | +is_foreign_item,2425,0.021 |
| 140 | +adt_dtorck_constraint,2,0.016 |
| 141 | +typeck_tables_of,33,0.014 |
| 142 | +typeck_item_bodies,1,0.010 |
| 143 | +coherent_trait,7,0.008 |
| 144 | +adt_destructor,10,0.008 |
| 145 | +borrowck,4,0.008 |
| 146 | +mir_validated,4,0.007 |
| 147 | +impl_parent,306,0.003 |
| 148 | +trait_def,216,0.001 |
| 149 | +mir_const,2,0.001 |
| 150 | +optimized_mir,6,0.000 |
| 151 | +adt_sized_constraint,9,0.000 |
| 152 | +predicates_of,82,0.000 |
| 153 | +privacy_access_levels,5,0.000 |
| 154 | +impl_polarity,15,0.000 |
| 155 | +trait_of_item,7,0.000 |
| 156 | +region_maps,11,0.000 |
| 157 | +associated_item_def_ids,7,0.000 |
| 158 | +super_predicates_of,8,0.000 |
| 159 | +variances_of,12,0.000 |
| 160 | +crate_inherent_impls,1,0.000 |
| 161 | +is_exported_symbol,2,0.000 |
| 162 | +associated_item,3,0.000 |
| 163 | +crate_inherent_impls_overlap_check,1,0.000 |
| 164 | +reachable_set,1,0.000 |
| 165 | +is_mir_available,1,0.000 |
| 166 | +inherent_impls,1,0.000 |
| 167 | +``` |
| 168 | + |
| 169 | +# Background |
| 170 | + |
| 171 | +We give some background about the query model of the Rust compiler. |
| 172 | + |
| 173 | +## Def IDs |
| 174 | + |
| 175 | +In the query model, many queries have a key that consists of a Def ID. |
| 176 | +The Rust compiler uses Def IDs to distinguish definitions in the input |
| 177 | +Rust program. |
| 178 | + |
| 179 | +From the compiler source code (`src/librustc/hir/def_id.rs`): |
| 180 | + |
| 181 | +``` |
| 182 | +/// A DefId identifies a particular *definition*, by combining a crate |
| 183 | +/// index and a def index. |
| 184 | +#[derive(Clone, Eq, Ord, PartialOrd, PartialEq, RustcEncodable, RustcDecodable, Hash, Copy)] |
| 185 | +pub struct DefId { |
| 186 | + pub krate: CrateNum, |
| 187 | + pub index: DefIndex, |
| 188 | +} |
| 189 | +``` |
| 190 | + |
| 191 | +## Queries |
| 192 | + |
| 193 | +A query relates a _key_ to a _result_, either by invoking a _provider_ |
| 194 | +that computes this result, or by reusing a cached result that was |
| 195 | +provided earlier. We explain each term in more detail: |
| 196 | + |
| 197 | +- Query **Provider**: Each kind of query has a pre-defined _provider_, |
| 198 | + which refers to the compiler behavior that provides an answer to the |
| 199 | + query. These providers may nest; see [trace of |
| 200 | + queries](#trace-of-queries) for more information about this nesting structure. |
| 201 | + _Example providers:_ |
| 202 | + - `typeck_tables_of` -- Typecheck a Def ID; produce "tables" of type information. |
| 203 | + - `borrowck` -- Borrow-check a Def ID. |
| 204 | + - `optimized_mir` -- Generate an optimized MIR for a Def ID; produce MIR. |
| 205 | + - For more examples, see [Example 0](#counts-example-0). |
| 206 | +- Query **Key**: The input/arguments to the provider. Often, this |
| 207 | + consists of a particular [Def ID](#def-ids). |
| 208 | +- Query **Result**: The output of the provider. |
| 209 | + |
| 210 | +## Trace of Queries |
| 211 | + |
| 212 | +Formally, a _trace_ of the queries consists of a _tree_, where |
| 213 | +sub-trees represent sub-traces. In particular, the nesting structure |
| 214 | +of the trace of queries describes how the queries depend on one |
| 215 | +another. |
| 216 | + |
| 217 | +Even more precisely, this tree represents a directed acyclic graph |
| 218 | +(DAG), where shared sub-graphs consist of tree nodes that occur |
| 219 | +multiple times in the tree, first as "cache misses" and later as |
| 220 | +"cache hits". |
| 221 | + |
| 222 | +**Cache hits and misses.** |
| 223 | +The trace is a tree with the following possible tree nodes: |
| 224 | + |
| 225 | +- Query, with cache **miss**: The query's result is **unknown**, and |
| 226 | + its provider runs to compute it. In this case, the dynamic extent |
| 227 | + of the query's trace consists of the traced behavior of its |
| 228 | + provider. |
| 229 | +- Query, with cache **hit**: The query's result is **known**, and is |
| 230 | + reused; its provider does not rerun. These nodes are leaves in the |
| 231 | + trace, since they have no dynamic extent. These leaves also |
| 232 | + represent where the tree, represented as a DAG, would _share_ a |
| 233 | + sub-graph (namely, the sub-graph of the query that was reused from |
| 234 | + the cache). |
| 235 | + |
| 236 | +**Tree node metrics.** |
| 237 | +To help determine how to style this tree, we define the following tree |
| 238 | +node metrics: |
| 239 | + |
| 240 | +- **Depth**: The number of **ancestors** of the node in its path from the tree root. |
| 241 | +- **Extent**: The number of **immediate children** of the node. |
| 242 | + |
| 243 | +Intuitively, a dependency tree is "good" for incremental caching when |
| 244 | +the depth and extent of each node is relatively small. It is |
| 245 | +pathological when either of these metrics grows too large. For |
| 246 | +instance, a tree node whose extent consists of 1M immediate children |
| 247 | +means that if and when this node is re-computed, all 1M children must |
| 248 | +be re-queried, at the very least (some may also require recomputation, |
| 249 | +too). |
| 250 | + |
| 251 | +## Links |
| 252 | + |
| 253 | +- https://internals.rust-lang.org/t/incremental-compilation-beta/4721 |
| 254 | +- https://blog.rust-lang.org/2016/09/08/incremental.html |
0 commit comments