Skip to content

Commit 20bd24b

Browse files
authored
Merge pull request #79 from matthewhammer/master
Profiling queries
2 parents dab3e6a + afd9595 commit 20bd24b

File tree

4 files changed

+249419
-0
lines changed

4 files changed

+249419
-0
lines changed

profile-queries.md

Lines changed: 254 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,254 @@
1+
# Profiling Queries
2+
3+
In an effort to support _incremental compilation_, the latest design of
4+
the Rust compiler consists of a _query-based_ model.
5+
6+
The details of this model are (currently) outside the scope of this
7+
document, however, we explain
8+
[some background of this model](#background), in an effort to explain
9+
how we profile its performance. We intend this profiling effort to address
10+
[issue 42678](https://github.com/rust-lang/rust/issues/42678).
11+
12+
## Quick Start
13+
14+
### 1. Compile `rustc`
15+
Compile the compiler, as usual:
16+
17+
```
18+
python x.py --stage 1
19+
```
20+
21+
### 2. Run `rustc`, with flags
22+
Run the compiler on a source file, supplying two additional debugging flags with `-Z`:
23+
24+
```
25+
rustc -Z profile-queries -Z dump-dep-graph foo.rs
26+
```
27+
28+
Regarding the two additional parameters:
29+
30+
- `-Z profile-queries` tells the compiler to run a separate thread
31+
that profiles the queries made by the main compiler thread(s).
32+
- `-Z dump-dep-graph` tells the compiler to "dump" various files that
33+
describe the compilation dependencies.
34+
35+
This command will generate the following files:
36+
37+
- `profile_queries.html` consists of an HTML-based representation of
38+
the [trace of queries](#trace-of-queries).
39+
- `profile_queries.counts.txt` consists of a histogram, where each histogram "bucket" is a query provider.
40+
41+
### 3. Inspect the output
42+
3(a). Open the HTML file (`profile_queries.html`) with a browser. See [this section](#interpret-the-html-output) for an explanation of this file.
43+
44+
3(b). Open the data file (`profile_queries.counts.txt`) with a text editor, or spreadsheet. See [this section](#interpret-the-data-output) for an explanation of this file.
45+
46+
3(c). Older stuff, also generated as output (you can _ignore these files_; we won't discuss them further here):
47+
48+
- `dep_graph.dot` consists of old stuff: a representation of dependencies that are _outside_ the newer query model.
49+
- `dep_graph.txt` consists of old stuff: a representation of dependencies that are _outside_ the newer query model.
50+
51+
52+
## Interpret the HTML Output
53+
54+
### Example 0
55+
56+
The following image gives some example output, from tracing the
57+
queries of `hello_world.rs` (a single `main` function, that prints
58+
`"hello world"` via the macro `println!`). This image only shows a
59+
short prefix of the total output; the _actual_ output is much longer.
60+
61+
![](profile-queries/example0.png)
62+
63+
[Full HTML output](profile-queries/example0/profile_queries.html)
64+
65+
### Example 0 explanation
66+
67+
The trace of the queries has a formal structure; see
68+
[Trace of Queries](#trace-of-queries) for details.
69+
70+
We style this formal structure as follows:
71+
72+
- Blue dots represent query hits. They consist of leaves in the
73+
trace's tree. (CSS class: `hit`).
74+
- Red boxes represent query misses. They consist of internal nodes in
75+
the trace's tree. (CSS class: `miss`).
76+
- Many red boxes contain _nested boxes and dots_. This nesting structure
77+
reflects that some providers _depend on_ results from other
78+
providers, which consist of their nested children.
79+
- For example, the red box labeled as `typeck_tables_of` depends
80+
on the one labeled `adt_dtorck_constraint`, which itself
81+
depends on one labeled `coherent_trait`.
82+
- Some red boxes are _labeled_ with text, and have highlighted borders
83+
(light red, and bolded). (See [heuristics](#heuristics) for
84+
details). Where they are present, the labels give the following
85+
information:
86+
- The [query's _provider_](#queries), sans its _key_ and its _result_,
87+
which are often too long to include in these labels.
88+
- The _duration_ of the provider, as a fraction of the total time
89+
(for the entire trace). This fraction includes the query's
90+
entire extent (that is, the sum total of all of its
91+
sub-queries).
92+
93+
94+
## Heuristics
95+
96+
Heuristics-based CSS Classes:
97+
98+
- `important` -- Trace nodes are `important` if they have an extent of
99+
6 (or more), _or_ they have a duration fraction of one percent (or
100+
more). These numbers are simple hueristics (currently hard-coded,
101+
but easy to modify). Important nodes are styled with textual
102+
labels, and highlighted borders (light red, and bolded).
103+
104+
## Interpret the Data Output
105+
106+
The file `profile_queries.counts.txt` contains a table of information
107+
about the queries, organized around their providers.
108+
109+
For each provider, we produce:
110+
111+
- A total **count** --- the total number of times this provider was
112+
queried
113+
114+
- A total **duration** --- the total number of seconds spent running
115+
this provider, _including_ all providers it may depend on. To get a
116+
sense of this dependency structure, and inspect a more fine-grained
117+
view of these durations, see
118+
[this section](#interpret-the-html-output).
119+
120+
These rows are **sorted by total duration**, in descending order.
121+
122+
### Counts: Example 0
123+
124+
The following example `profile_queries.counts.txt` file results from
125+
running on a hello world program (a single main function that uses
126+
`println` to print `"hellow world").
127+
128+
As explained above, the columns consist of `provider`, `count`, `duration`:
129+
130+
```
131+
symbol_name,2441,0.362
132+
def_symbol_name,2414,0.129
133+
item_attrs,5300,0.060
134+
type_of,4841,0.059
135+
generics_of,7216,0.049
136+
impl_trait_ref,2898,0.037
137+
def_span,20381,0.030
138+
adt_def,1142,0.028
139+
is_foreign_item,2425,0.021
140+
adt_dtorck_constraint,2,0.016
141+
typeck_tables_of,33,0.014
142+
typeck_item_bodies,1,0.010
143+
coherent_trait,7,0.008
144+
adt_destructor,10,0.008
145+
borrowck,4,0.008
146+
mir_validated,4,0.007
147+
impl_parent,306,0.003
148+
trait_def,216,0.001
149+
mir_const,2,0.001
150+
optimized_mir,6,0.000
151+
adt_sized_constraint,9,0.000
152+
predicates_of,82,0.000
153+
privacy_access_levels,5,0.000
154+
impl_polarity,15,0.000
155+
trait_of_item,7,0.000
156+
region_maps,11,0.000
157+
associated_item_def_ids,7,0.000
158+
super_predicates_of,8,0.000
159+
variances_of,12,0.000
160+
crate_inherent_impls,1,0.000
161+
is_exported_symbol,2,0.000
162+
associated_item,3,0.000
163+
crate_inherent_impls_overlap_check,1,0.000
164+
reachable_set,1,0.000
165+
is_mir_available,1,0.000
166+
inherent_impls,1,0.000
167+
```
168+
169+
# Background
170+
171+
We give some background about the query model of the Rust compiler.
172+
173+
## Def IDs
174+
175+
In the query model, many queries have a key that consists of a Def ID.
176+
The Rust compiler uses Def IDs to distinguish definitions in the input
177+
Rust program.
178+
179+
From the compiler source code (`src/librustc/hir/def_id.rs`):
180+
181+
```
182+
/// A DefId identifies a particular *definition*, by combining a crate
183+
/// index and a def index.
184+
#[derive(Clone, Eq, Ord, PartialOrd, PartialEq, RustcEncodable, RustcDecodable, Hash, Copy)]
185+
pub struct DefId {
186+
pub krate: CrateNum,
187+
pub index: DefIndex,
188+
}
189+
```
190+
191+
## Queries
192+
193+
A query relates a _key_ to a _result_, either by invoking a _provider_
194+
that computes this result, or by reusing a cached result that was
195+
provided earlier. We explain each term in more detail:
196+
197+
- Query **Provider**: Each kind of query has a pre-defined _provider_,
198+
which refers to the compiler behavior that provides an answer to the
199+
query. These providers may nest; see [trace of
200+
queries](#trace-of-queries) for more information about this nesting structure.
201+
_Example providers:_
202+
- `typeck_tables_of` -- Typecheck a Def ID; produce "tables" of type information.
203+
- `borrowck` -- Borrow-check a Def ID.
204+
- `optimized_mir` -- Generate an optimized MIR for a Def ID; produce MIR.
205+
- For more examples, see [Example 0](#counts-example-0).
206+
- Query **Key**: The input/arguments to the provider. Often, this
207+
consists of a particular [Def ID](#def-ids).
208+
- Query **Result**: The output of the provider.
209+
210+
## Trace of Queries
211+
212+
Formally, a _trace_ of the queries consists of a _tree_, where
213+
sub-trees represent sub-traces. In particular, the nesting structure
214+
of the trace of queries describes how the queries depend on one
215+
another.
216+
217+
Even more precisely, this tree represents a directed acyclic graph
218+
(DAG), where shared sub-graphs consist of tree nodes that occur
219+
multiple times in the tree, first as "cache misses" and later as
220+
"cache hits".
221+
222+
**Cache hits and misses.**
223+
The trace is a tree with the following possible tree nodes:
224+
225+
- Query, with cache **miss**: The query's result is **unknown**, and
226+
its provider runs to compute it. In this case, the dynamic extent
227+
of the query's trace consists of the traced behavior of its
228+
provider.
229+
- Query, with cache **hit**: The query's result is **known**, and is
230+
reused; its provider does not rerun. These nodes are leaves in the
231+
trace, since they have no dynamic extent. These leaves also
232+
represent where the tree, represented as a DAG, would _share_ a
233+
sub-graph (namely, the sub-graph of the query that was reused from
234+
the cache).
235+
236+
**Tree node metrics.**
237+
To help determine how to style this tree, we define the following tree
238+
node metrics:
239+
240+
- **Depth**: The number of **ancestors** of the node in its path from the tree root.
241+
- **Extent**: The number of **immediate children** of the node.
242+
243+
Intuitively, a dependency tree is "good" for incremental caching when
244+
the depth and extent of each node is relatively small. It is
245+
pathological when either of these metrics grows too large. For
246+
instance, a tree node whose extent consists of 1M immediate children
247+
means that if and when this node is re-computed, all 1M children must
248+
be re-queried, at the very least (some may also require recomputation,
249+
too).
250+
251+
## Links
252+
253+
- https://internals.rust-lang.org/t/incremental-compilation-beta/4721
254+
- https://blog.rust-lang.org/2016/09/08/incremental.html

profile-queries/example0.png

56.2 KB
Loading
Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
symbol_name,2441,0.362
2+
def_symbol_name,2414,0.129
3+
item_attrs,5300,0.060
4+
type_of,4841,0.059
5+
generics_of,7216,0.049
6+
impl_trait_ref,2898,0.037
7+
def_span,20381,0.030
8+
adt_def,1142,0.028
9+
is_foreign_item,2425,0.021
10+
adt_dtorck_constraint,2,0.016
11+
typeck_tables_of,33,0.014
12+
typeck_item_bodies,1,0.010
13+
coherent_trait,7,0.008
14+
adt_destructor,10,0.008
15+
borrowck,4,0.008
16+
mir_validated,4,0.007
17+
impl_parent,306,0.003
18+
trait_def,216,0.001
19+
mir_const,2,0.001
20+
optimized_mir,6,0.000
21+
adt_sized_constraint,9,0.000
22+
predicates_of,82,0.000
23+
privacy_access_levels,5,0.000
24+
impl_polarity,15,0.000
25+
trait_of_item,7,0.000
26+
region_maps,11,0.000
27+
associated_item_def_ids,7,0.000
28+
super_predicates_of,8,0.000
29+
variances_of,12,0.000
30+
crate_inherent_impls,1,0.000
31+
is_exported_symbol,2,0.000
32+
associated_item,3,0.000
33+
crate_inherent_impls_overlap_check,1,0.000
34+
reachable_set,1,0.000
35+
is_mir_available,1,0.000
36+
inherent_impls,1,0.000

0 commit comments

Comments
 (0)