|
1 | 1 | # HIR lowering
|
| 2 | + |
| 3 | +The HIR -- "High-level IR" -- is the primary IR used in most of |
| 4 | +rustc. It is a desugared version of the "abstract syntax tree" (AST) |
| 5 | +that is generated after parsing, macro expansion, and name resolution |
| 6 | +have completed. Many parts of HIR resemble Rust surface syntax quite |
| 7 | +closely, with the exception that some of Rust's expression forms have |
| 8 | +been desugared away (as an example, `for` loops are converted into a |
| 9 | +`loop` and do not appear in the HIR). |
| 10 | + |
| 11 | +This chapter covers the main concepts of the HIR. |
| 12 | + |
| 13 | +### Out-of-band storage and the `Crate` type |
| 14 | + |
| 15 | +The top-level data-structure in the HIR is the `Crate`, which stores |
| 16 | +the contents of the crate currently being compiled (we only ever |
| 17 | +construct HIR for the current crate). Whereas in the AST the crate |
| 18 | +data structure basically just contains the root module, the HIR |
| 19 | +`Crate` structure contains a number of maps and other things that |
| 20 | +serve to organize the content of the crate for easier access. |
| 21 | + |
| 22 | +For example, the contents of individual items (e.g., modules, |
| 23 | +functions, traits, impls, etc.) in the HIR are not immediately |
| 24 | +accessible in the parents. So, for example, if you had a module item `foo` |
| 25 | +containing a function `bar()`: |
| 26 | + |
| 27 | +``` |
| 28 | +mod foo { |
| 29 | + fn bar() { } |
| 30 | +} |
| 31 | +``` |
| 32 | + |
| 33 | +Then in the HIR the representation of module `foo` (the `Mod` |
| 34 | +struct) would have only the **`ItemId`** `I` of `bar()`. To get the |
| 35 | +details of the function `bar()`, we would look up `I` in the |
| 36 | +`items` map. |
| 37 | + |
| 38 | +One nice result of this representation is that one can iterate |
| 39 | +over all items in the crate by iterating over the key-value pairs |
| 40 | +in these maps (without having to walk the entire IR). |
| 41 | +There are similar maps for things like trait items and impl items, |
| 42 | +as well as "bodies" (explained below). |
| 43 | + |
| 44 | +The other reason to set up the representation this way is for better |
| 45 | +integration with incremental compilation. This way, if you gain access |
| 46 | +to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately |
| 47 | +gain access to the contents of the function `bar()`. Instead, you only |
| 48 | +gain access to the **id** for `bar()`, and you must invoke some |
| 49 | +function to look up the contents of `bar()` given its id; this gives us |
| 50 | +a chance to observe that you accessed the data for `bar()` and record |
| 51 | +the dependency. |
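
The out-of-band layout described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyCrate`, `ToyItem`, the `accessed` set, and this `ItemId` are hypothetical stand-ins, not rustc's actual types or its dependency-tracking machinery. The module stores only the ids of its children, and every lookup goes through one function that can record the access.

```rust
use std::cell::RefCell;
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for an item id; not rustc's real `ItemId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ItemId(pub u32);

/// A toy item: a module holds only the ids of its children.
#[derive(Debug)]
pub enum ToyItem {
    Mod { name: String, children: Vec<ItemId> },
    Fn { name: String },
}

/// Toy crate: all items live in one flat map keyed by id.
#[derive(Default)]
pub struct ToyCrate {
    items: BTreeMap<ItemId, ToyItem>,
    /// Ids that were looked up, standing in for dependency recording.
    accessed: RefCell<BTreeSet<ItemId>>,
}

impl ToyCrate {
    pub fn insert(&mut self, id: ItemId, item: ToyItem) {
        self.items.insert(id, item);
    }

    /// Look up an item by id and record that it was read.
    /// Panics if the id is unknown.
    pub fn item(&self, id: ItemId) -> &ToyItem {
        self.accessed.borrow_mut().insert(id);
        &self.items[&id]
    }

    /// Iterate over every item without walking the module tree.
    pub fn all_items(&self) -> impl Iterator<Item = (&ItemId, &ToyItem)> {
        self.items.iter()
    }

    /// The ids read so far, in ascending order.
    pub fn accessed(&self) -> Vec<ItemId> {
        self.accessed.borrow().iter().copied().collect()
    }
}

/// Builds the `mod foo { fn bar() { } }` example from the text.
pub fn example_crate() -> ToyCrate {
    let mut krate = ToyCrate::default();
    krate.insert(
        ItemId(0),
        ToyItem::Mod { name: "foo".to_string(), children: vec![ItemId(1)] },
    );
    krate.insert(ItemId(1), ToyItem::Fn { name: "bar".to_string() });
    krate
}
```

Reading `foo` through `item(ItemId(0))` yields only the id of `bar()`; the contents of `bar()` become visible, and the access gets recorded, only after a second call to `item`.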
| 52 | + |
| 53 | +### Identifiers in the HIR |
| 54 | + |
| 55 | +Most of the code that has to deal with things in HIR tends not to |
| 56 | +carry around references into the HIR, but rather to carry around |
| 57 | +*identifier numbers* (or just "ids"). Right now, you will find four |
| 58 | +sorts of identifiers in active use: |
| 59 | + |
| 60 | +- `DefId`, which primarily names "definitions" or top-level items. |
| 61 | + - You can think of a `DefId` as being shorthand for a very explicit |
| 62 | + and complete path, like `std::collections::HashMap`. However, |
| 63 | + these paths are able to name things that are not nameable in |
| 64 | + normal Rust (e.g., impls), and they also include extra information |
| 65 | + about the crate (such as its version number, as two versions of |
| 66 | + the same crate can co-exist). |
| 67 | + - A `DefId` really consists of two parts, a `CrateNum` (which |
| 68 | + identifies the crate) and a `DefIndex` (which indexes into a list |
| 69 | + of items that is maintained per crate). |
| 70 | +- `HirId`, which combines the index of a particular item with an |
| 71 | + offset within that item. |
| 72 | + - The key point of a `HirId` is that it is *relative* to some item (which is named |
| 73 | + via a `DefId`). |
| 74 | +- `BodyId`, which is an absolute identifier that refers to a specific |
| 75 | + body (definition of a function or constant) in the crate. It is currently |
| 76 | + effectively a "newtype'd" `NodeId`. |
| 77 | +- `NodeId`, which is an absolute id that identifies a single node in the HIR tree. |
| 78 | + - While these are still in common use, **they are being slowly phased out**. |
| 79 | + - Since they are absolute within the crate, adding a new node |
| 80 | + anywhere in the tree causes the node-ids of all subsequent code in |
| 81 | + the crate to change. This is terrible for incremental compilation, |
| 82 | + as you can perhaps imagine. |
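
To make the difference between absolute and item-relative ids concrete, here is a minimal sketch. It assumes a toy crate described only by how many nodes each item contains; `ToyHirId`, `absolute_ids`, and `relative_ids` are hypothetical names for illustration, and the `DefId` struct merely mirrors the two-part shape described above rather than reproducing rustc's definition.

```rust
/// Identifies a crate (toy version).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CrateNum(pub u32);

/// Indexes into a per-crate list of definitions (toy version).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefIndex(pub u32);

/// Two-part id: which crate, and which definition inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

/// Hypothetical item-relative id: owning item plus an offset inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ToyHirId {
    pub owner: DefIndex,
    pub local_id: u32,
}

/// Absolute numbering: a single counter runs across the whole crate.
/// `node_counts[i]` is the number of nodes in item `i`.
pub fn absolute_ids(node_counts: &[u32]) -> Vec<Vec<u32>> {
    let mut next = 0u32;
    node_counts
        .iter()
        .map(|&n| {
            let ids: Vec<u32> = (next..next + n).collect();
            next += n;
            ids
        })
        .collect()
}

/// Relative numbering: the counter restarts at zero inside each item.
pub fn relative_ids(node_counts: &[u32]) -> Vec<Vec<ToyHirId>> {
    node_counts
        .iter()
        .enumerate()
        .map(|(i, &n)| {
            (0..n)
                .map(|local_id| ToyHirId { owner: DefIndex(i as u32), local_id })
                .collect()
        })
        .collect()
}
```

Growing the first item from two nodes to three shifts every absolute id in the second item, while the relative ids of the second item are unchanged. That stability is the property incremental compilation cares about.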
| 83 | + |
| 84 | +### HIR Map |
| 85 | + |
| 86 | +Most of the time when you are working with the HIR, you will do so via |
| 87 | +the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in |
| 88 | +the `hir::map` module). The HIR map contains a number of methods to |
| 89 | +convert between ids of various kinds and to look up data associated |
| 90 | +with a HIR node. |
| 91 | + |
| 92 | +For example, if you have a `DefId`, and you would like to convert it |
| 93 | +to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This |
| 94 | +returns an `Option<NodeId>` -- this will be `None` if the def-id |
| 95 | +refers to something outside of the current crate (since then it has no |
| 96 | +HIR node), but otherwise returns `Some(n)` where `n` is the node-id of |
| 97 | +the definition. |
| 98 | + |
| 99 | +Similarly, you can use `tcx.hir.find(n)` to look up the node for a |
| 100 | +`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum |
| 101 | +defined in the map; by matching on this you can find out what sort of |
| 102 | +node the node-id referred to and also get a pointer to the data |
| 103 | +itself. Often, you know what sort of node `n` is -- e.g., if you know |
| 104 | +that `n` must be some HIR expression, you can do |
| 105 | +`tcx.hir.expect_expr(n)`, which will extract and return the |
| 106 | +`&hir::Expr`, panicking if `n` is not in fact an expression. |
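
The `find` / `expect_expr` pattern can be illustrated with a small stand-alone model. Everything here is a hypothetical sketch: `ToyMap`, `ToyNode`, `ToyItem`, and `ToyExpr` are invented for illustration and plain `u32` ids stand in for `NodeId`; the real map lives in the compiler and is reached through the tcx.

```rust
use std::collections::HashMap;

/// Toy payloads standing in for HIR data.
#[derive(Debug, PartialEq)]
pub struct ToyItem {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct ToyExpr {
    pub text: String,
}

/// Toy counterpart of the `Node` enum: says what kind of node an id
/// refers to and carries a reference to the data itself.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ToyNode<'a> {
    Item(&'a ToyItem),
    Expr(&'a ToyExpr),
}

/// Toy map from ids to nodes.
#[derive(Default)]
pub struct ToyMap<'a> {
    nodes: HashMap<u32, ToyNode<'a>>,
}

impl<'a> ToyMap<'a> {
    pub fn insert(&mut self, id: u32, node: ToyNode<'a>) {
        self.nodes.insert(id, node);
    }

    /// Returns `None` when the id is unknown to this map.
    pub fn find(&self, id: u32) -> Option<ToyNode<'a>> {
        self.nodes.get(&id).copied()
    }

    /// Returns the expression for `id`, panicking if the id is
    /// missing or refers to some other kind of node.
    pub fn expect_expr(&self, id: u32) -> &'a ToyExpr {
        match self.find(id) {
            Some(ToyNode::Expr(expr)) => expr,
            other => panic!("expected an expression for id {}, found {:?}", id, other),
        }
    }
}
```

Callers that do not know the node kind match on the result of `find`; callers that do know it can use the `expect_*` form and skip the match, at the cost of a panic when the assumption is wrong.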
| 107 | + |
| 108 | +Finally, you can use the HIR map to find the parents of nodes, via |
| 109 | +calls like `tcx.hir.get_parent_node(n)`. |
| 110 | + |
| 111 | +### HIR Bodies |
| 112 | + |
| 113 | +A **body** represents some kind of executable code, such as the body |
| 114 | +of a function/closure or the definition of a constant. Bodies are |
| 115 | +associated with an **owner**, which is typically some kind of item |
| 116 | +(e.g., a `fn()` or `const`), but could also be a closure expression |
| 117 | +(e.g., `|x, y| x + y`). You can use the HIR map to find the body |
| 118 | +associated with a given def-id (`maybe_body_owned_by()`) or to find |
| 119 | +the owner of a body (`body_owner_def_id()`). |