|
| 1 | +# Source map extensions |
| 2 | + |
| 3 | +Dart2js includes two extensions to the source-map format to improve deobfuscation |
| 4 | +of production stack traces. These extensions compensate for some of the |
| 5 | +compiler optimizations that make deobfuscation harder. |
| 6 | + |
| 7 | +## Format changes |
| 8 | + |
| 9 | +Dart2js currently generates source-maps using the [source-map v3][sourcemapv3] |
| 10 | +format. The format allows extensions as new map entries, as long as they are |
| 11 | +prefixed by `x_` (other prefixes are reserved). We use an extension named |
| 12 | +`x_org_dartlang_dart2js` to store any additional information we need to |
| 13 | +share between dart2js and the deobfuscation tools: |
| 14 | + |
| 15 | +``` |
| 16 | +{ |
| 17 | + version: 3, |
| 18 | + file: "main.dart.js", |
| 19 | + sources: ["a.dart", "b.dart"], |
| 20 | + names: ["ClassA", "methodFoo"], |
| 21 | + mappings: "AAAA,E;;ABCDE;", |
| 22 | + x_org_dartlang_dart2js: { |
| 23 | + minified_names: {...}, |
| 24 | + frames: [...] |
| 25 | + } |
| 26 | +} |
| 27 | +``` |
| 28 | + |
| 29 | +We include two sections: `minified_names`, which encodes the mapping between |
| 30 | +minified and deobfuscated names, and `frames`, which encodes relevant |
| 31 | +information about stack frames, including inlining decisions (so that |
| 32 | +deobfuscation tools can expand them later on) and less-relevant frames (so that |
| 33 | +deobfuscation tools can hide or deemphasize them). |
| 34 | + |
| 35 | +These new sections contain references to names and source URIs, but to |
| 36 | +keep the encoding smaller, we reuse the sources and names tables from the |
| 37 | +main source-map section. |
| 38 | + |
| 39 | +## Minified names data |
| 40 | + |
| 41 | +### Global minified names |
| 42 | + |
| 43 | +By default, dart2js uses a global frequency-based namer to choose minified |
| 44 | +names. One of its invariants is that there is a 1-1 mapping for class names |
| 45 | +and method names (including getter names and setter names). For example, if two |
| 46 | +classes have an instance method with the same public name and same signature of |
| 47 | +optional arguments, they will also have the same minified method name. |
| 48 | + |
| 49 | +To support deobfuscating type names and method names, we embed a translation |
| 50 | +table for minified names, and we will |
| 51 | +add a new mechanism to deobfuscator tools to recognize when these names are |
| 52 | +present. |
| 53 | + |
| 54 | + |
| 55 | +Dart2js divides names into several namespaces. Many namespaces are local, |
| 56 | +but two of them are global to the entire program: `global` and `instance`. The |
| 57 | +`global` namespace includes the names of classes, while the `instance` |
| 58 | +namespace includes the names of instance members. |
| 59 | + |
| 60 | +The format looks like this: |
| 61 | + |
| 62 | +``` |
| 63 | + ... |
| 64 | + x_org_dartlang_dart2js: { |
| 65 | + minified_names: { |
| 66 | + global: { |
| 67 | + "a": 3, // an index in the names table, e.g. "topLevelMethod1" |
| 68 | + "X": 4, // e.g. "MyAbstractClass" |
| 69 | + }, |
| 70 | + instance: { |
| 71 | + "a": 5, // e.g. "instanceMethod1" |
| 72 | + "gb": 6, // e.g. "myGetter" |
| 73 | + } |
| 74 | + } |
| 75 | + } |
| 76 | +``` |
| 77 | + |
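The lookup that a deobfuscation tool might perform against this table can be sketched as follows. This is a minimal illustration, not the actual tool code: the `deobfuscate_name` helper and the sample `names` entries are assumptions made for the example, and it assumes the source map has already been parsed from JSON.

```python
# Hypothetical, already-parsed source map. The names and indices are made up
# to match the comments in the example above.
source_map = {
    "names": ["ClassA", "methodFoo", "main", "topLevelMethod1",
              "MyAbstractClass", "instanceMethod1", "myGetter"],
    "x_org_dartlang_dart2js": {
        "minified_names": {
            "global": {"a": 3, "X": 4},
            "instance": {"a": 5, "gb": 6},
        },
    },
}


def deobfuscate_name(source_map, namespace, minified):
    """Return the original name for `minified`, or None if it is unknown."""
    extension = source_map.get("x_org_dartlang_dart2js", {})
    table = extension.get("minified_names", {}).get(namespace, {})
    index = table.get(minified)
    if index is None:
        return None
    # The table stores indices into the shared `names` table.
    return source_map["names"][index]


print(deobfuscate_name(source_map, "global", "X"))
print(deobfuscate_name(source_map, "instance", "a"))
```

The same minified name (`a` here) can resolve to different original names depending on the namespace, which is why the lookup takes the namespace as an argument.
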
| 78 | +Initially our plan is just to include a mapping from one name to another. |
| 79 | +Depending on how much detail we want deobfuscation tools to provide, we could |
| 80 | +one day include the source location where the name is defined (for type names) |
| 81 | +or a list of such locations (for instance methods). |
| 82 | + |
| 83 | +Dart2js also has a global namespace for constants, but we do not believe those |
| 84 | +names appear in error messages, so we don't include it in the source-map |
| 85 | +file at this time. |
| 86 | + |
| 87 | +### Recognizing types and method names in error messages |
| 88 | + |
| 89 | +To help deobfuscator tools identify minified names, dart2js will ensure that |
| 90 | +all string representations of types and method names include a marker to |
| 91 | +indicate what namespace they belong to. |
| 92 | + |
| 93 | +Several string representations already have a marker: |
| 94 | + * The default `instance.toString` (e.g. `new MyClass().toString()`) prints |
| 95 | +`Instance of X`. The prefix "Instance of" is an indication that the name should |
| 96 | +be found in the global namespace. |
| 97 | + * Tear-offs also have an indicator. |
| 98 | + |
| 99 | +Some string representations will change in the near future. For example, |
| 100 | +`x.runtimeType.toString()` will include a marker in minified mode. Types can be |
| 101 | +complex, so the marker will be next to every type symbol. For example, |
| 102 | +a function type `ClassA Function(ClassB)` would be printed in minified mode as |
| 103 | +`minified:x Function(minified:y)` instead of `x Function(y)`. |
| 104 | + |
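As an illustration of how a tool could use the `minified:` marker, the sketch below rewrites every marked symbol in a message using the `global` table. The regular expression, the helper name `deobfuscate_type_string`, and the sample tables are assumptions for this example. The exact set of characters allowed in a minified name is not specified here.

```python
import re

# Assumed shape of a marked symbol: "minified:" followed by an identifier.
MARKER = re.compile(r"minified:([A-Za-z_$][\w$]*)")


def deobfuscate_type_string(text, global_names, names):
    """Replace each `minified:<name>` marker with its original name."""
    def replace(match):
        index = global_names.get(match.group(1))
        # Leave unknown names untouched rather than guessing.
        return match.group(0) if index is None else names[index]
    return MARKER.sub(replace, text)


# Hypothetical tables: `names` is the shared names table and `global_names`
# is the `global` section of `minified_names`.
names = ["ClassA", "ClassB"]
global_names = {"x": 0, "y": 1}

print(deobfuscate_type_string(
    "minified:x Function(minified:y)", global_names, names))
```

Because every type symbol carries its own marker, the replacement does not need to parse the structure of the type.
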
| 105 | +### Local minified names data |
| 106 | + |
| 107 | +Unlike types, constants, and methods, the names of fields, closure locals, |
| 108 | +and local variables don't have a 1-1 correspondence. Various algorithms are in use, |
| 109 | +but the bottom line is that it's possible to have two different field names |
| 110 | +mapped to the same minified name, and similarly different local variable names |
| 111 | +in different methods mapped to the same name. These names are less likely to |
| 112 | +show up in error messages, but when they do, it is often the case that they are |
| 113 | +being used in the same line as the error. |
| 114 | + |
| 115 | +To support deobfuscation of these names, dart2js will include the `sourceNameId` |
| 116 | +on each symbol as it is emitting the regular source-map file. This can be |
| 117 | +encoded in the standard source-map format without any extensions. Today dart2js |
| 118 | +uses the `sourceNameId` to denote the name of the enclosing function instead. |
| 119 | + |
| 120 | +## Inlining data |
| 121 | + |
| 122 | +Dart2js uses method inlining heavily for optimizations. Inlined methods, however, |
| 123 | +confuse users and deobfuscation tools. For users, there are fewer frames than |
| 124 | +calls in the program, so they wonder where the missing frames are. For tools, |
| 125 | +finding the method name of each frame by looking backwards for a function |
| 126 | +declaration can create a mismatch in the deobfuscated stack trace: the |
| 127 | +deobfuscated frame may show the name of a caller, but the location of an |
| 128 | +inlined method. |
| 129 | + |
| 130 | +The `frames` extension is a table with details about inlining information. |
| 131 | +Each entry in this table consists of: |
| 132 | + * An offset in the program |
| 133 | + * A list of one or more frame entries, which in turn can be: |
| 134 | + * push: indicates that we entered an inlined context |
| 135 | + * pop: indicates that we returned from an inlined context |
| 136 | + * pop-and-empty: indicates that this is a pop that also ends an inlining |
| 137 | + context, hence the offset has no inlining. This is used to mark the end |
| 138 | + of a region containing inlining data |
| 139 | + |
| 140 | +A push operation includes details about the call site, in particular: |
| 141 | + * the source location: index into the sources URI table, line, and column. |
| 142 | + * the name of the inlined method (as an index in the name table). Note that |
| 143 | + dart2js encodes instance methods as a compound name "ClassName.methodName". |
| 144 | + |
| 145 | +Here is an example of what the encoded format would look like: |
| 146 | +``` |
| 147 | +... |
| 148 | + x_org_dartlang_dart2js: { |
| 149 | + ... |
| 150 | + frames: [ |
| 151 | + [ 2310, // offset containing data |
| 152 | + [2, 34, 11, 4]], // a list encodes a push operation |
| 153 | + [ 2320, [4, 4, 2, 9]], |
| 154 | + [ 2330, -1], // -1 encodes a pop operation |
| 155 | + [ 2333, 0] // 0 encodes a pop-and-empty operation |
| 156 | + ] |
| 157 | + } |
| 158 | +``` |
| 159 | + |
| 160 | +A few details worth noting about the format: |
| 161 | + * Multiple operations are allowed in case multiple methods are inlined or |
| 162 | + return at once. In that case, the second inlining information will have the |
| 163 | + source-location where the first inlined method invokes the second inlined |
| 164 | + method. |
| 165 | + |
| 166 | + For example, `[110, [2, 11, 3, 200], [3, 10, 4, 19]]` represents two pushes at |
| 167 | + offset `110`: the current method calls method `200` (index in the name table) |
| 168 | + at location `2, 11, 3` (2 is an index in the sources table, 11 the line, 3 the |
| 169 | + column), which then calls method `19` at location `3, 10, 4`. |
| 170 | + |
| 171 | + * The encoding excludes the name of the caller because it can be derived from |
| 172 | + the existing context (either from source-map information of the enclosing |
| 173 | + function, or from the previous inlining push calls). |
| 174 | + |
| 175 | + We also considered storing the name of the caller and omitting the callee, but |
| 176 | + decided against it. That would've worked today because we don't use |
| 177 | + source-names to support deobfuscation of fields and local names; instead, we |
| 178 | + store the name of the method. As we improve deobfuscation of minified |
| 179 | + names, the name of the inlined method will no longer be available in the main |
| 180 | + source-map section, so we need to include the name of the callee here. |
| 181 | + |
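To make the entry layout concrete, here is a small sketch that splits one `frames` entry into its offset and a list of operations. The `decode_entry` helper and the tuple representation are assumptions for illustration. Only the encoding itself (a list for a push, `-1` for a pop, `0` for a pop-and-empty) comes from the format described above.

```python
def decode_entry(entry):
    """Split one `frames` entry into (offset, list of operations)."""
    offset, *raw_ops = entry
    ops = []
    for raw in raw_ops:
        if isinstance(raw, list):
            # Push: call-site location plus the name of the inlined method.
            source_index, line, column, name_index = raw
            ops.append(("push", source_index, line, column, name_index))
        elif raw == -1:
            ops.append(("pop",))
        elif raw == 0:
            ops.append(("pop-and-empty",))
        else:
            raise ValueError(f"unknown frame operation: {raw!r}")
    return offset, ops


# The two-push example from the text above.
print(decode_entry([110, [2, 11, 3, 200], [3, 10, 4, 19]]))
print(decode_entry([2333, 0]))
```
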
| 182 | +This encoding helps deobfuscation tools decode the full stack trace with a |
| 183 | +simple backwards traversal of the table: |
| 184 | + |
| 185 | + * Based on the offset of a frame, a binary search is done to find the first |
| 186 | + entry before the frame location. |
| 187 | + |
| 188 | + * Then frames are visited backwards, tracking the current inlining level and |
| 189 | + counting pop and push operations. Once a "pop-and-empty" operation is |
| 190 | + found, the search stops. |
| 191 | + |
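The traversal described above could look roughly like the sketch below. It is one possible reading of the two steps, not the deobfuscation tool's actual implementation. It assumes that entries are sorted by offset, that an entry located exactly at the frame offset counts as "before" it, and that the function name `inlined_frames` is free to choose.

```python
from bisect import bisect_right


def inlined_frames(frames, offset):
    """Return the push operations still active at `offset`, innermost first."""
    offsets = [entry[0] for entry in frames]
    # Step 1: binary search for the last entry at or before the offset.
    index = bisect_right(offsets, offset) - 1
    pushes = []
    pending_pops = 0
    # Step 2: walk backwards, pairing each pop with the push it closes.
    while index >= 0:
        for op in reversed(frames[index][1:]):
            if op == 0:  # pop-and-empty: nothing earlier is relevant
                return pushes
            if op == -1:  # pop: the next push we meet is already closed
                pending_pops += 1
            elif pending_pops > 0:
                pending_pops -= 1
            else:
                pushes.append(tuple(op))
        index -= 1
    return pushes


# The `frames` table from the encoded example above.
frames = [
    [2310, [2, 34, 11, 4]],
    [2320, [4, 4, 2, 9]],
    [2330, -1],
    [2333, 0],
]

print(inlined_frames(frames, 2325))
print(inlined_frames(frames, 2331))
print(inlined_frames(frames, 2340))
```

Each returned tuple is a push operation (source index, line, column, name index), so a tool could emit one extra stack frame per tuple, starting with the innermost inlined call.
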
| 192 | +Note that this encoding is also sparse and only requires us to add information |
| 193 | +for methods containing inlining. That is because the empty markers basically |
| 194 | +indicate that every method between a given offset and the empty marker had no |
| 195 | +inlining in it. |
| 196 | + |
| 197 | +[sourcemapv3]: https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit#heading=h.n05z8dfyl3yh |