Commit 918cda1

sigmundch authored and commit-bot@chromium.org committed
Add docs about sourcemap extensions
Change-Id: Ic785e6e73a04be8d026e04766c8c505abde6a84a
Reviewed-on: https://dart-review.googlesource.com/67687
Reviewed-by: Stephen Adams <[email protected]>
Commit-Queue: Stephen Adams <[email protected]>
1 parent 329e029 commit 918cda1

1 file changed: +197 -0 lines changed

@@ -0,0 +1,197 @@
# Source map extensions

Dart2js includes two extensions to the source-map format to improve deobfuscation
of production stack traces. These extensions compensate for some of the
compiler optimizations that make deobfuscation harder.

## Format changes

Dart2js currently generates source-maps using the [source-map v3][sourcemapv3]
format. The format allows extensions as new map entries, as long as they are
prefixed by `x_` (other prefixes are reserved). We use an extension named
`x_org_dartlang_dart2js` to store any additional information we need to
share between dart2js and the deobfuscation tools:

```
{
  version: 3,
  file: "main.dart.js",
  sources: ["a.dart", "b.dart"],
  names: ["ClassA", "methodFoo"],
  mappings: "AAAA,E;;ABCDE;",
  x_org_dartlang_dart2js: {
    minified_names: {...},
    frames: [...]
  }
}
```

We include two sections: `minified_names`, which encodes the mapping between
minified and deobfuscated names, and `frames`, which encodes relevant
information about stack frames, including inlining decisions (so that
deobfuscation tools can expand them later on) and less-relevant frames (so that
deobfuscation tools can hide them or deemphasize them).

These new sections contain references to names and source URIs, but to
keep the encoding smaller, we reuse the `sources` and `names` tables from the
main source-map section.

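As a rough illustration of how a tool might read these sections, the Python sketch below loads a source map and pulls out the extension. The function name `read_extension` and the sample values inside `minified_names` are hypothetical; only the key names come from the format described above. Since a source map on disk is JSON, the keys are quoted in the sample.

```python
import json

# Key of the dart2js extension inside the source-map object.
EXTENSION_KEY = "x_org_dartlang_dart2js"


def read_extension(source_map_text):
    """Return (names, sources, minified_names, frames) from a source-map string.

    Missing sections fall back to empty containers, so source maps without
    the extension can still be processed.
    """
    source_map = json.loads(source_map_text)
    extension = source_map.get(EXTENSION_KEY, {})
    return (
        source_map.get("names", []),
        source_map.get("sources", []),
        extension.get("minified_names", {}),
        extension.get("frames", []),
    )


# Hypothetical sample: the contents of minified_names are made up.
example = """{
  "version": 3,
  "file": "main.dart.js",
  "sources": ["a.dart", "b.dart"],
  "names": ["ClassA", "methodFoo"],
  "mappings": "AAAA,E;;ABCDE;",
  "x_org_dartlang_dart2js": {
    "minified_names": {"global": {"x": 0}, "instance": {"a": 1}},
    "frames": []
  }
}"""

names, sources, minified_names, frames = read_extension(example)
```

The extension stores only indices, so every lookup goes back through the shared `names` and `sources` tables returned here.
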
## Minified names data

### Global minified names

Dart2js by default uses a global frequency-based namer to choose minified
names. One of its invariants is that there is a 1-1 mapping for class names
and method names (including getter names and setter names). For example, if two
classes have an instance method with the same public name and same signature of
optional arguments, they will also have the same minified method name.

To support deobfuscating type names and method names, we embed a translation
table for minified names, and we will add a new mechanism to deobfuscator
tools to recognize when these names are present.

Dart2js divides names into several namespaces. Many namespaces are local,
but two of them are global to the entire program: `global` and `instance`. The
`global` namespace includes the names of classes, while the `instance`
namespace includes the names of instance members.

The format looks like this:

```
...
x_org_dartlang_dart2js: {
  minified_names: {
    global: {
      "a": 3,   // an index in the names table, e.g. "topLevelMethod1"
      "X": 4,   // e.g. "MyAbstractClass"
    },
    instance: {
      "a": 5,   // e.g. "instanceMethod1"
      "gb": 6,  // e.g. "myGetter"
    }
  }
}
```

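A lookup against this table could be sketched as follows. The helper name `deobfuscate_name` is hypothetical, and the first three entries of the `names` list are made up so that indices 3 to 6 line up with the example above.

```python
def deobfuscate_name(minified, namespace, minified_names, names):
    """Map a minified name to its original name, or return None if unknown.

    `namespace` is "global" or "instance"; the table stores an index into
    the shared `names` table rather than the name itself.
    """
    index = minified_names.get(namespace, {}).get(minified)
    if index is None or not 0 <= index < len(names):
        return None
    return names[index]


# Entries 0-2 are placeholders; 3-6 match the comments in the example above.
names = [
    "ClassA", "methodFoo", "main",
    "topLevelMethod1", "MyAbstractClass", "instanceMethod1", "myGetter",
]
minified_names = {
    "global": {"a": 3, "X": 4},
    "instance": {"a": 5, "gb": 6},
}
```

The same minified name (`a` here) can resolve differently in each namespace, which is why a tool needs to know which namespace a name belongs to before looking it up.
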
Initially our plan is just to include a mapping from one name to another.
Depending on how much detail we want deobfuscation tools to provide, we could
one day include the source location where the name is defined (for type names)
or a list of such locations (for instance methods).

Dart2js also has a global namespace for constants, but we do not believe those
names appear in error messages, so we don't include it in the source-map
file at this time.

### Recognizing types and method names in error messages

To help deobfuscator tools identify minified names, dart2js will ensure that
all string representations of types and method names include a marker to
indicate what namespace they belong to.

Several string representations already have a marker:
* The default `instance.toString` (e.g. `new MyClass().toString()`) prints
  `Instance of X`. The prefix "Instance of" is an indication that the name should
  be found in the global namespace.
* Tear-offs also have an indicator.

Some string representations will change in the near future. For example,
`x.runtimeType.toString()` will include a marker in minified mode. Types can be
complex, so the marker will be next to every type symbol. For example,
a function type `ClassA Function(ClassB)` would be printed in minified mode as
`minified:x Function(minified:y)` instead of `x Function(y)`.

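To make the role of the markers concrete, here is a minimal sketch of how a tool might rewrite an error message using the `global` table. It is only an illustration: the function name `deobfuscate_message` is hypothetical, the character set allowed in a minified name is an assumption, and the pattern follows the marker forms exactly as written above (`Instance of X` and `minified:x`).

```python
import re

# Assumption: minified names consist of letters, digits, `_`, and `$`.
NAME_MARKER = re.compile(r"(Instance of |minified:)([A-Za-z0-9_$]+)")


def deobfuscate_message(message, global_table, names):
    """Replace marked minified type names with names from the names table."""

    def replace(match):
        prefix, minified = match.groups()
        index = global_table.get(minified)
        if index is None or not 0 <= index < len(names):
            return match.group(0)  # unknown name: leave the text untouched
        # Drop the synthetic `minified:` marker, keep the readable prefix.
        kept_prefix = "" if prefix == "minified:" else prefix
        return kept_prefix + names[index]

    return NAME_MARKER.sub(replace, message)


names = ["ClassA", "ClassB"]
global_table = {"x": 0, "y": 1}
restored = deobfuscate_message(
    "minified:x Function(minified:y)", global_table, names
)
```

Because every type symbol carries its own marker, a single pass over the message is enough even for nested or compound types.
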
### Local minified names data

Unlike types, constants, and methods, the names of fields, closure locals, and
local variables don't have a 1-1 correspondence. There are various algorithms
in use, but the bottom line is that it's possible to have two different field
names mapped to the same minified name, and similarly different local variable
names in different methods mapped to the same name. These names are less likely
to show up in error messages, but when they do, it is often the case that they
are being used in the same line as the error.

To support deobfuscation of these names, dart2js will include the `sourceNameId`
on each symbol as it is emitting the regular source-map file. This can be
encoded in the standard source-map format without any extensions. Today dart2js
uses the `sourceNameId` to denote the name of the enclosing function instead.

## Inlining data

Dart2js uses method inlining heavily for optimizations. Inlined methods, however,
confuse users and deobfuscation tools. For users, there are fewer frames than
calls in the program, so they wonder where the missing frames are. For tools,
finding the method name of each frame by looking backwards for a
function declaration can create a mismatch in the deobfuscated stack trace: the
deobfuscated frame may show the name of a caller, but the location of an
inlined method.

The `frames` extension is a table with details about inlining information.
Each entry in this table consists of:
* An offset in the program
* A list of one or more frame entries, which in turn can be:
  * push: indicates that we entered an inlined context
  * pop: indicates that we returned from an inlined context
  * pop-and-empty: indicates that this is a pop that also ends an inlining
    context, hence the offset has no inlining. This is used to mark the end
    of a region containing inlining data

A push operation includes details about the call site, in particular:
* the source location: index into the sources URI table, line, and column.
* the name of the inlined method (as an index in the names table). Note that
  dart2js encodes instance methods as a compound name "ClassName.methodName".

Here is an example of what the encoded format would look like:
```
...
x_org_dartlang_dart2js: {
  ...
  frames: [
    [ 2310,                // offset containing data
      [2, 34, 11, 4]],     // a list encodes a push operation
    [ 2320, [4, 4, 2, 9]],
    [ 2330, -1],           // -1 encodes a pop operation
    [ 2333, 0]             // 0 encodes a pop-and-empty operation
  ]
}
```

A few details worth noting about the format:
* Multiple operations are allowed in case multiple methods are inlined or
  return at once. In that case, the second inlining information will have the
  source-location where the first inlined method invokes the second inlined
  method.

  For example, `[110, [2, 11, 3, 200], [3, 10, 4, 19]]` represents two pushes at
  offset `110`: the current method calls method `200` (index in the names table)
  at location `2, 11, 3` (2 is an index in the URI map, 11 is the line, 3 the
  column) which then calls method `19` at location `3, 10, 4`.

* The encoding excludes the name of the caller because it can be derived from
  the existing context (either from source-map information of the enclosing
  function, or from the previous inlining push calls).

  We also considered storing the name of the caller and omitting the callee, but
  decided against it. That would have worked today because we don't use
  source-names to support deobfuscation of fields and local names; instead we
  are storing the name of the method. As we improve deobfuscation of minified
  names, the name of the inlined method will no longer be available in the main
  source-map section, so we need to include the name of the callee here.

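The entry layout described above can be made explicit with a small decoder. This is a sketch under the field order given in the example (source index, line, column, name index); the `Push` type, the string labels for the pop operations, and the function name `decode_entry` are hypothetical.

```python
from typing import NamedTuple


class Push(NamedTuple):
    """Call-site details carried by a push operation."""
    source_index: int  # index into the `sources` table
    line: int
    column: int
    name_index: int    # index into the `names` table (the inlined callee)


def decode_entry(entry):
    """Split one `frames` entry into its offset and a list of operations."""
    offset, *raw_ops = entry
    ops = []
    for raw in raw_ops:
        if raw == -1:
            ops.append("pop")
        elif raw == 0:
            ops.append("pop-and-empty")
        else:
            ops.append(Push(*raw))  # a 4-element list encodes a push
    return offset, ops


offset, ops = decode_entry([110, [2, 11, 3, 200], [3, 10, 4, 19]])
```

In this example `ops[0].name_index` is `200` and `ops[1].name_index` is `19`, matching the two nested calls described in the list above.
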
This encoding helps deobfuscation tools decode the full stack trace with a
simple backwards traversal of the table:

* Based on the offset of a frame, a binary search is done to find the first
  entry before the frame location.

* Then frames are visited backwards, tracking the current inlining level and
  counting pop and push operations. Once a "pop-and-empty" operation is
  found, the search stops.

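The traversal above can be sketched in Python as follows. This is not the actual tool implementation, only one way to realize the two steps. It assumes the table is sorted by offset and that an entry located exactly at the frame offset counts as applying to it; the function name `active_inlined_calls` is hypothetical.

```python
from bisect import bisect_right

POP = -1
POP_AND_EMPTY = 0


def active_inlined_calls(frames, offset):
    """Return the push operations still open at `offset`, innermost first."""
    offsets = [entry[0] for entry in frames]
    # Binary search: last entry whose offset is <= the frame offset.
    index = bisect_right(offsets, offset) - 1
    active = []
    pending_pops = 0  # pops seen whose matching push is still further back
    while index >= 0:
        # Operations within an entry are also visited in reverse order.
        for op in reversed(frames[index][1:]):
            if op == POP_AND_EMPTY:
                return active  # nothing before this point is inlined
            if op == POP:
                pending_pops += 1
            elif pending_pops > 0:
                pending_pops -= 1  # this push was already closed by a pop
            else:
                active.append(op)  # unmatched push: still open at `offset`
        index -= 1
    return active


# The table from the example above.
frames = [
    [2310, [2, 34, 11, 4]],
    [2320, [4, 4, 2, 9]],
    [2330, -1],
    [2333, 0],
]
```

With this table, offset `2325` yields both pushes (two inlined calls are open), offset `2331` yields only the outer push, and offsets before `2310` or at and after `2333` yield an empty list.
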
Note that this encoding is also sparse and only requires us to add information
for methods containing inlining. That is because the empty markers basically
indicate that every method between a given offset and the empty marker had no
inlining in it.

[sourcemapv3]: https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit#heading=h.n05z8dfyl3yh
