Skip to content

Source maps (Refs #1252, Reopens #1267) #1274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jun 26, 2013

Conversation

int3
Copy link
Contributor

@int3 int3 commented Jun 10, 2013

em++ -g tests/exceptions/typed.cpp -o typed.html should give an effective demonstration of source maps at work.

I'm not entirely sure how to fit a test for the HTML generation portion into the test suite. However, the code for that part is fairly simple, so a test may not be absolutely necessary...

@kripken
Copy link
Member

kripken commented Jun 10, 2013

Let's add an html file in the test directory. We can't do automatic testing of source maps actually working, but it would hugely help manual bisecting in the future, if all we need to do to verify things manually are to open a web page.

@int3
Copy link
Contributor Author

int3 commented Jun 10, 2013

Test added.

@kripken
Copy link
Member

kripken commented Jun 17, 2013

Are you still working on this branch? (The last commit looks like a wip.)

@int3
Copy link
Contributor Author

int3 commented Jun 17, 2013

Oops, the last two commits were not meant to be pushed. I've unpushed them now... could you have a look to see if this branch now works for you (with Chrome 28)? It'd be nice to have this merged in, and then we can handle optimized source maps later on.

(Should work in Nightly too as long as the path passed to emcc isn't an absolute path; that should work in a few days though, since fitzgen has already pushed a fix to devtools.)

@kripken
Copy link
Member

kripken commented Jun 18, 2013

The test doesn't pass for me,

$ ./tests/runner.py test_source_map
Running Emscripten tests...
test_source_map (__main__.default) ... (checking sanity from test runner)
INFO     root: (Emscripten: Running sanity checks)
Unable to find original file for /tmp/emscripten_test_default_GR0OER/src.cpp at /tmp/emscripten_test_default_GR0OER/tmp/emscripten_test_default_GR0OER/src.cpp

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
TypeError: Object #<Object> has no method 'appendFileSync'
    at generateMap (/home/alon/Dev/emscripten/tools/sourcemapper.js:105:8)
    at Object.<anonymous> (/home/alon/Dev/emscripten/tools/sourcemapper.js:119:5)
    at Module._compile (module.js:441:26)
    at Object..js (module.js:459:10)
    at Module.load (module.js:348:32)
    at Function._load (module.js:308:12)
    at Array.0 (module.js:479:10)
    at EventEmitter._tickCallback (node.js:192:41)
ERROR

======================================================================
ERROR: test_source_map (__main__.default)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/runner.py", line 9541, in test_source_map
    self.build(src, dirname, src_filename, post_build=(None,post))
  File "./tests/runner.py", line 253, in build
    self.ll_to_js(filename, extra_emscripten_args, post_build)
  File "./tests/runner.py", line 196, in ll_to_js
    if post2: post2(filename + '.o.js')
  File "./tests/runner.py", line 9530, in post
    self.assertIdentical(src, data['sourcesContent'][0])
KeyError: 'sourcesContent'

----------------------------------------------------------------------
Ran 1 test in 0.783s

FAILED (errors=1)

@kripken
Copy link
Member

kripken commented Jun 18, 2013

Ok, I updated node, now I see

$ ./tests/runner.py test_source_map
Running Emscripten tests...
test_source_map (__main__.default) ... (checking sanity from test runner)
INFO     root: (Emscripten: Running sanity checks)
Unable to find original file for /tmp/emscripten_test_default_KG97MX/src.cpp at /tmp/emscripten_test_default_KG97MX/tmp/emscripten_test_default_KG97MX/src.cpp
ERROR

======================================================================
ERROR: test_source_map (__main__.default)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/runner.py", line 9541, in test_source_map
    self.build(src, dirname, src_filename, post_build=(None,post))
  File "./tests/runner.py", line 253, in build
    self.ll_to_js(filename, extra_emscripten_args, post_build)
  File "./tests/runner.py", line 196, in ll_to_js
    if post2: post2(filename + '.o.js')
  File "./tests/runner.py", line 9530, in post
    self.assertIdentical(src, data['sourcesContent'][0])
KeyError: 'sourcesContent'

----------------------------------------------------------------------
Ran 1 test in 0.727s

FAILED (errors=1)
alon@mozilla-XPS-8300:~/Dev/emscripten$ 

@int3
Copy link
Contributor Author

int3 commented Jun 19, 2013

I've rebased the entire branch on incoming, and I've added support for source maps for optimized builds. Basically I hacked the uglify code generator to emit // @line 123 comments (the parser already carried this information.) I only did it for assignment and var declaration statements thus far; I should probably add stuff in for control flow condition checks, but that might be trickier (perhaps block comments instead of line comments need to be used). Nonetheless, the current implementation should be able to map a large fraction of lines, and will be accurate under AST transforms.

(Also, as discussed offline, the error messages in the above two comments were caused by my usage of newer Node features. We've now upgraded the minimum required version to v0.8, so that should be fine now.)

Edit: Hunting down some possible line number inaccuracies at the moment.

@int3
Copy link
Contributor Author

int3 commented Jun 20, 2013

Update on current status of the 'inaccuracies': It turns out that we don't parse the LLVM line numbers correctly if the instruction spans more than two lines. In particular,

invoke void @__cxa_throw(i8* %exception, i8* bitcast ({ i8*, i8*, i32, i8* }* @_ZTIP4Test to i8*), i8* null) noreturn
        to label %unreachable unwind label %lpad3, !dbg !30 ; [debug line = 12:0]

__cxa_throw ends up being unmapped in the final source. In some cases this works out fine because __cxa_allocate usually appears as the previous instruction, and it usually shares the same line number in the original source, and browsers match unmapped lines with the closest previously mapped line. However code can be inlined between __cxa_allocate and __cxa_throw, and this results in rather wrong results.

Assuming that line number metadata is always added to the end of the instruction, we might be able to fix this simply by assigning every LLVM instruction a line number based on the line where it ends, rather than the line where it begins.

@kripken
Copy link
Member

kripken commented Jun 20, 2013

Interesting, but I am not too worried about slightly incorrect llvm debug info - optimizations like inlining often make debugging trickier anyhow ;)

I am more concerned with our making sure that there are no regressions on the test suite with such a change to uglify, and also no slowdowns as well, since this is a bug chunk of our compilation time. We should measure a few of the larger tests like poppler before and after this change.

@kripken
Copy link
Member

kripken commented Jun 22, 2013

(i see you just added more code, please comment in the bug here to ping me when this is ready for review)

int3 added 6 commits June 22, 2013 01:23
Line numbering of exceptions now work (better) in optimized mode.
NodeWithToken is a string-like type produced by the parser during 'embed
tokens' mode, which allows us to track line numbers.
Makes for an easier-to-review pull request.
* Now works on FF Nightly
* Test does not run unless explicitly requested
@kripken
Copy link
Member

kripken commented Jun 25, 2013

So to be sure I follow, does the isArray change get rid of half the previous slowdown? And the rest is == vs ===, which we still suffer from, but it was countered by optimizing other uses of ==/===?

@int3
Copy link
Contributor Author

int3 commented Jun 25, 2013

Yeah, the isArray change in calculating relevant was responsible for about half of the slowdown. That was the only single-line change that I profiled on its own. The rest of the speedups were obtained after changing == to === and using isArray to determine whether the result returned by pre was a new node.

@kripken
Copy link
Member

kripken commented Jun 25, 2013

Ok, then I am not sure what the change is that you said you prefer to not do. Can you elaborate a little more on what it would look like, and what speedup we could expect from it?

@int3
Copy link
Contributor Author

int3 commented Jun 25, 2013

Right now in non-debug mode our tokens are of the form

ast = ['num', 1]

In debug mode they get replaced by

ast = [NodeWithToken('num', {start: {line: 10, column: 20}, end: {line: ..., column: ...}}), 1]

So we have to compare node[0] == 'num' instead of node[0] === 'num' because the former invokes toString on the operands.

If instead we write a pass that does something like

ast.start = ast[0].start
ast.end = ast[0].end
ast[0] = ast[0].toString()

Then we would be able to use strict comparison on ast[0].

All that said, I'm not entirely confident that we would see a significant increase. Yesterday I converted the other equals operators in js-optimizer.js to strict comparison where possible, and didn't see a measurable difference. I suspect my addition of isArray at those two points in traverse had the most impact on performance, because they reduced branching in the most-invoked function.

@kripken
Copy link
Member

kripken commented Jun 25, 2013

What is performance like when you revert the isArray optimization? That should tell us how much the rest is.

@int3
Copy link
Contributor Author

int3 commented Jun 26, 2013

I did one better and replaced all the == with === for the type comparisons. (Even without the extra hacky pass I described above, this is perfectly fine as long as we're not trying to make source maps.) Compilation for sqlite went from 16.78 to 13.68 (!). That's about an 18% reduction. (I did a diff of the output to make sure we weren't accidentally missing out an entire pass or something...) Well, it does seem worth it to write that extra pass.

@kripken
Copy link
Member

kripken commented Jun 26, 2013

Thanks! Yeah, I really don't think we can afford to give up an 18% speedup. And the extra pass would only be done on debug builds?

@int3
Copy link
Contributor Author

int3 commented Jun 26, 2013

Yup, the extra pass would only be needed if source maps are being produced.

This allows us to use strict comparisons on the node type.
@int3
Copy link
Contributor Author

int3 commented Jun 26, 2013

Turns out it wasn't so difficult after all. I'd just overestimated the difficulty after a week debugging all those edge cases. Anyway, instead of writing an extra pass I modified the parser slightly to add the line numbers to the nodes themselves. Running the tests now; will let you know their results tomorrow morning.

Edit: asm2g passed.

@kripken
Copy link
Member

kripken commented Jun 26, 2013

Great! What is performance like now?

@int3
Copy link
Contributor Author

int3 commented Jun 26, 2013

I got about the same performance on sqlite that I mentioned previously -- ~13.68s. I haven't re-measured poppler yet.

@kripken
Copy link
Member

kripken commented Jun 26, 2013

Ok, great. I'll do some testing too.

@kripken
Copy link
Member

kripken commented Jun 26, 2013

There is a small conflict now in this branch, I fixed a js optimizer crash yesterday. trivial though

@kripken
Copy link
Member

kripken commented Jun 26, 2013

I'm running tests on this now, no need to rebase it for that conflict.

@@ -2031,6 +2035,7 @@ function eliminate(ast, memSafe) {
// examine body and note locals
var hasSwitch = false;
traverse(func, function(node, type) {
if (debug && type) type = type.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed here and nowhere else?

@kripken
Copy link
Member

kripken commented Jun 26, 2013

I am about to merge this. But as followups, I had one question above, and also: I still see two isArray checks, are they still needed/helpful?

@kripken kripken merged commit 5383aa8 into emscripten-core:incoming Jun 26, 2013
@kripken
Copy link
Member

kripken commented Jun 26, 2013

Merged, thanks! Let's not forget the two followups though.

One other thing i noticed was not addressed was changing --debug to be consistent with the current format. Was there a reason to not do that, or just forgotten?

@int3
Copy link
Contributor Author

int3 commented Jun 26, 2013

I profiled the isArray stuff and it looks like using the typeof checks (node !== null && typeof node === 'object') is a 3% win on sqlite -- 15.27 versus 14.84. Using isArray is more semantically accurate though, the typeof check takes a short cut and only verifies that we have an object, because all objects we receive are arrays.

The if (debug …) checks were leftover cruft.

The reason for the --debug flag was addressed in this comment. Let me know if you would still like me to change it.

Thanks for reviewing and merging!

@kripken
Copy link
Member

kripken commented Jul 8, 2013

Source mapping mentions "unable to find original file" on some things in the test suite, like ./tests/runner.py test_zlib - any idea why?

@int3
Copy link
Contributor Author

int3 commented Jul 8, 2013

I think what is happening is that the static libraries are getting built in some directory, then the resulting bitcode files are getting copied over into a different directory before being linked. The sourcemapper script tries to look for the original source files at link time and thus cannot find them.

One possible solution is to always write out absolute paths in the // @line comments. Nevermind, this wouldn't work if the bitcode files are the ones being copied...

@int3 int3 deleted the source-maps branch July 8, 2013 21:19
@int3
Copy link
Contributor Author

int3 commented Jul 8, 2013

Yeah, I don't think there's a great way to fix this. Ideally we'd want clang to output absolute paths in its LLVM debug info, but the only time it does that is if we pass in an absolute source path when invoking it. We could have a post-bitcode-compilation pass that rewrote the bitcode output to use absolute paths, but that seems like too much work.

It looks like we're generating source maps when testing default because -O0 sets debug_level to 4. How about making -O0 correspond to -g3 instead? This would turn off the annoying error messages and make our tests run more quickly (because they can use more than 1 core.)

@kripken
Copy link
Member

kripken commented Jul 15, 2013

Ok, got it.

Makes sense about -O0, but we do want for emcc -g src.cpp to emit full debug info. Not sure what the best compromise here is, one core is indeed not great in the test suite.

@gagern gagern mentioned this pull request Jun 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants