Add emsymbolizer #16095
This is a WIP for #16094. The first PR will be for item 1, using llvm-symbolizer with DWARF.
OK, PTAL. On a scale of 🤔 to 🤮, what do you think of the test?
Code lgtm. For the test, yeah, it does look at risk of being brittle... maybe we can add slack, see comment.
        stdout=PIPE).stdout

    # Check a location in foo(), not inlined.
    self.assertIn('test_dwarf.c:6:3', get_addr('0x101'))
We could do a range here perhaps? 0x101 ± 10 seems safe and good enough.
The problem is that this test is the other way around... i.e. we enter the address and check the line number. Come to think of it, it actually does have some slack in the positive direction, since the line record covers 5 bytes of instructions.
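To illustrate that slack: a line-table row applies from its own address up to the next row's address, so every address in that span resolves to the same line. Below is a minimal sketch with a made-up table. Only 0x101, line 6, and the 5-byte span come from this thread; the neighbouring rows and the `line_for` helper are hypothetical, and a real DWARF line program has more state (sequences, columns, end markers) than this models.

```python
import bisect

# Hypothetical line table: (start_address, source_line), sorted by address.
LINE_TABLE = [
    (0x0FA, 5),
    (0x101, 6),  # assumed to span 5 bytes, as described in this thread
    (0x106, 7),
]
STARTS = [start for start, _ in LINE_TABLE]


def line_for(address):
    """Return the source line of the row covering `address`, or None."""
    # Find the last row whose start address is <= address.
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return None  # address lies before the first row
    return LINE_TABLE[index][1]
```

With this table, `line_for(0x101)` through `line_for(0x105)` all give line 6, while `line_for(0x100)` falls into the previous row, which is why the slack only goes in the positive direction.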
The thing that made me worry a little less is that the code looks pretty minimal and not too subject to random changes in e.g. optimizers. Reordering of functions, removal of __wasm_call_ctors, changes in imports/exports, etc. could still break it.
Interesting, what's this about 5 bytes? Is the line section limited to that resolution?
What I was suggesting is something like this, can't it work?
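for i in range(-10, 10):
  # Pass a hex string, matching the existing get_addr('0x101') call.
  if 'test_dwarf.c:6:3' in get_addr(hex(0x101 + i)):
    break
else:
  # No address in the window resolved to the expected location.
  self.fail('not found')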
There are just 5 bytes of instructions that are considered to be part of line 6 (the call, and the drop of the return value).
I guess that suggestion could work. Although I feel like that would cover small perturbations in the code generation, but probably not changes like the ones I mentioned above (well, I guess it would survive removing or moving __wasm_call_ctors). Another option would be to disassemble the binary and find particular instructions, which would be precise but is potentially almost as error-prone as the code.
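A rough sketch of that trade-off, for what it's worth. `find_near` is a hypothetical helper that scans a window around the expected address, and `make_fake_symbolizer` is a made-up stand-in for `get_addr` (it does not run emsymbolizer); both names and the shifted addresses are assumptions for illustration only.

```python
def find_near(symbolize, center, expected, slack=10):
    """Return the first address within +/- slack of center whose
    symbolized output contains `expected`, or None if there is none."""
    for offset in range(-slack, slack + 1):
        address = center + offset
        if expected in symbolize(hex(address)):
            return address
    return None


def make_fake_symbolizer(start, size, location):
    """Hypothetical stand-in for get_addr: reports `location` only for
    addresses in [start, start + size)."""
    def symbolize(address_text):
        address = int(address_text, 16)
        if start <= address < start + size:
            return location
        return '??:0:0'
    return symbolize
```

If codegen moves the 5-byte span by a few bytes (say it now starts at 0x104), `find_near(..., 0x101, 'test_dwarf.c:6:3')` still finds it; if the function moves by more than the window (say to 0x140), it returns None and the test would fail just as before.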
Disassembling sounds like overkill to me. Sgtm to land this and see if it's an issue in practice.
OK, I expect I'll be thinking more about this as I write more tests for the various ways to get line/symbol info.
Weird, all the tests passed but the status doesn't seem to have propagated to GH. I'm just going to merge manually.
def get_addr(address):
  return self.run_process(
      [PYTHON, path_from_root('emsymbolizer.py'), 'test_dwarf.wasm', address],
Can you add this to tools/create_entry_points and then run it so that you can avoid calling via python like this?
Emsymbolizer is a tool for symbolizing a binary, i.e. showing the file/line or symbol info for a code address.
As described in #16094, there are several ways to do this with Emscripten.
The first PR is for item 1, using llvm-symbolizer with DWARF.
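For readers unfamiliar with the approach, here is a minimal sketch of what symbolizing an address through llvm-symbolizer can look like. This is not the code in emsymbolizer.py: the function names are hypothetical, it assumes `llvm-symbolizer` is on PATH, and it assumes the tool's default output of a function-name line followed by a `file:line:column` line per frame.

```python
import subprocess


def parse_symbolizer_output(text):
    """Parse default llvm-symbolizer output into
    (function, file, line, column) tuples, one per frame."""
    lines = [line for line in text.splitlines() if line.strip()]
    frames = []
    # Frames come as pairs: a function name, then file:line:column.
    for name, location in zip(lines[0::2], lines[1::2]):
        # Split from the right so colons in the path are preserved.
        path, line, column = location.rsplit(':', 2)
        frames.append((name, path, int(line), int(column)))
    return frames


def symbolize(wasm_path, address, symbolizer='llvm-symbolizer'):
    """Run the symbolizer on one address (e.g. '0x101') and parse the result."""
    cmd = [symbolizer, '--obj=' + wasm_path, address]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    return parse_symbolizer_output(result.stdout)
```

An inlined location would show up as more than one frame in the returned list, innermost first.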