Add emsymbolizer #16095
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
#!/usr/bin/env python3 | ||
|
||
# This is a utility for looking up the symbol names and/or file+line numbers | ||
# of code addresses. There are several possible sources of this information, | ||
# with varying granularity (listed here in approximate preference order). | ||
|
||
# If the wasm has DWARF info, llvm-symbolizer can show the symbol, file, and | ||
# line/column number, potentially including inlining. | ||
# If there is a source map, we can parse it to get file and line number. | ||
# If there is an emscripten symbol map, we can parse that to get the symbol name | ||
# If there is a name section or symbol table, llvm-nm can show the symbol name. | ||
|
||
import os | ||
import sys | ||
from tools import shared | ||
from tools import webassembly | ||
from tools.shared import check_call | ||
|
||
LLVM_SYMBOLIZER = os.path.expanduser( | ||
shared.build_llvm_tool_path(shared.exe_suffix('llvm-symbolizer'))) | ||
|
||
|
||
class Error(BaseException): | ||
pass | ||
|
||
|
||
def get_codesec_offset(module): | ||
for sec in module.sections(): | ||
if sec.type == webassembly.SecType.CODE: | ||
return sec.offset | ||
raise Error(f'No code section found in {module.filename}') | ||
|
||
|
||
def has_debug_line_section(module): | ||
for sec in module.sections(): | ||
if sec.name == ".debug_line": | ||
return True | ||
return False | ||
|
||
|
||
def symbolize_address_dwarf(module, address): | ||
vma_adjust = get_codesec_offset(module) | ||
cmd = [LLVM_SYMBOLIZER, '-e', module.filename, f'--adjust-vma={vma_adjust}', | ||
str(address)] | ||
check_call(cmd) | ||
|
||
|
||
def main(argv): | ||
wasm_file = argv[1] | ||
print('Warning: the command-line and output format of this file are not ' | ||
'finalized yet', file=sys.stderr) | ||
module = webassembly.Module(wasm_file) | ||
|
||
if not has_debug_line_section(module): | ||
raise Error(f"No .debug_line section found in {module.filename}." | ||
" I don't know how to symbolize this file yet") | ||
|
||
symbolize_address_dwarf(module, int(argv[2], 16)) | ||
return 0 | ||
|
||
|
||
if __name__ == '__main__': | ||
try: | ||
rv = main(sys.argv) | ||
except (Error, webassembly.InvalidWasmError, OSError) as e: | ||
print(f'{sys.argv[0]}: {str(e)}', file=sys.stderr) | ||
rv = 1 | ||
sys.exit(rv) |
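For readers without the Emscripten tree at hand, the two section checks in the file above can be sketched in standalone Python. This is an illustration, not the code under review: `read_uleb`, `iter_sections`, `code_section_offset`, `has_debug_line`, and `make_section` are hypothetical names, the parser does no validation beyond the magic number, and it assumes that the value passed to `--adjust-vma` is the file offset of the code section's payload (just past the section id and size fields).

```python
WASM_MAGIC = b'\0asm'
CUSTOM_SECTION_ID = 0   # custom sections carry a name, e.g. ".debug_line"
CODE_SECTION_ID = 10    # section id of the code section in the wasm binary format


def read_uleb(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_sections(buf):
    """Yield (section_id, payload_offset, payload_size, name) tuples."""
    if buf[:4] != WASM_MAGIC:
        raise ValueError('not a wasm binary')
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(buf):
        sec_id = buf[pos]
        size, payload = read_uleb(buf, pos + 1)
        name = None
        if sec_id == CUSTOM_SECTION_ID:
            name_len, name_start = read_uleb(buf, payload)
            name = buf[name_start:name_start + name_len].decode('utf-8')
        yield sec_id, payload, size, name
        pos = payload + size


def code_section_offset(buf):
    """Return the file offset of the code section payload."""
    for sec_id, offset, _size, _name in iter_sections(buf):
        if sec_id == CODE_SECTION_ID:
            return offset
    raise ValueError('no code section found')


def has_debug_line(buf):
    """Return True if a custom section named .debug_line is present."""
    return any(name == '.debug_line'
               for _id, _offset, _size, name in iter_sections(buf))


def make_section(sec_id, payload):
    """Build a section for the demo; only valid for payloads under 128 bytes."""
    return bytes([sec_id, len(payload)]) + payload


# Hand-built demo bytes (not a valid, runnable module): an empty code
# section followed by a custom section named ".debug_line".
module = (WASM_MAGIC + b'\x01\x00\x00\x00' +
          make_section(CODE_SECTION_ID, b'\x00') +
          make_section(CUSTOM_SECTION_ID, b'\x0b.debug_line' + b'\x00'))
```

On a real file, the bytes would come from `open(path, 'rb').read()`, and the resulting offset would play the role of `vma_adjust` in `symbolize_address_dwarf`.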
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8219,6 +8219,30 @@ def test(infile, source_map_added_dir=''): | |
ensure_dir('inner') | ||
test('inner/a.cpp', 'inner') | ||
|
||
def test_emsymbolizer(self): | ||
# Test DWARF output | ||
self.run_process([EMCC, test_file('core/test_dwarf.c'), | ||
'-g', '-O1', '-o', 'test_dwarf.js']) | ||
|
||
# Use hard-coded addresses. This is potentially brittle, but LLVM's | ||
# O1 output is pretty minimal so hopefully it won't break too much? | ||
# Another option would be to disassemble the binary to look for certain | ||
# instructions or code sequences. | ||
|
||
def get_addr(address): | ||
return self.run_process( | ||
[PYTHON, path_from_root('emsymbolizer.py'), 'test_dwarf.wasm', address], | ||
stdout=PIPE).stdout | ||
|
||
# Check a location in foo(), not inlined. | ||
self.assertIn('test_dwarf.c:6:3', get_addr('0x101')) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could do a range here perhaps? 0x101 += 10 seems safe and good enough. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is that this test is the other way around... i.e. we enter the address and check the line number. Come to think of it, it actually does have some slack in the positive direction, since the line record covers 5 bytes of instructions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The thing that made me worry a little less is that the code looks pretty minimal and not too subject to random changes in e.g. optimizers. Reordering of functions, removal of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting, what's this about 5 bytes? Is the line section limited to that resolution? What I was suggesting is something like this, can't it work? for i in range(-10, 10):
if 'test_dwarf.c:6:3' in get_addr(str(0x101 + i)):
break
else:
self.assert('not found') There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are just 5 bytes of instructions that are considered to be part of line 6 (the call, and the drop of the return value). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Disassembling sounds like overkill to me. Sgtm to land this and see if it's an issue in practice. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I expect I'll be thinking more about this as I write more tests for the various ways to get line/symbol info. |
||
# Check that both bar (inlined) and main (inlinee) are in the output, | ||
# as described by the DWARF. | ||
# TODO: consider also checking the function names once the output format | ||
# stabilizes more | ||
self.assertRegex(get_addr('0x124').replace('\n', ''), | ||
'test_dwarf.c:15:3.*test_dwarf.c:20:3') | ||
|
||
def test_separate_dwarf(self): | ||
self.run_process([EMCC, test_file('hello_world.c'), '-g']) | ||
self.assertExists('a.out.wasm') | ||
|
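The range idea raised in the review thread on the hard-coded address can be written as a small runnable sketch. Two details differ from the snippet in the thread: the address is passed as a hex string, because emsymbolizer parses its argument with `int(argv[2], 16)`, and the helper returns `None` instead of calling `self.assert`, which is not valid Python since `assert` is a keyword. `find_address`, `fake_get_addr`, and the line table are hypothetical stand-ins; the table contents are made up to mirror the "5 bytes of instructions" described in the discussion and are not real symbolizer output.

```python
# Hypothetical stand-in for running emsymbolizer: pretend that the five
# bytes starting at 0x101 belong to line 6 of test_dwarf.c.
FAKE_LINE_TABLE = {addr: 'test_dwarf.c:6:3' for addr in range(0x101, 0x106)}


def fake_get_addr(address):
    """Mimic the test's get_addr(): take a hex string, return location text."""
    return FAKE_LINE_TABLE.get(int(address, 16), '??:0:0')


def find_address(get_addr, expected, center, slack=10):
    """Return the first address within center +/- slack whose output
    contains `expected`, or None if no address in the window matches."""
    for addr in range(center - slack, center + slack + 1):
        if expected in get_addr(hex(addr)):
            return addr
    return None
```

In a unittest method, a `None` result could be reported with `self.fail(...)` or checked with `self.assertIsNotNone(...)`. Note that each probe in the real test would spawn a process, so a window of 21 addresses costs up to 21 runs.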
Can you add this to tools/create_entry_points and then run it so that you can avoid calling via python like this?