Add emsymbolizer #16095

Merged (5 commits) on Jan 27, 2022
68 changes: 68 additions & 0 deletions emsymbolizer.py
@@ -0,0 +1,68 @@
#!/usr/bin/env python3

# This is a utility for looking up the symbol names and/or file+line numbers
# of code addresses. There are several possible sources of this information,
# with varying granularity (listed here in approximate preference order).

# If the wasm has DWARF info, llvm-symbolizer can show the symbol, file, and
# line/column number, potentially including inlining.
# If there is a source map, we can parse it to get file and line number.
# If there is an emscripten symbol map, we can parse that to get the symbol name
# If there is a name section or symbol table, llvm-nm can show the symbol name.

import os
import sys
from tools import shared
from tools import webassembly
from tools.shared import check_call

LLVM_SYMBOLIZER = os.path.expanduser(
    shared.build_llvm_tool_path(shared.exe_suffix('llvm-symbolizer')))


class Error(BaseException):
  pass


def get_codesec_offset(module):
  for sec in module.sections():
    if sec.type == webassembly.SecType.CODE:
      return sec.offset
  raise Error(f'No code section found in {module.filename}')


def has_debug_line_section(module):
  for sec in module.sections():
    if sec.name == ".debug_line":
      return True
  return False


def symbolize_address_dwarf(module, address):
  vma_adjust = get_codesec_offset(module)
  cmd = [LLVM_SYMBOLIZER, '-e', module.filename, f'--adjust-vma={vma_adjust}',
         str(address)]
  check_call(cmd)


def main(argv):
  wasm_file = argv[1]
  print('Warning: the command-line and output format of this file are not '
        'finalized yet', file=sys.stderr)
  module = webassembly.Module(wasm_file)

  if not has_debug_line_section(module):
    raise Error(f"No .debug_line section found in {module.filename}."
                " I don't know how to symbolize this file yet")

  symbolize_address_dwarf(module, int(argv[2], 16))
  return 0


if __name__ == '__main__':
  try:
    rv = main(sys.argv)
  except (Error, webassembly.InvalidWasmError, OSError) as e:
    print(f'{sys.argv[0]}: {str(e)}', file=sys.stderr)
    rv = 1
  sys.exit(rv)
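
The `--adjust-vma` value in `symbolize_address_dwarf` is the file offset of the code section, so it may help to see how such an offset can be found. The following is a minimal, self-contained sketch that does not use `tools/webassembly.py`: it walks the section headers of a wasm binary and returns the offset at which the code section's payload begins. The helper names (`read_uleb128`, `find_code_section_offset`) and the hand-built module bytes are hypothetical, and it is an assumption here that `sec.offset` in the real reader refers to the payload start rather than to the section header.

```python
import io

WASM_MAGIC = b'\0asm'
WASM_VERSION = b'\x01\0\0\0'
CODE_SECTION_ID = 10  # Id of the code section in the wasm binary format.


def read_uleb128(buf):
  """Decode one unsigned LEB128 integer from a binary stream."""
  result = 0
  shift = 0
  while True:
    byte = buf.read(1)
    if not byte:
      raise ValueError('unexpected end of file in LEB128 value')
    result |= (byte[0] & 0x7f) << shift
    if not byte[0] & 0x80:
      return result
    shift += 7


def find_code_section_offset(data):
  """Return the file offset of the code section payload, or None."""
  buf = io.BytesIO(data)
  if buf.read(4) != WASM_MAGIC or buf.read(4) != WASM_VERSION:
    raise ValueError('not a wasm binary')
  while True:
    id_byte = buf.read(1)
    if not id_byte:
      return None  # Ran out of sections without finding a code section.
    size = read_uleb128(buf)
    payload_offset = buf.tell()
    if id_byte[0] == CODE_SECTION_ID:
      return payload_offset
    buf.seek(payload_offset + size)  # Skip this section's payload.


# Hand-built module (hypothetical): header, an empty type section (id 1),
# then a code section (id 10) holding zero function bodies.
module_bytes = (WASM_MAGIC + WASM_VERSION +
                bytes([1, 1, 0]) +   # type section: size 1, count 0
                bytes([10, 1, 0]))   # code section: size 1, count 0
code_offset = find_code_section_offset(module_bytes)
print(code_offset)
```

DWARF addresses in wasm are relative to the code section, so passing this offset as `--adjust-vma` presumably lets the user supply plain file offsets on the command line, which is what the hard-coded test addresses appear to be.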
8 changes: 7 additions & 1 deletion tests/core/test_dwarf.c
@@ -2,14 +2,20 @@

 EM_JS(int, out_to_js, (int x), {})

-void foo() {
+void __attribute__((noinline)) foo() {
   out_to_js(0); // line 5
   out_to_js(1); // line 6
   out_to_js(2); // line 7
   // A silly possible recursion to avoid binaryen doing any inlining.
   if (out_to_js(3)) foo();
 }

+void __attribute__((always_inline)) bar() {
+  out_to_js(3);
+  __builtin_trap();
+}
+
 int main() {
   foo();
+  bar();
 }
24 changes: 24 additions & 0 deletions tests/test_other.py
@@ -8219,6 +8219,30 @@ def test(infile, source_map_added_dir=''):
    ensure_dir('inner')
    test('inner/a.cpp', 'inner')

  def test_emsymbolizer(self):
    # Test DWARF output
    self.run_process([EMCC, test_file('core/test_dwarf.c'),
                      '-g', '-O1', '-o', 'test_dwarf.js'])

    # Use hard-coded addresses. This is potentially brittle, but LLVM's
    # O1 output is pretty minimal so hopefully it won't break too much?
    # Another option would be to disassemble the binary to look for certain
    # instructions or code sequences.

    def get_addr(address):
      return self.run_process(
          [PYTHON, path_from_root('emsymbolizer.py'), 'test_dwarf.wasm', address],
Collaborator

Can you add this to tools/create_entry_points and then run it so that you can avoid calling via python like this?

          stdout=PIPE).stdout

    # Check a location in foo(), not inlined.
    self.assertIn('test_dwarf.c:6:3', get_addr('0x101'))
Member

We could do a range here perhaps? 0x101 ± 10 seems safe and good enough.

Member Author

The problem is that this test is the other way around... i.e. we enter the address and check the line number. Come to think of it, it actually does have some slack in the positive direction, since the line record covers 5 bytes of instructions.

Member Author

The thing that made me worry a little less is that the code looks pretty minimal and not too subject to random changes in e.g. optimizers. Reordering of functions, removal of __wasm_call_ctors, changes in imports/exports etc could still break it.

Member

Interesting, what's this about 5 bytes? Is the line section limited to that resolution?

What I was suggesting is something like this, can't it work?

for i in range(-10, 10):
  if 'test_dwarf.c:6:3' in get_addr(hex(0x101 + i)):
    break
else:
  self.fail('not found')

Member Author

There are just 5 bytes of instructions that are considered to be part of line 6 (the call, and the drop of the return value).
I guess that suggestion could work. Although I feel like that would cover small perturbations in the code generation, but probably not changes like the ones I mentioned above (well, I guess it would survive removing or moving __wasm_call_ctors). Another option would be to disassemble the binary and find particular instructions, which would be precise but is potentially almost as error-prone as the code.
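
The "slack in the positive direction" follows from how a line-table lookup behaves: each row marks the start address of a run of instructions, and an address resolves to the last row at or before it. A rough sketch with a made-up table is below. The addresses and line numbers are hypothetical, chosen only so that line 6 spans five bytes starting at 0x101; they are not read from the real binary, and `line_for_address` is an illustrative helper, not part of any tool.

```python
import bisect

# Hypothetical line-table rows as (start_address, line), sorted by address.
ROWS = [(0xfc, 5), (0x101, 6), (0x106, 7), (0x10b, 9)]
STARTS = [start for start, _ in ROWS]


def line_for_address(address):
  """Return the line of the last row starting at or before address."""
  index = bisect.bisect_right(STARTS, address) - 1
  if index < 0:
    return None  # Address precedes the first row.
  return ROWS[index][1]


# Probe the same window as the suggested loop and keep the hits for line 6.
hits = [hex(a) for a in range(0x101 - 10, 0x101 + 10)
        if line_for_address(a) == 6]
print(hits)
```

Under these assumptions, a probe loop over 0x101 ± 10 keeps passing as long as the run for line 6 shifts by fewer than ten bytes. That covers small codegen perturbations, but not larger layout changes such as functions being reordered.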

Member

Disassembling sounds like overkill to me. Sgtm to land this and see if it's an issue in practice.

Member Author

OK, I expect I'll be thinking more about this as I write more tests for the various ways to get line/symbol info.

    # Check that both bar (inlined) and main (its caller) are in the output,
    # as described by the DWARF.
    # TODO: consider also checking the function names once the output format
    # stabilizes more
    self.assertRegex(get_addr('0x124').replace('\n', ''),
                     'test_dwarf.c:15:3.*test_dwarf.c:20:3')

  def test_separate_dwarf(self):
    self.run_process([EMCC, test_file('hello_world.c'), '-g'])
    self.assertExists('a.out.wasm')
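
The `.replace('\n', '')` in the inlined-frame check of `test_emsymbolizer` is there because `.` in a Python regex does not match a newline by default, and the two locations are expected on separate lines. A small sketch with a made-up output string follows. The layout of the sample output is an assumption for illustration; only the two `file:line:column` locations come from the test itself.

```python
import re

# Hypothetical symbolizer output for an address inside an inlined call:
# innermost frame first, one function name and one location per line.
sample_output = ('bar\n'
                 '/src/test_dwarf.c:15:3\n'
                 'main\n'
                 '/src/test_dwarf.c:20:3\n')
pattern = 'test_dwarf.c:15:3.*test_dwarf.c:20:3'

# '.*' cannot cross the newlines in the raw output.
matches_raw = re.search(pattern, sample_output) is not None
# Flattening the output, as the test does, lets '.*' span both frames.
matches_flattened = re.search(pattern, sample_output.replace('\n', '')) is not None
# re.DOTALL is an alternative that leaves the output untouched.
matches_dotall = re.search(pattern, sample_output, re.DOTALL) is not None
print(matches_raw, matches_flattened, matches_dotall)
```

Either approach also checks ordering: the inlined frame has to appear before the frame it was inlined into.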
13 changes: 10 additions & 3 deletions tools/webassembly.py
@@ -108,6 +108,10 @@ class DylinkType(IntEnum):
   IMPORT_INFO = 4


+class InvalidWasmError(BaseException):
+  pass
+
+
 Section = namedtuple('Section', ['type', 'size', 'offset', 'name'])
 Limits = namedtuple('Limits', ['flags', 'initial', 'maximum'])
 Import = namedtuple('Import', ['kind', 'module', 'field'])
@@ -123,15 +127,18 @@ class Module:
   """Extremely minimal wasm module reader. Currently only used
   for parsing the dylink section."""
   def __init__(self, filename):
+    self.buf = None  # Set this before FS calls below in case they throw.
+    self.filename = filename
     self.size = os.path.getsize(filename)
     self.buf = open(filename, 'rb')
     magic = self.buf.read(4)
     version = self.buf.read(4)
-    assert magic == MAGIC
-    assert version == VERSION
+    if magic != MAGIC or version != VERSION:
+      raise InvalidWasmError(f'{filename} is not a valid wasm file')

   def __del__(self):
-    self.buf.close()
+    if self.buf:
+      self.buf.close()

   def readAt(self, offset, count):
     self.buf.seek(offset)
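
The early `self.buf = None` assignment matters because `__del__` still runs on an object whose `__init__` raised. A stripped-down sketch of the pattern is below. `TinyModule`, `InvalidWasm` and `try_open` are hypothetical stand-ins, not the classes in `tools/webassembly.py`, and the exception here derives from `Exception`, which is a choice made for this sketch rather than something taken from the diff.

```python
import os
import tempfile

MAGIC = b'\0asm'
VERSION = b'\x01\0\0\0'


class InvalidWasm(Exception):
  pass


class TinyModule:
  def __init__(self, filename):
    # Assign first, so __del__ finds the attribute even if open() raises.
    self.buf = None
    self.filename = filename
    self.buf = open(filename, 'rb')
    if self.buf.read(4) != MAGIC or self.buf.read(4) != VERSION:
      raise InvalidWasm(f'{filename} is not a valid wasm file')

  def __del__(self):
    if self.buf:
      self.buf.close()


def try_open(filename):
  """Return a short description of what happened when opening filename."""
  try:
    TinyModule(filename)
  except InvalidWasm:
    return 'invalid'
  except OSError:
    return 'os error'
  return 'ok'


with tempfile.TemporaryDirectory() as tmp:
  good = os.path.join(tmp, 'good.wasm')
  bad = os.path.join(tmp, 'bad.wasm')
  with open(good, 'wb') as f:
    f.write(MAGIC + VERSION)
  with open(bad, 'wb') as f:
    f.write(b'not wasm')
  results = [try_open(good), try_open(bad),
             try_open(os.path.join(tmp, 'missing.wasm'))]
print(results)
```

Raising a dedicated exception instead of using `assert` also lets a command-line caller report a bad input file as an ordinary error rather than a traceback, which is how the `except` clause in `emsymbolizer.py` uses it.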