Skip to content

Add emsymbolizer #16095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 27, 2022
Merged

Add emsymbolizer #16095

merged 5 commits into from
Jan 27, 2022

Conversation

dschuff
Copy link
Member

@dschuff dschuff commented Jan 22, 2022

Emsymbolizer is a tool for symbolizing a binary, i.e. showing the file/line or symbol info for a code address.
As described in #16094 there are several ways to do this with emscripten.
The first PR is for item 1, using llvm-symbolizer with DWARF.

This is a WIP for #16094
The first PR will be for item 1, using llvm-symbolizer with DWARF.
@dschuff dschuff marked this pull request as ready for review January 26, 2022 22:43
@dschuff dschuff requested review from sbc100 and kripken January 26, 2022 22:43
@dschuff
Copy link
Member Author

dschuff commented Jan 26, 2022

OK, PTAL. On a scale of 🤔 to 🤮, what do you think of the test?

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code lgtm. For the test, yeah, that seems risky to be brittle... maybe we can add slack, see comment.

stdout=PIPE).stdout

# Check a location in foo(), not inlined.
self.assertIn('test_dwarf.c:6:3', get_addr('0x101'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do a range here perhaps? 0x101 += 10 seems safe and good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that this test is the other way around... i.e. we enter the address and check the line number. Come to think of it, it actually does have some slack in the positive direction, since the line record covers 5 bytes of instructions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that made me worry a little less is that the code looks pretty minimal and not too subject to random changes in e.g. optimizers. Reordering of functions, removal of __wasm_call_ctors, changes in imports/exports etc could still break it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, what's this about 5 bytes? Is the line section limited to that resolution?

What I was suggesting is something like this, can't it work?

for i in range(-10, 10):
  if 'test_dwarf.c:6:3' in get_addr(str(0x101 + i)):
    break
else:
  self.assert('not found')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are just 5 bytes of instructions that are considered to be part of line 6 (the call, and the drop of the return value).
I guess that suggestion could work. Although I feel like that would cover small perturbations in the code generation, but probably not changes like the ones I mentioned above (well, I guess it would survive removing or moving __wasm_call_ctors). Another option would be to disassemble the binary and find particular instructions, which would be precise but is potentially almost as error-prone as the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disassembling sounds like overkill to me. Sgtm to land this and see if it's an issue in practice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I expect I'll be thinking more about this as I write more tests for the various ways to get line/symbol info.

@dschuff dschuff enabled auto-merge (squash) January 27, 2022 00:36
@dschuff
Copy link
Member Author

dschuff commented Jan 27, 2022

Weird, all the tests passed but the status doesn't seem to have propagated to GH. I'm just going to merge manually.

@dschuff dschuff disabled auto-merge January 27, 2022 00:54
@dschuff dschuff merged commit db77c76 into main Jan 27, 2022
@dschuff dschuff deleted the emsymbolize branch January 27, 2022 00:54

def get_addr(address):
return self.run_process(
[PYTHON, path_from_root('emsymbolizer.py'), 'test_dwarf.wasm', address],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this to tools/create_entry_points and then run it so that you can avoid calling via python like this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants