
Wrapper script for symbolizing addresses #16094


Closed
dschuff opened this issue Jan 22, 2022 · 4 comments

@dschuff
Member

dschuff commented Jan 22, 2022

There are lots of use cases for getting the line number and/or symbol name from a code address, and there are several existing ways it can be done:

  1. Given a wasm (or object) file with a DWARF .debug_line section, llvm-symbolizer can get symbol and file/line information
  2. Given a wasm file and a source map, we can get file/line information from the source map (I don't think we currently have a source map parser that goes in this direction; we'd want to add one)
  3. Given a wasm file with a name section, we can get symbol information for code addresses (but not line information, and not for data addresses)
  4. Given a wasm file and an emscripten symbol map, we can get symbol information for code addresses (but not file/line information, and not for data addresses)
  5. (IIRC) Given an object file (or a wasm file with a symbol table) llvm-nm can get symbol information (but not file/line information)

Given that 3 is wasm-specific, and 2 and 4 are emscripten-specific, it might be warranted to have an emscripten-specific wrapper that can just figure out which, if any, of these sources of information are available, and print the information.
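As a rough illustration of the "figure out which sources are available" step, here is a minimal sketch that scans a wasm binary's custom sections. It is not the actual tool: the function names (`custom_section_names`, `available_sources`) are made up for this example, and it assumes that an embedded source map reference shows up as a `sourceMappingURL` custom section (a source map could also be passed separately, which this does not cover). Symbol maps and `llvm-nm` are not handled.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def custom_section_names(data):
    """Return the names of all custom sections (section id 0) in a wasm binary."""
    if data[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    names = set()
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        section_id = data[pos]
        size, body = read_uleb128(data, pos + 1)
        if section_id == 0:
            # A custom section's payload starts with its length-prefixed name.
            name_len, name_start = read_uleb128(data, body)
            names.add(data[name_start:name_start + name_len].decode("utf-8"))
        pos = body + size
    return names


def available_sources(data):
    """Hypothetical helper: list which symbolization sources the file carries."""
    names = custom_section_names(data)
    sources = []
    if ".debug_line" in names:
        sources.append("dwarf")      # item 1: llvm-symbolizer with DWARF
    if "sourceMappingURL" in names:  # assumption: source map referenced in-file
        sources.append("sourcemap")  # item 2
    if "name" in names:
        sources.append("names")      # item 3: name section
    return sources
```

A wrapper could then try the sources in order of how much information they give (DWARF first, then source map, then names) and fall back gracefully when none is present.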

Some of these (at least the LLVM-based ones) currently require section offsets rather than the file offsets printed by the engines in stack traces and the like. That's an orthogonal problem to this one (e.g. we might want to fix the llvm tools and/or internal interfaces to use file offsets instead). But either way this script can do any necessary conversions, and we can adjust it if we make changes to LLVM, and it will have utility even aside from that.
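To make the offset conversion concrete, here is a minimal sketch of turning a file offset (as printed in an engine stack trace) into a code section offset. It assumes the section offset is measured from the start of the Code section's payload, i.e. just after the section id and size bytes; that is my reading of what the LLVM tools expect, so treat it as an assumption. The helper names are hypothetical.

```python
CODE_SECTION_ID = 10  # id of the Code section in the wasm binary format


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def code_section_range(data):
    """Return (start, end) file offsets of the Code section payload."""
    if data[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        section_id = data[pos]
        size, body = read_uleb128(data, pos + 1)
        if section_id == CODE_SECTION_ID:
            return body, body + size
        pos = body + size
    raise ValueError("no code section found")


def file_offset_to_section_offset(data, file_offset):
    """Convert a file offset into an offset relative to the Code section payload."""
    start, end = code_section_range(data)
    if not start <= file_offset < end:
        raise ValueError("offset is not inside the code section")
    return file_offset - start
```

The resulting value is what one would hand to a section-offset-based tool; if the LLVM tools later switch to file offsets, the subtraction simply goes away.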

dschuff added a commit that referenced this issue Jan 22, 2022
This is a WIP for #16094
The first PR will be for item 1, using llvm-symbolizer with DWARF.
dschuff added a commit that referenced this issue Jan 26, 2022
This is a WIP for #16094
The first PR will be for item 1, using llvm-symbolizer with DWARF.
dschuff added a commit that referenced this issue Jan 27, 2022
Emsymbolizer is a tool for symbolizing a binary, i.e. showing the file/line or symbol info for a code address.
As described in #16094 there are several ways to do this with emscripten.
The first PR is for item 1, using llvm-symbolizer with DWARF.
@sbc100
Collaborator

sbc100 commented Aug 15, 2022

Can this be closed now that the tool exists?

@dschuff
Member Author

dschuff commented Aug 15, 2022

We currently have 1 and 2 implemented. 3 seems less important (because the browser understands name sections already) and I don't even know what an end-developer use case for 5 would be (mostly I just included it for completeness). But that does leave 3, which maybe we should have before we consider the tool initial-feature-complete?

@dschuff
Member Author

dschuff commented Feb 27, 2024

#21367 and llvm/llvm-project#82083 implement item 3.
Item 5 is also partially working: for object files, llvm-nm and llvm-objdump print symbol addresses for both code (as code section offsets) and data (as data section offsets). This is different from the behavior for linked files, where both are printed as file offsets. More problematically, these address spaces overlap because of wasm's Harvard architecture, so llvm-symbolizer (and therefore emsymbolizer) doesn't handle them correctly.
I'm not sure how much of a problem that really is; emsymbolizer is mostly used for binary size attribution for linked files rather than to analyze object files, so I don't have a near-term plan to fix it.

The only other possible thing to add is item 4, emscripten symbol map support. I'm not sure how necessary this is given that source maps are strictly more powerful than symbol maps, and it's probably best for most (all?) users to use them instead; I don't know of any use cases where symbol maps would be better. So I also don't currently have a plan to implement that.

@dschuff
Member Author

dschuff commented Feb 27, 2024

I think I'm going to go ahead and close this. If someone wants to request more features for emsymbolizer, they can reopen this, or open a new bug if it's something not listed here.

@dschuff dschuff closed this as completed Feb 27, 2024