Skip to content

Enable symbolication of native stack frames in ANR events #4061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 17, 2025

Conversation

lealanko-rt
Copy link
Contributor

@lealanko-rt lealanko-rt commented Jan 17, 2025

📜 Description

Add instruction address information and debug image information to AnrV2 events. This enables symbolication of native stack frames in stripped libraries.

Before

before

After

after

💡 Motivation and Context

Earlier, it was nearly impossible to debug ANRs caused by native code in stripped libraries, since no address information was passed in the events. The introduction of io.sentry.anr.attach-thread-dumps (#3791) at least made the raw data available for manual investigation, but that was toilsome.

Symbolication of a stack frame requires an instruction address and a way of identifying a debug image. Both of these are available in the Android stack dumps that AnrV2 reads, the latter in the form of a BuildId. By sending this data and annotating the stack frames correctly, the Sentry server is able to symbolicate them.

💚 How did you test it?

Unit tests were modified to test all added features. Moreover, a button for triggering an ANR via native code was added to the sample Android application. The results can be seen in the screenshots above.

📝 Checklist

  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

Functionally this is pretty complete, I think. The rendering for managed frames in a hybrid stack (as seen in the second screenshot) is a bit ugly, and omits the module (i.e. package+class), so that could perhaps be improved a little.

A natural continuation of this work would be tackling #3295, which is fundamentally about the same thing, but needs to deserialize a protobuf tombstone instead of a textual dump, so there is not much code to be reused.

@lealanko-rt lealanko-rt force-pushed the feat/anr-native-symbolication branch 7 times, most recently from eebfcdf to b1f58bf Compare January 20, 2025 08:33
@markushi
Copy link
Member

@lealanko-rt thanks for opening this up, that's amazing! We'll have a closer look at this asap.

@lealanko-rt
Copy link
Contributor Author

@lealanko-rt thanks for opening this up, that's amazing! We'll have a closer look at this asap.

Hi, thanks for working on this.

What is the workflow for PRs here? My own preference is to absorb all fixes into appropriate commits to produce a clean history, but I'm not sure if that's feasible with multiple people contributing to the branch...

@markushi markushi requested a review from lcian as a code owner March 13, 2025 09:42
@markushi
Copy link
Member

@lealanko-rt thanks for opening this up, that's amazing! We'll have a closer look at this asap.

Hi, thanks for working on this.

What is the workflow for PRs here? My own preference is to absorb all fixes into appropriate commits to produce a clean history, but I'm not sure if that's feasible with multiple people contributing to the branch...

Sorry for the late reply. For sentry-java we always squash PRs into a single commit before merging, as it allows us to revert changes more easily (in case of emergency). I know it has a few downsides and isn't that clean, but it's a trade-off 😅

Lauri Alanko and others added 9 commits March 13, 2025 14:55
Add a button to trigger ANR by holding a lock too long in native
code. This can be used to test native stack frames in ANR events.
Handle offsets and deleted files, recognize "???" as a marker for
unknown functions. Use named capturing groups for better readability
and editability.
Use the "native" attribute of stack frames to indicate JNI
invocation frames, like SentryStackTraceFactory does.
The images are parsed from the build ids and filenames in the thread
dump's stack frames.
The instruction addresses of native stack frames in thread dumps are
relative to the image file from which the code is loaded, and there
are no absolute mapping addresses of images available. So explicitly
inform the Sentry server about the correct images by using a relative
"addr_mode" attribute.

Also add the attribute to the SentryStackFrame class since it was not
yet supported by it. The field documentation is converted from
event.schema.json in the sentry server repo.
@lealanko-rt lealanko-rt force-pushed the feat/anr-native-symbolication branch from e1ada90 to cb6f40e Compare March 13, 2025 18:10
final String debugId = buildId == null ? null : buildIdToDebugId(buildId);
if (debugId != null) {
if (!debugImages.containsKey(debugId)) {
final DebugImage debugImage = new DebugImage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markushi I'm reading our dev docs which say that image_addr and image_size are required, but apparently we're not setting them here and symbolicator is still able to symbolicate the frames. Shall we amend the docs? Or is there a possibility of wrong symbolication given we don't have all the sufficient information here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think that's fine because
1.) the native dumps doesn't seem to even provide this information (which makes sense, as the state is not in memory anymore)
2.) we set the address mode to relative for every frame here as well:
frame.setAddrMode("rel:" + debugId); (link)

Maybe @loewenheim could chime in here as an expert? If all frame addresses are relative, is it fine to provide no image_addr/ image_size in debug meta?

Copy link

@loewenheim loewenheim Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image size is always optional (we just conservatively assume it extends to the beginning of the next image if it's not set), and the base address is only required for the absolute address mode. As you say, it's fine to leave it out with the relative address mode.

I'm a bit confused about this, though:

we set the address mode to relative for every frame here as well:
frame.setAddrMode("rel:" + debugId); (link)

The argument of rel in Symbolicator is not a debug ID, it's the index of the module in the debug_meta/images list in the event.
https://github.com/getsentry/symbolicator/blob/2e229b8470fabc5051677ef07d1e77217f921bd2/crates/symbolicator-native/src/interface.rs#L548-L556

Is this fixed up at some later point where the debug ID is replaced by an index? Otherwise this won't work.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright @loewenheim thanks for clearing this out! Do you think we should update the dev docs to specify those are not required in the case of relative mode?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would say so.

Copy link
Member

@romtsn romtsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lealanko-rt LGTM, thanks for this PR. While it looks like not much has been changed, it has massive impact and we appreciate your contribution! This is dope.

@romtsn romtsn merged commit 6be3488 into getsentry:main Mar 17, 2025
34 checks passed
@romtsn
Copy link
Member

romtsn commented Mar 17, 2025

@lealanko-rt I've merged the PR, it'll be available with the next version (likely 8.5.0). Thanks for your contribution again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants