runtime: "invalid pc-encoded table" throw caused by bad cgo traceback #44971
Comments
Change https://golang.org/cl/301369 mentions this issue:
Using the test from https://golang.org/cl/301369, I can verify this affects tip, 1.16, 1.15, 1.14, and 1.13. I didn't test any earlier than that.
Perhaps we could accept bogus data in the traceback only if
That's an interesting idea, though Cherry made the logically opposite argument on https://golang.org/cl/301369, i.e., Frames.Next should probably not crash the whole runtime if a user passes bad values to CallersFrames. FuncForPC already does a non-strict lookup for the same reason, so I'm slightly inclined to go the same way here.
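To make the point about user-supplied values concrete, here is a minimal sketch of the public API path in question. It only shows that an arbitrary PC can reach runtime.CallersFrames from ordinary user code; the value 1 is an arbitrary assumption chosen to lie outside any function, so it does not reproduce the throw (that needs a PC inside the padding after a Go function). The helper name describe is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// describe symbolizes pcs with runtime.CallersFrames and returns the
// function names of all frames that could be resolved.
func describe(pcs []uintptr) []string {
	var names []string
	frames := runtime.CallersFrames(pcs)
	for {
		frame, more := frames.Next()
		if frame.Function != "" {
			names = append(names, frame.Function)
		}
		if !more {
			break
		}
	}
	return names
}

func main() {
	// A real stack, as collected by runtime.Callers.
	pcs := make([]uintptr, 16)
	n := runtime.Callers(1, pcs)
	fmt.Println(describe(pcs[:n]))

	// An arbitrary value that is not a PC inside any function.
	bogus := uintptr(1)
	fmt.Println(runtime.FuncForPC(bogus) == nil)
	fmt.Println(describe([]uintptr{bogus}))
}
```

Nothing in the API stops a caller from passing values like this, which is the argument for having the lookup tolerate them rather than throw.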
@gopherbot please open backport for 1.16 and 1.15. The only workaround is to change the C traceback engine, which isn't usually feasible.
Backport issue(s) opened: #45302 (for 1.15), #45303 (for 1.16). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Change https://golang.org/cl/305889 mentions this issue:
Change https://golang.org/cl/305890 mentions this issue:
…ames.Next When using cgo, some of the frames can be provided by cgoTraceback, a cgo-provided function to generate C tracebacks. Unlike Go tracebacks, cgoTraceback has no particular guarantees that it produces valid tracebacks. If one of the (invalid) frames happens to put the PC in the alignment region at the end of a function (filled with int 3's on amd64), then Frames.Next will find a valid funcInfo for the PC, but pcdatavalue will panic because PCDATA doesn't cover this PC. Tolerate this case by doing a non-strict PCDATA lookup. We'll still show a bogus frame, but at least avoid throwing. For #44971 Fixes #45302 Change-Id: I9eed728470d6f264179a7615bd19845c941db78c Reviewed-on: https://go-review.googlesource.com/c/go/+/301369 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Cherry Zhang <[email protected]> (cherry picked from commit e4a4161) Reviewed-on: https://go-review.googlesource.com/c/go/+/305890
…ames.Next When using cgo, some of the frames can be provided by cgoTraceback, a cgo-provided function to generate C tracebacks. Unlike Go tracebacks, cgoTraceback has no particular guarantees that it produces valid tracebacks. If one of the (invalid) frames happens to put the PC in the alignment region at the end of a function (filled with int 3's on amd64), then Frames.Next will find a valid funcInfo for the PC, but pcdatavalue will panic because PCDATA doesn't cover this PC. Tolerate this case by doing a non-strict PCDATA lookup. We'll still show a bogus frame, but at least avoid throwing. For #44971 Fixes #45303 Change-Id: I9eed728470d6f264179a7615bd19845c941db78c Reviewed-on: https://go-review.googlesource.com/c/go/+/301369 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Cherry Zhang <[email protected]> (cherry picked from commit e4a4161) Reviewed-on: https://go-review.googlesource.com/c/go/+/305889
This still happens with cgosymbolizer on Go 1.15.11:
Ah, that is a slightly different crash location. I think the same fix will make sense there, but I need to think about it a bit more.
Change https://golang.org/cl/309109 mentions this issue:
This is indeed the same situation, I've sent http://golang.org/cl/309109 to fix. I believe all of the remaining uses of @gopherbot please open backport for 1.16 and 1.15. The only workaround is to change the C traceback engine, which isn't usually feasible. This is a follow-up CL for a previously missed case.
Does this issue perhaps account for #27540? (Can that issue be closed as a duplicate?)
No, that is definitely a different issue, but I left a comment there about a potential fix.
Change https://golang.org/cl/309550 mentions this issue:
Change https://golang.org/cl/309551 mentions this issue:
…pandFinalInlineFrame This is a follow-up to golang.org/cl/301369, which made the same change in Frames.Next. The same logic applies here: a profile stack may have been truncated at an invalid PC provided by cgoTraceback. expandFinalInlineFrame will then try to lookup the inline tree and crash. The same fix applies as well: upon encountering a bad PC, simply leave it as-is and move on. For #44971 For #45480 Fixes #45482 Change-Id: I2823c67a1f3425466b05384cc6d30f5fc8ee6ddc Reviewed-on: https://go-review.googlesource.com/c/go/+/309109 Reviewed-by: Michael Knyszek <[email protected]> Trust: Michael Pratt <[email protected]> (cherry picked from commit aad13cb) Reviewed-on: https://go-review.googlesource.com/c/go/+/309551 Run-TryBot: Michael Pratt <[email protected]> Reviewed-by: Cherry Zhang <[email protected]> TryBot-Result: Go Bot <[email protected]>
…pandFinalInlineFrame This is a follow-up to golang.org/cl/301369, which made the same change in Frames.Next. The same logic applies here: a profile stack may have been truncated at an invalid PC provided by cgoTraceback. expandFinalInlineFrame will then try to lookup the inline tree and crash. The same fix applies as well: upon encountering a bad PC, simply leave it as-is and move on. For #44971 For #45480 Fixes #45481 Change-Id: I2823c67a1f3425466b05384cc6d30f5fc8ee6ddc Reviewed-on: https://go-review.googlesource.com/c/go/+/309109 Reviewed-by: Michael Knyszek <[email protected]> Trust: Michael Pratt <[email protected]> (cherry picked from commit aad13cb) Reviewed-on: https://go-review.googlesource.com/c/go/+/309550 Run-TryBot: Michael Pratt <[email protected]> Reviewed-by: Cherry Zhang <[email protected]> TryBot-Result: Go Bot <[email protected]>
When executing cgo code, the signal handler will call `cgoTraceback` via `x_cgo_callers` in order to collect a traceback from the executing cgo code. This calls the C traceback function provided by the application via `runtime.SetCgoTraceback`.

The frames returned from `cgoTraceback` are placed on the top of the recorded stack, followed by a Go runtime-provided trace of the preceding Go callers [1].

Later (in the case of CPU profiling), `Frames.Next` will call `cgoSymbolizer` to symbolize C frames and use `funcInfo` to symbolize Go frames.

... at least, that is how it is supposed to work. In practice, there are no guarantees on what `cgoTraceback` returns. Though it should only return non-Go PCs, there is nothing preventing it from returning a Go PC.

Generally, that would work OK (i.e., not crash, though the actual stack trace may not make sense): `Frames.Next` will simply follow the Go path and symbolize the PC as normal.

However, if this PC fell in the alignment region between functions (filled with `0xcc`, `int 3` on amd64), then:

1. `findfunc` will find a `funcInfo` for this PC (the preceding function), as `funcInfo`s cover the entire range from the start of one function to the start of the next, including the alignment region.
2. If the `funcInfo` has inline data, we'll do a PCDATA lookup for our PC. PCDATA only covers the actual function range, so the lookup fails and causes the "invalid pc-encoded table" throw.

Source for this repro in https://github.com/prattmic/scratch/tree/main/cgo_traceback_issue44971.
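The range mismatch described above can be illustrated with a toy model. This is only a sketch and not the runtime's data structures: the fn type, the addresses, and the helpers findFunc and pcValue are all hypothetical stand-ins for `funcInfo`, `findfunc`, and the PCDATA lookup. The function lookup attributes padding to the preceding function, while the per-PC table covers only the real instructions, so a strict lookup fails and a non-strict one returns a sentinel.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// fn is a toy stand-in for function metadata. entry is the first PC and
// end is one past the last real instruction. Padding up to the next
// function's entry holds no code.
type fn struct {
	name       string
	entry, end uintptr
}

// table is sorted by entry address (all addresses are made up).
var table = []fn{
	{"main.a", 0x1000, 0x1013}, // padding: 0x1013..0x101f
	{"main.b", 0x1020, 0x1048}, // padding: 0x1048..0x104f
	{"main.c", 0x1050, 0x1060},
}

// findFunc returns the last function whose entry is <= pc, so a PC in
// the padding is attributed to the preceding function.
func findFunc(pc uintptr) (fn, bool) {
	i := sort.Search(len(table), func(i int) bool { return table[i].entry > pc })
	if i == 0 {
		return fn{}, false
	}
	return table[i-1], true
}

var errNotCovered = errors.New("pc not covered by per-PC table")

// pcValue models a per-PC table lookup that covers only [entry, end).
// A strict lookup reports an error for other PCs (the runtime throws);
// a non-strict lookup returns -1 instead.
func pcValue(f fn, pc uintptr, strict bool) (int, error) {
	if pc >= f.entry && pc < f.end {
		return int(pc - f.entry), nil
	}
	if strict {
		return 0, errNotCovered
	}
	return -1, nil
}

func main() {
	pc := uintptr(0x1015) // inside the padding after main.a
	f, ok := findFunc(pc)
	fmt.Println(f.name, ok)

	_, err := pcValue(f, pc, true)
	fmt.Println("strict:", err)

	v, err := pcValue(f, pc, false)
	fmt.Println("non-strict:", v, err)
}
```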
The obvious question here is: why would `cgoTraceback` include such a bogus PC? The answer depends on the traceback engine in use by `cgoTraceback`.

For example, https://github.com/ianlancetaylor/cgosymbolizer uses libgcc's unwind functionality, which uses DWARF information to walk the stack. I've not found a way to trick that into providing bogus results (rather than stopping early) short of using flat-out incorrect CFI (`.cfi_*`) directives in assembly.
On the other hand, simpler traceback engines like Abseil's https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/stacktrace.h perform a more naive (but faster) walk that simply follows RBP frame pointers. They have some heuristics to try to avoid walking off the deep end, but fundamentally can't fully protect against code that has clobbered the frame pointer. This bug was first encountered with an Abseil-based traceback of assembly code that clobbered RBP to use as a simple argument register, resulting in garbage frames that would occasionally point into the alignment region between Go functions.
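To show why a frame-pointer walk cannot fully defend against a clobbered RBP, here is a simulated walk over a fake stack. This is a sketch under stated assumptions, not Abseil's implementation: the memory map, all addresses, and the single "frames must move up the stack" heuristic are invented for illustration.

```go
package main

import "fmt"

// memory simulates readable stack memory: address -> 8-byte word.
type memory map[uintptr]uintptr

// sampleStack builds three well-formed frames plus two unrelated data
// words at 0x7010 and 0x7018 (all values are made up).
func sampleStack() memory {
	return memory{
		0x7000: 0x7020, 0x7008: 0x401100, // frame 0: saved fp, return PC
		0x7010: 0x1, 0x7018: 0x401013, // unrelated data, not a frame
		0x7020: 0x7040, 0x7028: 0x401200, // frame 1
		0x7040: 0x0, 0x7048: 0x401300, // frame 2 (outermost)
	}
}

// walk follows saved frame pointers: the word at fp is the caller's
// saved fp and the word at fp+8 is the return address.
func walk(mem memory, fp uintptr, max int) []uintptr {
	var pcs []uintptr
	for len(pcs) < max && fp != 0 {
		ret, ok := mem[fp+8]
		if !ok {
			break // unreadable memory: stop
		}
		pcs = append(pcs, ret)
		next, ok := mem[fp]
		if !ok || next <= fp {
			break // heuristic: each frame must be higher on the stack
		}
		fp = next
	}
	return pcs
}

func main() {
	mem := sampleStack()

	// Intact frame pointer: three real return addresses.
	fmt.Printf("%#x\n", walk(mem, 0x7000, 8))

	// Frame pointer clobbered with an argument value that happens to
	// point at readable memory: the walker reports whatever is there.
	fmt.Printf("%#x\n", walk(mem, 0x7010, 8))
}
```

The heuristic stops the second walk after one step, but by then a garbage value has already been recorded as a frame, which is the kind of PC that can land in the padding between Go functions.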
I don't think we can reasonably require `cgoTraceback` to guarantee it always provides valid frames, thus I see a few options here:

1. Change `Frames.Next` to perform a non-strict PCDATA lookup. This is the simplest way to prevent crashes and I think the best approach, but it will make it a bit harder to notice bugs in the native runtime tracebacks.
2. Change `funcInfo` or PCDATA to make them consistent: either `funcInfo` does not cover alignment regions, or PCDATA does. I'm not sure how difficult these would be, but this would also potentially mask bugs in Go's tracebacks.
3. In `Frames`, track which callers came from `cgoTraceback`, and which came from Go's traceback. Only the latter would even attempt to do a `funcInfo` / PCDATA lookup.

This affects tip, 1.16, and I believe earlier to at least 1.14, though I haven't tested earlier than 1.16 yet.
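Option 3 above could look roughly like the following sketch. Everything here is hypothetical: the origin tag, the caller type, and the symbolize helper do not exist in the runtime, and the real `Frames` works on a plain `[]uintptr`, so carrying an origin per PC would need additional plumbing that this sketch does not address.

```go
package main

import "fmt"

// origin records which traceback produced a PC (hypothetical tag).
type origin int

const (
	fromGo origin = iota
	fromCgo
)

// caller pairs a PC with the traceback that reported it.
type caller struct {
	pc   uintptr
	from origin
}

// symbolize routes each PC by origin, so a PC reported by the C
// traceback never reaches the Go function/PCDATA lookup.
func symbolize(c caller, goSym, cSym func(uintptr) string) string {
	if c.from == fromCgo {
		return cSym(c.pc)
	}
	return goSym(c.pc)
}

func main() {
	goSym := func(pc uintptr) string { return fmt.Sprintf("go:%#x", pc) }
	cSym := func(pc uintptr) string { return fmt.Sprintf("c:%#x", pc) }

	stack := []caller{
		{pc: 0x401013, from: fromCgo}, // bogus PC from the C traceback
		{pc: 0x402000, from: fromGo},
	}
	for _, c := range stack {
		fmt.Println(symbolize(c, goSym, cSym))
	}
}
```

The trade-off relative to option 1 is that a bogus C frame is handed to the C symbolizer instead of being shown as a misleading Go frame, at the cost of tracking extra state per caller.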
[1] A similar principle applies to C-to-Go callbacks, except that `cgoTraceback` is called in the middle of the Go traceback generation.

cc @cherrymui @ianlancetaylor @mknyszek @hyangah