runtime: macOS-only segfault on 1.14+ with "split stack overflow" #39079
It'll be hard to debug this if it's essentially linking in a big blob of non-Go code. But maybe looking at the stack traces can give some hint.
Looking at those failures, the first thing that jumps out at me is that the
Oh, some other aspects I should mention is that

So far the best I can surmise is that this is crashing when GC is happening. The call to

I'm currently trying to capture a crash in lldb, but it's not being too successful. I've tried capturing a few core dumps but they're not proving too useful. The core dumps show all threads blocked in

You mention there's print-debugging that can be added; do you mean in the runtime? If so, are there logs/etc. I could try to add to help debug this?
That makes sense to me. I imagine that, given that the file you have above doesn't do much in the Go world, the goroutine calling into cgo has no reason to be preempted, which is where the failures are popping up (
I think I take that back. I was going to suggest putting

It still may be worth putting in

Does the JIT'd code make use of TLS at all (or the Rust code, even)? The value being clobbered is accessible via an offset from a pointer in a TLS slot. Given that the offset is the same for Go 1.13, I'd be surprised if this is the case, but it is an idea.

Given that this involves the scheduler, one other thing to try is
RE: that last idea about TLS, @cherrymui just informed me that we use a fixed offset for the TLS on Darwin, so that's unlikely to have changed either.
Also the

This is assuming that my hypothesis is true that the value in
To make sure I understand, do you mean adding
The JIT'd code doesn't use TLS, but the Rust code does. We use the equivalent of
Interesting! Setting that environment variable, though, I still see a crash.
Ah, I meant the runtime. I'm currently running it in a loop, so I don't mind doing the instrumentation. I'll see if I find anything by printing info.
Yeah OK, then it's probably not that. You mentioned earlier that the JIT'd code needs to run to reproduce the issue, right?
Got it. Well, it doesn't rule out that the changes around it are involved (directly or indirectly), but it definitely means that asynchronous preemption itself isn't the issue here.
A small amount of JIT code is executed inside of

Also, to confirm, are you able to reproduce the crash on your end? I still feel bad about not being able to minimize a giant wad of foreign code, but if you can't even reproduce, that's even worse!
I can indeed reproduce! Sometimes it fails with a plain-old segfault, sometimes with an illegal instruction failure, and sometimes with one of the "stack split" failures you mentioned in the original post.
Just FYI, this seems to be happening on 1.14.4 as well. At least in the wasmtime-go repository; I haven't tested this specific repro.
This is almost certainly some kind of memory corruption, perhaps due to a race condition. I haven't seen anybody else reporting similar errors. Given the information we have so far, I would not guess that this is a bug in Go. |
This was spelled wrong. Try |
Oh thanks for the tip @aclements! Using that env var (

```go
package main

// #include <assert.h>
// #include <signal.h>
// #include <sys/mman.h>
//
// #define ALT_STACK_SIZE (16 * 4096)
//
// void my_run() {
//   void *stack = mmap(NULL, ALT_STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
//   assert(stack != MAP_FAILED);
//
//   stack_t new_stack;
//   new_stack.ss_sp = stack;
//   new_stack.ss_flags = 0;
//   new_stack.ss_size = ALT_STACK_SIZE;
//   int r = sigaltstack(&new_stack, NULL);
//   assert(r == 0);
// }
import "C"

import "runtime"

func main() {
	C.my_run()
	runtime.GC()
}
```

This program will segfault like the original one when run as:

```shell
$ go build -o binary
$ while true; do ./binary || break; done
```

This one can take quite some time to reproduce, so I typically run that in a few windows to get it to fail quicker. I've also seen that mixing in code like that seen in preempt.go can cause it to reproduce more reliably. It does seem related to async scheduling in any case. Some example crash logs I've seen are here.

So at least on macOS this seems related to switching the sigaltstack. We do this in wasmtime currently because we execute our SIGSEGV handler on the sigaltstack, but we require the stack to be big enough to run the handler. Currently that stack size is set to 64kb, and it looks like the default Go one is 8kb, so the check to see if a previous sigaltstack is big enough fails and we allocate a new one. Is it expected that allocating a bigger sigaltstack causes issues here? Is this something that foreign code isn't allowed to do?
@alexcrichton Oops, very sorry about the wrong environment variable there. Glad that it narrowed down the issue, though (and thanks for narrowing down the reproducer even further!).
/cc @ianlancetaylor @bcmills, our signal experts. Thanks for narrowing it down to such a small repro! Go allocates a 32 KiB signal stack, which of course suggests that a 64 KiB signal stack should be fine. I could see there potentially being a problem if the
@mknyszek oh, no worries! I'm just glad I was able to reduce it to a much more bite-sized chunk. @aclements, after running a few million times, I wasn't able to reproduce without the call to
What version of macOS was this observed on?
Oh oops, sorry, I should have mentioned that in the original report as well. I'm on version 10.15.5 (19F101).
That said, this example in #39079 (comment) does not look safe in general to me. If there is already an alternate signal stack in place, then something somewhere (in this case, presumably the Go runtime) allocated that signal stack, and may expect to use it. So I would expect that a program that does this sort of thing would need something like:

```go
runtime.LockOSThread()
s := C.replaceAltStack()
defer func() {
	C.restoreAltStack(s)
	runtime.UnlockOSThread()
}()
…
```

So, another question: does the bug still reproduce if you execute
As predicted, adding

I also understand how modifying the sigaltstack in general is not exactly the safest thing to do, but it seems like "hygienic" usage of sigaltstack would be to grow it if necessary and otherwise not modify it. That's what wasmtime is trying to do (grow it to fit its needs), and it's expected that if anything else configured it, then it would take care of deallocating it on its own or would otherwise

Basically, at this point I don't understand why this would be invalid. The specific case wasmtime needs is to grow the sigaltstack to meet its needs. Is this incompatible with the Go runtime? If so, why? Comments like this seem to indicate that it's intended to be handled?
The runtime is supposed to do the right thing if a C thread changes the alternate signal stack. As @alexcrichton says,

Clearly it isn't working, though. The stack trace suggests that at least sometimes the stack guard is not being updated to correctly match the alternate signal stack installed by the C code. I didn't audit all the code, but it looks like the failure is in the first function called that checks the stack guard. Does it look like the code in
Change https://golang.org/cl/238020 mentions this issue:
@alexcrichton does CL https://golang.org/cl/238020 help? Thanks.
@cherrymui thanks for the cc! Locally that appears to fix the issue for me, yes. Using that toolchain against the entire https://github.com/bytecodealliance/wasmtime-go repository's tests (which originally segfaulted, and currently segfault on 1.14.4 periodically), the issue also appears fixed. I no longer get segfaults there either.
@alexcrichton thanks for confirming!
FYI, this CL failed on mips64le with

https://build.golang.org/log/cd5213f231841de332cad8fced10d6cbf6ca5ac9
@mengzhuo It doesn't seem related to that CL. It could be a flake. It seems a separate issue to me.
Any chance this gets backported to 1.14? It's a crash. @cherrymui
@gopherbot please consider this for backport in 1.14. The issue can cause runtime crashes on macOS, and there doesn't seem to be any reasonable workaround.
Backport issue(s) opened: #41991 (for 1.14). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Change https://golang.org/cl/262557 mentions this issue:
…Stack

When a signal is received, the runtime probes whether an alternate signal stack is set; if so, it adjusts gsignal's stack to point to the alternate signal stack. This is done in adjustSignalStack, which calls the sigaltstack "syscall", which is a libc call on darwin through asmcgocall. asmcgocall decides whether to do a stack switch based on whether we're running on the g0 stack, the gsignal stack, or a regular g stack. If g is not set to gsignal, asmcgocall may make the wrong decision. Set g first.

adjustSignalStack is recursively nosplit, so it is okay that gsignal.stack temporarily doesn't match the stack we're running on.

Updates #39079.
Fixes #41991.

Change-Id: I59b2c5dc08c3c951f1098fff038bf2e06d7ca055
Reviewed-on: https://go-review.googlesource.com/c/go/+/238020
Run-TryBot: Cherry Zhang <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
(cherry picked from commit d286e61)
Reviewed-on: https://go-review.googlesource.com/c/go/+/262557
Trust: Cherry Zhang <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
The fix for golang/go#39079 has been backported to 1.14 as of version 1.14.11 so the macOS warning can be clarified and the build matrix can be updated.
Not sure if anyone else is experiencing this, but we are still seeing a Go 1.14+-only, macOS-only deadlock and high CPU usage caused by
@albertvaka Since this was about a segfault, which is fixed, please open a new issue about the deadlock you're seeing, so it can be tracked separately. In the description please also include a reference to this issue. Thanks.
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

What did you do?

I created a local main.go like so:

Next I downloaded the latest wasmtime release and extracted it locally:

```shell
$ curl -L https://github.com/bytecodealliance/wasmtime/releases/download/dev/wasmtime-dev-x86_64-macos-c-api.tar.xz | tar xJf - --strip-components=1
```

Next I compiled the local module:

Finally I ran the binary in an infinite loop:

What did you expect to see?

No segfault. Or more specifically, for this to basically run infinitely producing no output.

What did you see instead?

Instead I see sporadic crashes. Some I've seen are:
fatal error: runtime: split stack overflow
fatal error: unexpected signal during runtime execution
fatal error: runtime: stack split at bad time
This was originally reported upstream in bytecodealliance/wasmtime-go#10, and we've been trying to narrow it down. With some investigation we found out that Go 1.13 runs this code successfully. We've also got the same code running successfully on other platforms.
I realize, though, that this isn't the best bug report, unfortunately. The native library, wasmtime, is a pretty large project and is a giant wad of compiled Rust code. I've tried replacing it with a trivial C implementation to remove the dependency, but then the crash goes away. It seems that the bug here is related to something that the native binary is doing. I'm pretty certain that the fault does not lie in the native binary (e.g. no segfault or out-of-bounds writes or anything like that), but as with all native code I can't entirely rule it out. I'm opening this because at this point we've at least narrowed it down to a regression between Go versions, and I'm hoping that folks more knowledgeable with the changes could help out.
Is there a way we could help to reduce this further to a bite-sized test case? Or would it be helpful to perhaps bisect the Go release to try to find a revision which caused the segfault to appear here? I'm happy to help out in reducing this further!