-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: "fatal: morestack on g0" on amd64 after upgrade to Go 1.21 #62440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think it was mentioned here #62130 (comment) |
Thanks for the reference @seankhliao. This sounds like the same issue mentioned in that comment, but overall unrelated to #62130. There is no PHP involved here as far as I can tell. Fluent Bit doesn't seem like the kind of project that would be doing fancy stack manipulation. @VirrageS or @PettitWesley, do you know if Fluent Bit has some sort of non-standard behavior we should know about? |
It's in there but I don't know if it's relevant for this particular instance. |
It looks like there is stack manipulation going on. The stack pointer changes quite a bit between calls:
And we can see that the top frame in the stack is something called So this is effectively the same issue as #62130, though from a completely different library. |
I'm not really sure :(
Actually, there is something like https://github.com/fluent/fluent-bit/blob/f7731f3f81b1b32eef36aaa6786d82593d35a841/lib/flb_libco/sjlj.c#L99. I have bisected the problem to ef0dedc. There was already issue with this commit #59678. Maybe there is still something lingering there? Or maybe this commit just uncovered some bug in how fluent-bit works with the stack. |
Thanks for the quick bisect, I was about to suggest ef0dedc (https://go.dev/cl/495855) as the reason this case is suddenly causing problems, even though this use case was already unsupported. Suppose that these calls always occur on a C-created thread (seems very likely given the Go code is a shared object). Prior to CL 495855, every cgocallback would go to needm and fetch a new M, which it initializes with the current stack. Upon return, the M is dropped and the stack can be freely changed without causing issues for the next call. After CL 495855, the M is initialized on the first callback with the stack at that time. Upon the next call, the M has been cached and the stack bounds are not updated, so if the stack has been changed we will trip over stack bounds checks. cc @cherrymui @ianlancetaylor @mknyszek @golang/runtime Regarding #62130 (comment), it isn't immediately obvious to me why this situation is unworkable. Provided that this is a callback from C-to-Go (with no prior Go components up the stack), I don't see why we should care that the stack bounds of the thread have changed. It seems OK to me that such a callback could re-initialize the stack bounds if they have changed since the last call. |
As an aside, CL 495855 gets the stack bounds from pthread, but this coroutine library doesn't appear to do anything to update pthread's view, so those bounds won't be accurate at all. In the case of this application, the first call appears to not be on a coroutine, so it works OK. It is only later calls that move to a coroutine. |
Change https://go.dev/cl/525455 mentions this issue: |
@prattmic Fluent Bit uses simple coroutines that pause and resume, that's all: https://github.com/fluent/fluent-bit/blob/master/DEVELOPER_GUIDE.md#concurrency https://github.com/fluent/fluent-bit/blob/master/include/fluent-bit/flb_coro.h I'm not an expert on it unfortunately. @matthewfala can help answer questions. If you can give me specific questions, I can go find the answers from the other maintainers. |
@prattmic @PettitWesley I'm not too sure how the GO plugins work in Fluent Bit, however for the c plugins, there is "fancy stack manipulation" to get the coroutines to work. On one thread, Fluent Bit swaps out multiple tasks/coroutines to allow for non-blocking async networking with a javascript like event loop. You can see the assembly for the swap function on x86 used below: https://github.com/fluent/fluent-bit/blob/master/lib/flb_libco/amd64.c#L88-L105 There are similar files for other platforms. |
For #62130 (comment), I think it is more about what we (promise to) support. For cgo in general, we probably never really thought about what we support or not if C code switches stacks. If the C code switches stacks while there is some Go frames on the stack, it may be tricky/difficult. If there is no Go frame at all when the C code switches stacks, it is probably fine. We could set a different g0 stack bounds each time (probably based on the current SP, which would be imprecise, but probably fine). How big are C coroutine stacks? If they are small (like goroutine stacks), there is a risk of overflow, as the Go runtime generally assumes C stacks are large. |
Stack size per coroutine is configurable in Fluent Bit: https://github.com/fluent/fluent-bit/blob/master/src/flb_config.c#L152C6-L152C34 Fluent Bit Documentation on setting coro stack size Stack size depends on platform/os. By default size is: Calculated by
Please see: |
@VirrageS could you verify that your application works properly with https://go.dev/cl/525455? (If you aren't familiar with building the toolchain from a CL, you can use |
@prattmic just wanted to say that the gotip version fixed the same issue in our project (tested on Linux, 1.21 on Windows didn't have this problem). Thanks. |
@prattmic sorry for late response. I have just tested the CL and works perfectly! In case someone else wants to confirm, I have added branch to my repro repo that downloads the CL, builds plugin using |
Change https://go.dev/cl/527317 mentions this issue: |
Change https://go.dev/cl/527316 mentions this issue: |
This reverts CL 527056. CL 525455 breaks darwin, alpine, and android. This CL must be reverted in order to revert that CL. For #62440. Change-Id: I4e1b16e384b475a605e0214ca36c918d50faa22c Reviewed-on: https://go-review.googlesource.com/c/go/+/527316 Reviewed-by: Cherry Mui <[email protected]> Auto-Submit: Michael Pratt <[email protected]> Reviewed-by: Bryan Mills <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
… C thread" This reverts CL 525455. The test fails to build on darwin, alpine, and android. For #62440. Change-Id: I39c6b1e16499bd61e0f166de6c6efe7a07961e62 Reviewed-on: https://go-review.googlesource.com/c/go/+/527317 Auto-Submit: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Bryan Mills <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
Yes it runs linux-arm64 VM on Mac, but in production we have this problem on real linux/amd64 hosts. |
@prattmic thank you very much for the investigation. For this fast repro I set By default it's ~200KiB and my first successful repro attempt (on the original repro https://github.com/VirrageS/fluent-bit-go-bug)was with this number. I'm not sure how BTW, as I understand we use the default value ~200KiB
This is a good question, let's ask @shpprfb as he knows more than I about fluent-bit. We can also build our Shall we do this? |
I've run it on our normal workload now built with 1.22.2 and there really seems to be no change. @podtserkovskiy Heres an example backtrace: Stack
Now that there is a somewhat clear repro again, does the go team have everything it needs for reproducing, causing a crash, attaching a debugger, etc? |
I also tried adjusting I am curious to see what the C frames prior to the Go call look like on the crashing case where there is barely any stack space remaining. Unfortunately, I've been unable to reproduce inside of debugger thus far. If anyone else would like attempt this, that would be great. You can set a breakpoint on @podtserkovskiy I will clean up and mail the patch I am using shortly. It should be usable to at least verify you are seeing the same symptoms as me in production. |
I believe I found the bug, which is in Go. I was thrown off by the pthread stack bound lookup we do, it turns out that isn't relevant. The coroutine stack allocation is at https://github.com/fluent/fluent-bit/blob/v2.1.8/lib/flb_libco/amd64.c#L142-L147. It is just So when we lookup the pthread stack bounds, we will fail the bounds check at https://cs.opensource.google/go/go/+/master:src/runtime/cgocall.go;l=273-279;drc=a878d3dfa0f9d7cd1de26e3df9eb3983a9f64b53;bpv=0;bpt=1 and avoid using the pthread bounds. Instead we will use the dummy bounds computed above (assume we're near the top of a ~32KB stack). But if we assume a 32KB stack, why does it look like we're called with almost no space? The problem is the fast path at the start of stack update: https://cs.opensource.google/go/go/+/master:src/runtime/cgocall.go;l=224-227;drc=a878d3dfa0f9d7cd1de26e3df9eb3983a9f64b53;bpv=0;bpt=1 Consider this sequence:
We're now using a completely different stack allocation, but with the old stack bounds. In the old stack bounds we only appear to have I selected arbitrary numbers here; the overlap could of course be even smaller. It also does not depend on the fluent-bit stack being <32KB. The same could occur with larger stack sizes provided that the old stack is freed prior to subsequent, allowing the new stack to get allocated in a partially overlapping region. I believe the right approach here is to avoid this fast path and always attempt to update the stack bounds. I will send a CL. |
Change https://go.dev/cl/584597 mentions this issue: |
@prattmic Nice, I was following a similar path from your stacktrace: I see theres a stack switch from g to g0 in goroutine 18:
Which runs on the g stack, invokes the stack switch thunk, which results in
being invoked on the g0 stack. If the g0 stack range is [0x7f0518777690 0x7f051877fa90] there seems to be a strange discrepancy where the g0 stack SP we switch to that is used by the runtime.systemstack routine is 0x7f0518778050. This is only leaving When looking at the stack switching thunk https://github.com/golang/go/blob/master/src/runtime/asm_amd64.s#L513 why is the saved SP ( Is it expected behavior to switch right to the end of the g0 stack allocation for calls that run on g0? This leads me to think this is a pure Go issue since this isn't running on the FluentBit stack at exception time at all. FluentBit should have no bearing on the g0 stack size or what SP is used when switching to g0... right? Does the g0 stack need to grow, or is the SP we enter with on the g0 stack just wrong? |
Probably this. This is because a stack switching function like
I'm not sure I understand the question. Generally, it is expected that we switch to the g0 stack wherever it is currently at. Usually (especially for a pure Go binary), there aren't many frames on the g0 stack, so we switch to somewhere very close to the top. But if Go is called e.g. from a very deep C stack, we cannot overwrite the C frames, so we switch to right below the C frames.
g0 stack doesn't grow. We just check its bounds. So For this particular bug, it is not the the SP is wrong, but we get the g0 stack bounds wrong, so it thinks it goes out of bounds. |
Interesting, that makes sense. I didn't realize the g0 stack location was the original stack that called into Go in the first place which is the FluentBit coroutine stack. Thank you for clarifying. |
@gopherbot Please backport to 1.21 and 1.22. This bug causes random crashes on cgocallback. |
Change https://go.dev/cl/585337 mentions this issue: |
Change https://go.dev/cl/585935 mentions this issue: |
Have fluentbit running for 3+ days now with this patch and see no issues. Looks like the fix was good. Thank you! |
Thank you very much for the fix @prattmic. It seems one of checks timed out on https://go.dev/cl/585935, it probably needs some manual intervention 😀 |
…lback callbackUpdateSystemStack contains a fast path to exit early without update if SP is already within the g0.stack bounds. This is not safe, as a subsequent call may have new stack bounds that only partially overlap the old stack bounds. In this case it is possible to see an SP that is in the old stack bounds, but very close to the bottom of the bounds due to the partial overlap. In that case we're very likely to "run out" of space on the system stack. We only need to do this on extra Ms, as normal Ms have precise bounds defined when we allocated the stack. TSAN annotations are added to x_cgo_getstackbounds because bounds is a pointer into the Go stack. The stack can be reused when an old thread exits and a new thread starts, but TSAN can't see the synchronization there. This isn't a new case, but we are now calling more often. For #62440. Fixes #67297. Cq-Include-Trybots: luci.golang.try:go1.21-linux-amd64-longtest Change-Id: I5389050494987b7668d0b317fb92f85e61d798ac Reviewed-on: https://go-review.googlesource.com/c/go/+/584597 Reviewed-by: Cherry Mui <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> (cherry picked from commit 1ffc296) Reviewed-on: https://go-review.googlesource.com/c/go/+/585337
…lback callbackUpdateSystemStack contains a fast path to exit early without update if SP is already within the g0.stack bounds. This is not safe, as a subsequent call may have new stack bounds that only partially overlap the old stack bounds. In this case it is possible to see an SP that is in the old stack bounds, but very close to the bottom of the bounds due to the partial overlap. In that case we're very likely to "run out" of space on the system stack. We only need to do this on extra Ms, as normal Ms have precise bounds defined when we allocated the stack. TSAN annotations are added to x_cgo_getstackbounds because bounds is a pointer into the Go stack. The stack can be reused when an old thread exits and a new thread starts, but TSAN can't see the synchronization there. This isn't a new case, but we are now calling more often. For #62440. Fixes #67298. Cq-Include-Trybots: luci.golang.try:go1.22-linux-amd64-longtest Change-Id: I5389050494987b7668d0b317fb92f85e61d798ac Reviewed-on: https://go-review.googlesource.com/c/go/+/584597 Reviewed-by: Cherry Mui <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> (cherry picked from commit 1ffc296) Reviewed-on: https://go-review.googlesource.com/c/go/+/585935 Run-TryBot: Joedian Reid <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
- Bumps up go version in go.mod - Bumps up builder container image to golang:1.22.5 - Bumps up ci pipeline to fetch golang 1.22.5 Validates fix for golang/go#62440
- Bumps up go version in go.mod - Bumps up builder container image to golang:1.22.5 - Bumps up ci pipeline to fetch golang 1.22.5 Validates fix for golang/go#62440
Does this issue reproduce with the latest release?
Yes.
What did you do?
https://github.com/VirrageS/fluent-bit-go-bug here is full code to repo.
I will try to make it smaller but for now this is all I have.
When running this project with:
it runs correctly.
But when I change to
GO_VERSION=1.21
it fails withfatal: morestack on g0
:It looks like regression in Go 1.21.
I found similar issues but those seem to be quite old and are not tied to Go 1.21 version.
I have tried to
strace
the binary and this is what I got:Details
What did you expect to see?
Same behavior as with
GO_VERSION=1.20
.What did you see instead?
fatal: morestack on g0
The text was updated successfully, but these errors were encountered: