-
Notifications
You must be signed in to change notification settings - Fork 25
Bitcoin Core rpc_misc.py test failure #128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
So far I tried on (M4) macOS 15.2 with commit 9a90f964a3c4fe75477926d34b7f1813b144449d: cmake -B build -DWITH_MULTIPROCESS=ON
cmake --build build
BITCOIND=$(pwd)/build/src/bitcoin-node build/test/functional/rpc_misc.py Using multiprocess master @ 3b2617b |
From the CI link above the error seems to be: node0 stderr libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument which according to https://stackoverflow.com/questions/66773247/libcabi-dylib-terminating-with-uncaught-exception-of-type-std-1system-er "typically happens when .lock() is called on a mutex that is not yet constructed, or has already been destructed." In the test, the error happens when the node is shutting down after an echoipc call has been made: |
Another CI failure on the same test, this time on centOS. I'm unsure if it's the same issue: https://cirrus-ci.com/task/6027377219731456?logs=ci#L4105
|
The failure on centos https://cirrus-ci.com/task/6027377219731456?logs=ci#L4111 also reported bitcoin/bitcoin#30975 (comment) is happening in nearly the same place as originally reported failure on macos, when the test tries to stop the node on line https://github.com/bitcoin/bitcoin/blob/11b293a9e7e89f25acc74d0bda5cc68cd164b551/test/functional/rpc_misc.py#L83 after an However, there are some differences between the two cases:
In both cases there seems to be a race condition and the bug could have the same root cause. I think my next step will be to try to run the test in loop to see if I could reproduce locally. |
The "Unexpected stderr" "terminate called after throwing an instance of 'kj::ExceptionImpl" "mp/proxy.cpp:242: disconnected: write(m_post_fd, &buffer, 1): Broken pipe" error that happens on CentOS was actually seen before and has an issue open bitcoin/bitcoin#31151 |
Looking into this more it's not clear if this "mutex lock failed: Invalid argument" issue is the same as the other "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" issue reported in bitcoin/bitcoin#31151, or if the fix in #129 will affect it. The other "Broken pipe" issue happens the when the child process in the echoipc test does not shut down cleanly due to a race condition in the EventLoop shutdown sequence that happens there. But the "mutex lock failed" here is happening in the parent process not the child process, and even though the last log messages are about the event loop shutting down, they could just indicate that the event loop shut down successfully, and the error is happening at some point after that. The echoipc parent process starts its event loop with the But even though the event loop shutdown sequence is different, it could have the race conditions described in bitcoin/bitcoin#31151 as long as there are any The echoipc method in the parent process isn't passing any interface pointers to the child process so it isn't actually creating any ProxyServer objects of its own. However the IPC framework is creating But it's also possible #129 does not fix this, so will probably just need to see if it happens again. |
I'm going to close this issue assuming #129 fixes it, but if it happens again, this can be reopened. |
This issue did turn out to be different than #31151 and not fixed by #129. A similar macos "mutex lock failed" error was reported in #154. Both this issue and #154 should be fixed by bitcoin/bitcoin#31815, but this has not been confirmed yet. |
Happened once (?) while testing bitcoin/bitcoin#30975.
https://github.com/bitcoin/bitcoin/actions/runs/12826202676/job/35765674846?pr=30975#step:7:3078
I haven't been able yet to reproduce locally.
The text was updated successfully, but these errors were encountered: