-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: unexpected SIGURG in TestCgoCallbackGC on openbsd-386-62 builder #36996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Marking as release-blocker until we understand the impact. |
OpenBSD is not a first-class port per https://golang.org/wiki/PortingPolicy, so I don't think an occasional failure on OpenBSD counts as a release blocker. But of course it would be nice to understand this. |
Since we only have N=1 test failures, I don't think we have enough evidence to say whether the issue is OpenBSD-specific at this point. (And even with a larger N, sometimes timing or other variations can cause an issue that affects multiple platforms to only manifest on one specific builder.) |
I'm 99% sure it's a problem with the test. We should be blocking signal before creating the new C thread. |
Although that does suggest a possible problem for Go programs that call C code that creates new threads. Hmmm. We aren't seeing problems on GNU/Linux. I may be mistaken. |
I note that on GNU/Linux |
This is strange. The process is dying of a So far I've run the test several thousand times under ktrace on the openbsd-386-62 gomote. I was able to get one failure. Unfortunately, the gomote then crashed before I could look at all the data. The gomote continues to crash periodically, forcing me to rebuild everything before I can do more testing. I'm still trying. |
I think I see the problem. This test case has a C thread that calls into Go code. The scheduler will sometimes decide to preempt that Go code. When that happens, the scheduler will send a However, it possible that by the time the signal is delivered, the thread is no longer executing Go code. In that case, the thread-local 'g' variable will be All those steps can happen on any system. Normally, it causes a bit of extra work but no actual harm, as However, the OpenBSD 6.2 kernel appears to have a race condition. If thread 1 changes the This program shows the problem. It exits with 0 on GNU/Linux, and with 144 ( #include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
/* Die with an error message. */
static int die(const char *s) {
perror(s);
exit(EXIT_FAILURE);
}
/* Signal handler. */
static void handler(int sig, siginfo_t* info, void *ctxt) {
sigset_t set;
struct sigaction sa;
struct timespec ts;
/* Set the signal handler to the default handler. */
memset(&sa, 0, sizeof sa);
sa.sa_handler = SIG_DFL;
if (sigfillset(&sa.sa_mask) < 0) {
die("sigfillset");
}
sa.sa_flags = SA_SIGINFO | SA_ONSTACK | SA_RESTART;
if (sigaction(sig, &sa, NULL) < 0) {
die("sigaction");
}
/* Unblock the signal. */
if (sigemptyset(&set) < 0) {
die("sigemptyset");
}
if (sigaddset(&set, sig) < 0) {
die("sigaddset");
}
if (sigprocmask(SIG_UNBLOCK, &set, NULL) < 0) {
die("sigprocmask");
}
/* Sleep for ten microseconds. */
memset(&ts, 0, sizeof ts);
ts.tv_nsec = 10000;
nanosleep(&ts, NULL);
/* Block the signal. */
if (sigprocmask(SIG_BLOCK, &set, NULL) < 0) {
die("sigprocmask");
}
/* Set the signal handler back to this handler. */
sa.sa_sigaction = handler;
if (sigaction(sig, &sa, NULL) < 0) {
die("sigaction");
}
}
/* Thread that just waits for a signal. */
static void *thread_waiter(void *arg) {
sigset_t mask;
if (sigemptyset(&mask) < 0) {
die("sigemptyset");
}
while (1) {
sigsuspend(&mask);
}
return NULL;
}
/* Thread that sends regular SIGURG signal to the tid passed in arg. */
static void *thread_sender(void *arg) {
pthread_t* ptid;
int i;
ptid = (pthread_t*)arg;
for (i = 0; i < 10000000; i++) {
int err;
err = pthread_kill(ptid[0], SIGURG);
if (err != 0) {
errno = err;
die("pthread_kill");
}
err = pthread_kill(ptid[1], SIGURG);
if (err != 0) {
errno = err;
die("pthread_kill");
}
}
return NULL;
}
int main() {
struct sigaction sa;
pthread_t waiter;
pthread_t sendto[2];
pthread_t sender;
int err;
struct timespec ts;
/* Arrange to set up a signal handler for SIGURG. */
memset(&sa, 0, sizeof sa);
sa.sa_sigaction = handler;
if (sigfillset(&sa.sa_mask) < 0) {
die("sigfillset");
}
sa.sa_flags = SA_SIGINFO | SA_ONSTACK | SA_RESTART;
if (sigaction(SIGURG, &sa, NULL) < 0) {
die("sigaction");
}
/* Start the waiter thread. */
err = pthread_create(&waiter, NULL, thread_waiter, NULL);
if (err != 0) {
errno = err;
die("pthread_create");
}
/* Start the sender thread. */
sendto[0] = pthread_self();
sendto[1] = waiter;
err = pthread_create(&sender, NULL, thread_sender, (void *)(&sendto[0]));
if (err != 0) {
errno = err;
die("pthread_create");
}
/* Run for five seconds. */
memset(&ts, 0, sizeof ts);
ts.tv_sec = 6;
while (ts.tv_sec > 0) {
if (nanosleep(&ts, &ts) == 0) {
break;
}
}
return 0;
} I think the fix is going to be to avoid raising |
cc @mdempsky because for keywords "openbsd kernel" |
Change https://golang.org/cl/217617 mentions this issue: |
2020-01-31T23:32:02-53558cb/openbsd-386-62
Given the
SIGURG
, I'm guessing this is related to the 1.14 preemption changes.Is this a bug in the runtime, or does the test need to be updated to account for the preemption signals?
@aclements @mknyszek @ianlancetaylor
The text was updated successfully, but these errors were encountered: