Async Process IO hangs 100% of the time on Linux #5197

Open
danpalmer opened this issue Mar 31, 2025 · 6 comments

@danpalmer

Summary

I have a wrapper around Process designed to asynchronously run and read stdout/stderr in order to allow for the following API:

let (stdout, stderr) = try await run("/bin/ls", ["."])

Under macOS (15.x) this works, but under Linux, specifically using the swift:6 Docker image, this hangs indefinitely.

Detailed issue

Under some running conditions it's possible to observe the following error:

*** Could not create wakeup socket pair for CFSocket!!!

Additionally, although it is not my main concern, it may help with debugging: the same code crashes with a bad pointer dereference in libswift_Concurrency.so on Swift 5.9.

Running this under strace suggests that the subprocess does indeed run and complete, which points to an issue with the Swift run loop.

One possibility is that there's a file handle limit preventing the I/O from happening; however, ulimit suggests that this container has no limits that would cause an issue.
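
One way to check the file handle hypothesis from inside the process is to count open descriptors before and after each subprocess call. The helper below is a hypothetical diagnostic sketch and is not part of the reproduction; `openFileDescriptorCount` is a made-up name, and it assumes `/dev/fd` is listable, which is the case on macOS and on typical Linux containers.

```swift
import Foundation

/// Hypothetical diagnostic helper (not from the reproduction).
/// Returns the number of entries in /dev/fd, i.e. the descriptors currently
/// open in this process, or nil if the directory cannot be listed.
/// The count may include the descriptor used for the listing itself.
func openFileDescriptorCount() -> Int? {
    try? FileManager.default.contentsOfDirectory(atPath: "/dev/fd").count
}

// Example: log the count around a subprocess invocation.
print("open descriptors:", openFileDescriptorCount().map(String.init) ?? "unknown")
```

If the count keeps growing across invocations, that would point towards leaked pipes or sockets; a stable count makes a descriptor limit a less likely explanation.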

The fact that the same code works on macOS and does not on Linux suggests to me a bug in the framework. While I'm conscious that it's very easy to introduce races in code like this (and I have fixed several in this area), a difference in behaviour like this should be impossible, or at least documented and obvious.

Reproduction

I've created a relatively minimal reproduction, with code similar to what's in my project, available here: https://github.com/danpalmer/swift-linux-issue (see footnote 1).

# Works as expected
$ swift run

# Hangs
$ docker run --rm -v $(pwd):/code -w /code swift:6.0 swift run

To run this you'll need to install a Docker provider on macOS such as OrbStack.

Happy to answer any questions or provide further follow-up. This area is not something I know a lot about – I would consider myself an intermediate Swift developer and a relatively basic Linux user, so it's quite possible that I've missed obvious things that I should go back and try before further investigation occurs.

Footnotes

  1. Please note that this was AI generated, but a) it appears to reproduce the issue correctly, working as expected on macOS and hanging in the same way on Linux, and b) it is highly representative of both the current code I have in my project and an open source library, which both exhibit the same issue in the same way.

@danpalmer
Author

I've updated the reproduction a bit...

  • The existing reproduction is simplified a bit more: we don't need stderr to exhibit the issue, so I've dropped that bit to shorten the code. I'm not convinced of the general correctness of the reproduction, but it at least works on macOS in its current form and doesn't on Linux.
  • To head off questions about implementation issues, I've also added a second reproduction using tuist/Command, which has been somewhat battle-tested at this point and has solved a number of issues. Notably, though, this repo currently disables all of its tests on Linux because they don't work (this seems to be the same Swift 5.9 issue mentioned above).

@t089

t089 commented Apr 1, 2025

I can confirm that I have also noticed unpredictable hangs when using the readabilityHandler API on Linux.

This might be a duplicate of #3275 (which would make it less unpredictable).

Instead, I have now switched to reading from the file handles more directly, see below. Note that this will buffer all output in memory.

// Requires `import Foundation` and `import Synchronization` (for Mutex, Swift 6) at file scope.
let outputData : Mutex<Data?> = Mutex(nil)
let errorData : Mutex<Data?> = Mutex(nil)

let group = DispatchGroup()

let task = Foundation.Process()
task.executableURL = executable
task.arguments = Array(arguments.dropFirst())
let outputPipe = Pipe()
let errorPipe = Pipe()


task.standardOutput = outputPipe
task.standardError = errorPipe

try task.run()
group.enter()
Thread { // read full output in a separate thread
    let data = try? outputPipe.fileHandleForReading.readToEnd()
    outputData.withLock { 
        $0 = data
    } 
    group.leave()
}.start()

group.enter()
Thread {  // read full error output in a separate thread
    let data = try? errorPipe.fileHandleForReading.readToEnd()
    errorData.withLock {
        $0 = data
    }
    group.leave()
}.start()


task.waitUntilExit()

// wait until the reader threads complete
if .timedOut == group.wait(timeout: .now() + .seconds(10)) {
    fatalError("Task exited, but timed out waiting for output.")
}

guard task.terminationStatus == 0 else {
    let message = errorData.withLock { $0.flatMap { String(data:  $0, encoding: .utf8)?.trimmingCharacters(in: .whitespacesAndNewlines) } } ?? ""
    throw ShellError.failure(terminationStatus: Int(task.terminationStatus), errorMessage: message, arguments: arguments)
}

return outputData.withLock { $0 }

@danpalmer
Author

I've tried this implementation but ran into issues with DispatchGroup.wait not being usable from async contexts, so I tried rewriting this to use TaskGroup instead. This at least retains the readToEnd usage that avoids the use of readability handlers.

https://gist.github.com/danpalmer/762c1cf28fe8ea3d9013c4b73492bc36

This still exhibits the same issue, hanging, and also printing *** Could not create wakeup socket pair for CFSocket!!!.


I think there might actually be two issues here:

  1. The readability handler not being called at the end of the read, breaking my reproduction case.
  2. An issue with propagation of the SIGCHLD signal from the subprocess, causing the CFSocket error log.

I'd rather focus on the second here, as it's the blocking issue for me – so far I've not managed to find an implementation that successfully passes my project's test suite under Linux. Interestingly however, it does sometimes work.

One of the tests goes like this:

  1. Run /bin/which git, and read the output.
  2. Make a temporary directory.
  3. Run git init in that directory, using the git found in step 1.
  4. ...write some files and assert some stuff, unimportant.

The new detail I've figured out is that step 1 works: which git runs successfully, exits, and the test continues. It's at step 3, when we run git init, that it fails.

I've strace'd step 3, and here's what I've got: https://gist.github.com/danpalmer/f0caed5cd673013787ac142067f1f659 - this shows:

  1. The git binary starting with execve
  2. git running its init process
  3. git printing stdout and stderr
  4. git exiting successfully
  5. The main process reading stdout and stderr successfully
  6. The main process closing the stdout and stderr handles
  7. The main process receiving a SIGCHLD
  8. The main process going into a suspended state with rt_sigsuspend and never coming out

I'm still investigating, but thought I'd give an update.

@danpalmer
Author

I've re-run the test, this time changing it to run /bin/which git twice before running git init, and it now fails on the second invocation of which. This suggests the problem has nothing to do with git specifically, and that it's simply the second subprocess that has an issue.

Seems to me like there's some state in the process exit handling that isn't getting cleaned up.
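
The "second invocation fails" observation can be isolated with a loop that runs the same command several times and logs each step. This is a hypothetical sketch and not the code from the test suite or the gist: `runAndCapture` and `runRepeatedly` are made-up names, the calls are deliberately blocking (so they should be run from a plain thread, not an async context), and only stdout is captured.

```swift
import Foundation

/// Hypothetical helper: runs a command synchronously and returns its stdout.
func runAndCapture(_ path: String, _ arguments: [String]) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    // Drain stdout before waiting, so a full pipe buffer cannot block the child.
    let data = try pipe.fileHandleForReading.readToEnd() ?? Data()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}

/// Runs the same command `count` times in sequence and logs each step,
/// so the last "starting" line shows which invocation never returned.
func runRepeatedly(_ path: String, _ arguments: [String], count: Int) throws -> [String] {
    var outputs: [String] = []
    for index in 0..<count {
        print("invocation \(index + 1): starting")
        outputs.append(try runAndCapture(path, arguments))
        print("invocation \(index + 1): finished")
    }
    return outputs
}
```

Calling `try runRepeatedly("/bin/which", ["git"], count: 3)` mirrors the test described above (the path is the one used in this thread and may differ on other systems). If leftover state in exit handling is the cause, the log should stop after a "starting" line for the second invocation.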

@t089

t089 commented Apr 2, 2025

> I've tried this implementation but ran into issues with DispatchGroup.wait not being usable from async contexts, so I tried rewriting this to use TaskGroup instead. This at least retains the readToEnd usage that avoids the use of readability handlers.

Side note: I think you should not run readToEnd or waitUntilExit from an async context at all, because those calls are blocking. To safely use my implementation from an async context, you could run it on a separate dispatch queue and use a continuation to bridge into async land.
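
A minimal sketch of that bridging approach, under the assumption that the blocking work can be expressed as a single closure. `bridgeBlocking` is a hypothetical name, not an API from Foundation or from any of the libraries mentioned in this thread.

```swift
import Foundation

/// Hypothetical helper: runs blocking work on a dispatch queue and resumes
/// the awaiting task with its result, so that no thread of the Swift
/// concurrency pool is blocked while the work runs.
func bridgeBlocking<T: Sendable>(
    on queue: DispatchQueue = .global(),
    _ work: @escaping @Sendable () throws -> T
) async throws -> T {
    try await withCheckedThrowingContinuation { continuation in
        queue.async {
            do {
                // The blocking calls (readToEnd, waitUntilExit, ...) happen here,
                // on a dispatch worker thread.
                continuation.resume(returning: try work())
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}
```

From an async function, the blocking implementation would then be called as `let output = try await bridgeBlocking { try runShell(arguments) }`, where `runShell` stands in for a function wrapping the code posted above. This only keeps blocking calls off the cooperative thread pool; it is not claimed to fix the hang or the CFSocket error.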

@danpalmer
Author

Makes sense. At this point I'm treating the above implementation as a simple test case for the "wakeup socket pair" issue, because mature libraries like SwiftCommand and tuist/Command exhibit the same issue; for production code I'd just use one of those. I don't think calling waitUntilExit/readToEnd in this way is causing the problem here.

2 participants