Skip to content

Crashes observed in OutputFileStream due to EXC_BAD_ACCESS. #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dcow opened this issue Nov 15, 2022 · 6 comments · Fixed by #178
Closed

Crashes observed in OutputFileStream due to EXC_BAD_ACCESS. #177

dcow opened this issue Nov 15, 2022 · 6 comments · Fixed by #178
Assignees
Labels

Comments

@dcow
Copy link

dcow commented Nov 15, 2022

Recently we've noticed crashes coming from the new OutputFileStream code. An example stack trace is included at the end of this issue.

While digging into this, we noticed the switch from the foundation FileHandle API to the older POSIX API. The rationale presumably is the deprecation of the write(data: Data) function. This left us perplexed and we believe this to be a misunderstanding. The specific write(data: Data) function was deprecated, yes, but it was replaced with the generic write<T>(contentsOf: T) throws where T : DataProtocol function. As well as being more flexible in its input parameter, the new function also properly communicates errors via throws (related: unhandled exceptions at the write(data: Data) call site was previously an issue we saw coming from your code). It is a welcome replacement (:

If I may, it seems the best course of action would be to remove the OutputFileStream addition, specifically, and simply update the code as it was previously so that it uses the non-deprecated write function from FileHandle. And, the new API forces you to handle the errors I mentioned we had also been encountering, 2 birds...

But, if you must stick with the POSIX API, which, while fun to use at times, I would strongly discourage, then the implementation needs some work. There are numerous issues (lack of handling for short writes, unnecessary string conversions, generally thread safety, dubious pointer handling, etc.) which I'm happy to help you through if you decide to stick with this route. But, it's super tricky to get all this right when using the POSIX API hence why the first class Apple-supported Foundation APIs are preferred over the POSIX ones.

Hardware Model:     iPhone14,3
Process:            Uno
Identifier:         app.Uno
Version:            1.0
Role:               Background
OS Version:         iOS 16.0
Exception Type:     EXC_BAD_ACCESS 
Exception Subtype:  KERN_INVALID_ADDRESS


EXC_BAD_ACCESS: Attempted to dereference garbage pointer 0x68.

0  libsystem_c.dylib _flockfile
1  libsystem_c.dylib _fwrite
2  Uno               closure #1 in OutputFileStream.write(_:) (OutputFileStream.swift:69:17)
3  Uno               specialized String.UTF8View.withContiguousStorageIfAvailable<A>(_:) (<compiler-generated>)
4  Uno               OutputFileStream.write(_:) (OutputFileStream.swift:67:29)
5  Uno               closure #1 in Storage.write<A>(_:value:) (Storage.swift:42:26)
6  Uno               thunk for @callee_guaranteed () -> () (<compiler-generated>)
7  Uno               thunk for @escaping @callee_guaranteed () -> () (<compiler-generated>)
8  libdispatch.dylib __dispatch_client_callout
9  libdispatch.dylib __dispatch_lane_barrier_sync_invoke_and_complete
10 Uno               Storage.write<A>(_:value:) (Storage.swift:37:19)
11 Uno               SegmentDestination.queueEvent<A>(event:) (SegmentDestination.swift:126:17)
12 Uno               SegmentDestination.execute<A>(event:) (SegmentDestination.swift:106:13)
13 Uno               specialized closure #1 in Mediator.execute<A>(event:) (Timeline.swift:78:32)
14 Uno               specialized Timeline.process<A>(incomingEvent:) (Timeline.swift:37:13)
15 Uno               closure #1 in StartupQueue.replayEvents() (StartupQueue.swift:60:28)
16 Uno               thunk for @callee_guaranteed () -> () (<compiler-generated>)
17 Uno               thunk for @escaping @callee_guaranteed () -> () (<compiler-generated>)
18 libdispatch.dylib __dispatch_client_callout
19 libdispatch.dylib __dispatch_lane_barrier_sync_invoke_and_complete
20 Uno               StartupQueue.replayEvents() (StartupQueue.swift:58:19)
21 Uno               closure #1 in StartupQueue.analytics.didset (StartupQueue.swift:21:23)
22 Uno               thunk for @escaping @callee_guaranteed (@in_guaranteed A) -> () (<compiler-generated>)
23 Uno               thunk for @escaping @callee_guaranteed (@in_guaranteed A) -> (@out ()) (<compiler-generated>)
24 Uno               closure #1 in Store.notify<A>(subscribers:state:) (Store.swift:250:21)
25 Uno               thunk for @escaping @callee_guaranteed () -> () (<compiler-generated>)
26 libdispatch.dylib __dispatch_call_block_and_release
27 libdispatch.dylib __dispatch_client_callout
28 libdispatch.dylib __dispatch_main_queue_drain
29 libdispatch.dylib __dispatch_main_queue_callback_4CF
30 CoreFoundation    ___CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__
31 CoreFoundation    ___CFRunLoopRun
32 CoreFoundation    _CFRunLoopRunSpecific
33 GraphicsServices  _GSEventRunModal
34 UIKitCore         -[UIApplication _run]
35 UIKitCore         _UIApplicationMain
36 SwiftUI           0x18a82fce4 (0x18a82fc40 + 164)
37 SwiftUI           0x18a78cc20 (0x18a78cb8c + 148)
38 SwiftUI           0x18a775b40 (0x18a775ac4 + 124)
39 Uno               static App.$main() (App.swift:74:1)
40 dyld              start
@bsneed
Copy link
Contributor

bsneed commented Nov 15, 2022

Hi @dcow, the posix API wasn't my first choice, as a matter of fact I didn't find it fun to use at all, haha. The file handle API seems to be changing with every release and I was attempting to shield us from that. I understand regarding some of the issues you point out. Thread synchronization in this case is handled upstream and not sure on the specifics about dubious pointer handling.

I had another version of OutputFileStream that used the FileHandle API and ran both through pretty rigorous tests (I had thought). Hammering 10k events each pass, multiple threads writing, etc. Performance numbers showed them to be pretty much identical.

As some other pieces were introduced around error handling, etc. I can't just revert the change. If you want to do a PR for OutputFileStream and switch it to use the FileHandle APIs, happy to check that out. I'm definitely not married to the current implementation of that class. If you don't have the will or time, I'll post a PR here in a few days and get your thoughts on it.

@dcow
Copy link
Author

dcow commented Nov 15, 2022

Ignoring the details of which API you're using, we're seeing errors coming from the Segment library. Since we're customers of Segment, this needs to be addressed by your team first and foremost in whatever matter you see fit. In fact, we just saw another one of these errors come in last night. It's affected 4 of our users this past month.

The rest was just advice as to what might be causing the errors from our vantage point, take it or leave it. Unfortunately I don't have the cycles right now to throw up a PR, though I wish I could help more.

Few quick notes on the posix api:

  1. You don't need to convert data to a string, just pass the the data pointer as an array directly to the C call, like:
let bytes = [UInt8](data)
fwrite(bytes, MemoryLayout<UInt8>.stride, bytes.count, filePointer)

This allocates contiguous storage for data. The foundation implementation is a little more complicated and elides this step by iterating over the contiguous data regions and writing them one by one. If you're concerned about performance, this might be something to consider adding to your implementation as well.

  1. You need to call posix fwrite in a loop (possibly in a double loop like foundation does, linked below) and check the value returned in order to ensure that all the data has been written, like (this is pseudo code that may compile, but has not been tested and is probably incomplete):
var actual = 0
let requested = data.count
while actual < requested {
    let bytes = data[actual ..< requested]
    let n = fwrite([bytes], MemoryLayout<UInt8>.stride, bytes.count, filePointer)
    actual += n
    if n < bytes.count {
        if feof(filePointer) {
            // break or throw depending on what you want to tell the caller about EOF
        } 
        if ferror(filePointer) {
            // recover if possible (continue) or throw
        }
        // else naturally continue
    }
}

Which errors are recoverable vs fatal is not something I have time to fully elaborate on right now. Usually something like EINTR (process was interrupted) is recoverable, possibly EIO, but this is where it gets complicated. Anyway, the foundation implementation uses write (which behaves a small bit differently than fwrite) and treats EINTR as recoverable. Seems like a good strategy.

  1. Now that you're writing in a loop, while individual calls to fwrite are threadsafe, you must lock on your side until the entire data has been written otherwise you might get interleaving output. Your single queue strategy probably covers this, but it's a nuance to be aware of.

And a quick note on the general changes that were merged, irrespective of the posix stuff:

  1. While you point out that the caller handles synchronization so that write ordering is maintained, nothing prevents write(Data) throws on FileOutputStream from being called after the file handle has been closed. This may be the source of the (what look like) memory corruption errors that we're seeing. (This was the dubious pointer handling I mentioned.) This is only speculation, I have no idea what the actual issue is. Since compilers are weird, I also don't know if it's strictly correct to set a C pointer to Swift nil and even if so, I do not know how fwrite behaves when passed a nil stream pointer.

Anyway, as you can see this stuff is messy. We use the foundation FileHandle apis too in our product and haven't experienced the deprecation thrashing you have, but maybe we're just lucky. We understand you have other important things to do as well. But at least as consumers of analytics-swift, there appears to be a regression causing crashes for our users. Looking forward to the fix!

@dcow
Copy link
Author

dcow commented Nov 15, 2022

I do not know how fwrite behaves when passed a nil stream pointer.

Actually, I just tried it in a playground.

let greeting = "Hello, playground"
let bytes = Array(greeting.data(using: .utf8)!)
fwrite(bytes, MemoryLayout<UInt8>.stride, bytes.count, nil)

Which yields our error:

error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=1, address=0x68).

@dcow
Copy link
Author

dcow commented Nov 28, 2022

@bsneed bump. We're seeing an increasing number of these crashes (as more of our users update).

bsneed added a commit that referenced this issue Dec 1, 2022
* Switched OutputFileStream to use FileHandle

* updated Storage for OutputFileStream changes

* Added unableToClose handler

* Fixed test issue w/ parallelized tests.

* Removed dead code

* Flush memory contents to file before closing.

* Fix lack of version callouts.

* Added version checks for sync/close
@dcow
Copy link
Author

dcow commented Dec 1, 2022

@bsneed thank you! We'll update and report back.

@bsneed
Copy link
Contributor

bsneed commented Dec 1, 2022

@dcow thanks so much! Fingers crossed it just goes away now. :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants