Expressivity Gap: concurrent ABA-safe stack (atomics + provenance = pain) #480
Yeah, you don't have to worry about race conditions, I agree. ABA is still a problem though. Fundamentally, without LLSC all we have is this weak signal that the address didn't change, and that is unfortunately insufficient to avoid ABA issues in the Abstract Machine. The stack is not involved in concurrency shenanigans any more, but it is involved in provenance shenanigans. We can either do something global that has an unbounded effect (hard to reason about, both for program verification and for optimizations), or we acknowledge that the stack is still a "strange stack" and indicate that to the compiler each time we access it.
I think there is another class of solutions (or at least variations on 1.) which involve making CAS special: when a CAS operation succeeds, "refresh" the provenance on all pointers with the address involved in the CAS (it doesn't matter if it's the source or destination address, since they're the same if the CAS succeeded).
Out of curiosity, does a CAS on CHERI take into account the permissions on the pointer?
As in, scan the entire memory of everything and update provenance? That's even worse than an intrinsic to "refresh provenance of everything reachable". I don't consider such non-local actions to be acceptable. They also interact poorly with other parts of Rust, e.g. you might be refreshing (as in, changing) the provenance of pointers stored behind shared references, thus violating the immutability of shared references.
Also, with (3) it's not obvious to me why the effect needs to be recursive? Surely the following code is valid if such an `erase_provenance` operation existed:

```rust
let elem = malloc();
elem.next = erase_provenance(STACK_TOP.load(Acquire));
while let Err(new_next) = STACK_TOP.cas(elem.next, elem, AcqRel) {
    elem.next = erase_provenance(new_next);
}
```

(Where `erase_provenance` strips the provenance from a pointer.)
Ah true, it doesn't have to be recursive since we can apply it to all relevant pointers when putting them into the list.
It's not quite what you proposed, or at least not what I understood you proposed. The idea with this kind of "deeply erased" pointer is that doing a load through this pointer refreshes the provenance of the pointers it loads. So we never need to make a choice that picks a future allocation. However, now each load becomes a possible "expose", which kills this proposal immediately -- that's too big a side-effect for a load.
True, it's not entirely the same.
Why would each load need to be an "expose"? Imagine we had a magic `defer_provenance_until_cas` operation:

```rust
let elem = malloc();
elem.next = defer_provenance_until_cas(STACK_TOP.load(Acquire));
while let Err(new_next) = STACK_TOP.cas(elem.next, elem, AcqRel) {
    elem.next = defer_provenance_until_cas(new_next);
}
```

I think this code would clearly be valid without needing any special treatment.
Perhaps with a small tweak it doesn't need to be an "expose" though:

```rust
let elem = malloc();
elem.next = erase_provenance(STACK_TOP.load(Acquire));
loop {
    match STACK_TOP.cas(elem.next, elem, AcqRel) {
        Ok(prev) => {
            expose_addr(prev);
            break;
        }
        Err(new_next) => {
            elem.next = erase_provenance(new_next);
        }
    }
}
```

Now the `expose_addr` happens only on the success path, after the CAS.
I have just realised that this does introduce a race condition if the pointer is loaded before the previous address is exposed, so exposing the address would have to be an atomic part of the CAS operation, rather than something done explicitly as part of the algorithm. I think the above example is still useful to explain what I'm suggesting though.
On Morello at least, the manual says the capability has to be bitwise identical for CAS to succeed, which I'd assume means the entire 129 bits including address, metadata, and valid bit (tag). In other words, in practice, this kind of stack should 'just work' on CHERI… as long as you use pointer types. Which makes me skeptical of the idea of recommending that `next` be declared as `usize`.

However, even on CHERI this would be a problem from an Abstract Machine perspective, since even two bitwise-identical capabilities may not have the same abstract provenance. (If I understand correctly, the original example wouldn't work as-is, because a CHERI allocator isn't supposed to reuse memory before tag-GC clears the valid bits of all dangling pointers. But I think you can encounter a similar problem without involving the allocator?) Which also makes me skeptical of solutions that rely on exposed addresses.

But perhaps the Abstract Machine could add some sort of weaker equivalent of CHERI's bitwise capability comparison.
I was thinking it's needed in preparation for the next `cas`.
That's a frighteningly global operation. How is "address was used" even defined? It's an integer; it doesn't have provenance, so we can't determine where this value has flowed. I wonder though... instead of "everywhere it was used", at least for this particular operation we'd actually just have to put it in one place: the location the compared-against value was loaded from. This would ensure that right after a successful CAS, the value in `STACK_TOP` (which did not change) and the value in `elem.next` are equal, including provenance.

However, I am concerned that this is fixing a special case instead of a larger problem. That said, the fact that this ensures "after the CAS the two values are momentarily equal" does seem like more than just a band-aid.
I was told there are more examples in https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1726r5.pdf, but I have not yet taken a closer look. |
So it turns out maybe this really isn't ad-hoc at all! We debugged a nasty RwLock concurrency issue and tracked it down to an ABA problem very similar to this one. Except that one needs a slightly more powerful version of this "provenance-transferring compare-exchange": the pointer that should acquire the provenance might be different from the one that we are comparing with, since there might be some bitmasking applied.

```rust
fn compare_exchange_transfer_provenance(
    &self,
    current: *mut T,
    new: *mut T,
    provenance_target: *mut *mut T,
    success: Ordering,
    failure: Ordering,
) -> Result<*mut T, *mut T>
```

would atomically execute:

```rust
let old = self.read();
if old == current {
    // Transfer provenance of old value.
    // Doesn't change the actual bytes in memory, so backends can ignore this.
    provenance_target.write(old.with_addr(provenance_target.read().addr()));
    // Store new value.
    self.write(new);
    // Done.
    Ok(old)
} else {
    Err(old)
}
```

Then the push at the top could be written as

```rust
let elem = malloc();
elem.next = STACK_TOP.load(Acquire);
while let Err(new_next) = STACK_TOP.compare_exchange_transfer_provenance(elem.next, elem, addr_of_mut!(elem.next), ...) {
    elem.next = new_next;
}
```

I guess there are some thorny questions about that extra write we are doing. Like, how does it show up in the memory model? What if it races with reads/writes to the same location? We probably want to forbid that, even if the other reads/writes are atomic: you can't synchronize via this provenance write.
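As a side note, the `addr`/`with_addr` pair used in the pseudocode above are Rust's strict-provenance APIs (stable since Rust 1.84). A minimal, non-atomic sketch of that provenance-transfer step on plain pointers, assuming both pointers share the same address as in the successful-CAS case:

```rust
// Illustrative only: `old.with_addr(target.addr())` yields a pointer with
// `old`'s provenance but `target`'s address. When the addresses are equal
// (the successful-CAS case), the pointer's bytes do not change at all --
// only its Abstract Machine provenance is "refreshed".
fn main() {
    let mut x = [1u8, 2, 3];
    let old: *mut u8 = x.as_mut_ptr(); // valid provenance for `x`
    let target: *mut u8 = old; // stand-in for the pointer stored in `elem.next`
    let refreshed = old.with_addr(target.addr());
    assert_eq!(refreshed as usize, target as usize);
    unsafe { assert_eq!(*refreshed, 1) };
    println!("ok");
}
```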
I personally am quite scared by introducing an extra write in concurrent code with unclear semantics. I'm not even sure forbidding races is viable, since for example another thread could gain access to the pointer via a successful `cas` of its own.

As an alternative that entirely avoids this extra write, what if we added a function `future_provenance`?

Push:

```rust
let elem = malloc();
elem.next = future_provenance(STACK_TOP.load(Acquire)); // This line changes
while let Err(new_next) = STACK_TOP.cas(elem.next, elem, AcqRel) {
    elem.next = future_provenance(new_next); // As does this
}
```

Similarly, the lockless queue ABA problem can be solved with the same approach. This should be very easy to implement: before lowering, `future_provenance` can simply be replaced with a no-op, since it does not affect the pointer's runtime representation.

There are still some funky details to work out about what this means on the abstract machine when the usage isn't quite the straightforward CAS loop (e.g. what if the pointer is accessed before the CAS, or passed to multiple CASes, or...), but I think those can be forbidden or made okay fairly easily. I have also sketched a concrete, executable, and deterministically checkable operational semantics for this proposal.
I don't love everything about this proposal, but I think it or something like it has promise: it doesn't involve changing code very much, it's very flexible, it doesn't complicate the memory model, and it has clear implementation strategies.
I am very strongly opposed to a semantics that requires predicting the future if less extreme alternatives exist. What you are describing is called "prophecy variables" in the literature; they can be useful in verification, but they should at best be a last resort in the specification of a language. One of the reasons why they are so problematic is that they make it entirely impossible to have a tool like Miri. Your notion of what is needed for an "implementation strategy" is clearly quite different from mine. They also need to be carefully constrained or else you end up with "time travel paradoxes", which in this case would surface as an impossible-to-compile program (namely, if the program uses the return value of `future_provenance` to influence what happens in the `compare_exchange` -- not sure if that can happen, but the very fact that this provenance can be used at all before the `compare_exchange` has significant potential for trouble).
Agreed -- preserving linearity as a property is something I'd like (and I'd expect any compiler implementor would as well). As for a paradox, here's one:

```rust
let mut x = 0;
let mut rx = &mut x as *mut i32;
let mut px = core::ptr::addr_of_mut!(rx);
*px = core::ptr::addr_of_mut!(px).cast();
let mut p = core::ptr::invalid(px.addr()); // Destroy provenance, but not to worry, we'll get it back later.
let fp = core::ptr::future_provenance_mut(p);
let ap = AtomicPtr::new(fp.read());
let _ = ap.compare_exchange(p, &mut x, Ordering::Relaxed, Ordering::Relaxed); // we know this succeeds because p and fp compare bitwise equal ignoring provenance
```
I agree that predicting the future and prophecy is dangerous here, but I think it's a much more manageable case than usual because I can't think of an application where you need to actually observe the future provenance until it's no longer in the future. That's why in my concrete semantics I specified that it is undefined behavior to dereference a pointer with lazy provenance before that provenance is initialized; @chorman0773, your example immediately hits that undefined behavior. I very much want it to be checkable in something like Miri. Perhaps I should have used less time travel-y framing, because that is unnecessary; it simply was the direction that I initially approached it from.
Fair enough. Although I think the way this will break is with an OOTA setup and a couple more threads.
Still, if you use future_provenance first, then allocate something at a particular address, I think you'll find that any optimizing compiler will assume a new allocation won't ever be pointed to by a pointer that was created before it (even if you prove the addresses are the same). In general, the linearity property preserves the fact that you can't get a pointer to an allocation before that allocation becomes live. |
After reading your comment in more detail, I have to apologize -- my reaction was based on "I don't have much time, but I read the summary and that sounds like a direction I am not happy with". I entirely missed the part of your post where you actually give an operational semantics that realizes the idea (and that is not fully equivalent to actual prophecy). I shouldn't have posted before reading your entire comment.

Okay, so... this introduces a new kind of provenance, and a new kind of global state related to that. But from what I can tell it is entirely agnostic over the underlying provenance model, and in particular it does not require any notion of "union provenance" or so -- it should be compatible with Stacked Borrows, Tree Borrows, everything. That's neat.

One new aspect of this is that CAS now also mutates this new global state, the lazy provenance table. So technically, lowering this to LLVM requires CAS to also be made an opaque operation, or else LLVM might reorder it with pointer derefs that are guaranteed not to alias the CAS... except unfortunately the deref depends on the lazy provenance table state, and so moving it up across the CAS makes it UB. So I am not convinced yet that this will all work out. OTOH my proposal also needs LLVM changes, so it's not clear either proposal has a clear lead when it comes to "how well can this be compiled to LLVM".

It is amusing how close this is getting to @Darksonn's original question that started this thread -- "can an int2ptr cast grab a provenance that is exposed in the future?"
I half-suspect that you can argue that the lazy provenances are acyclic by induction over one of the acyclic orderings of the memory model (assuming OOTA can be excluded by some TBD mechanism, else the whole memory model is sunk because you can forge private provenances). But it's not really necessary to resolve that issue to avoid this particular infinite loop: basic cycle detection (keep a stack of visited nodes and fail if you revisit one) will correctly and completely detect this issue, and it can't run forever because you can only have finitely many provenances at any given point in the program. Presumably dereferencing a lazy provenance cycle is also UB.
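A sketch of the cycle check described above, assuming a hypothetical lazy-provenance table modeled as a map from each lazy provenance ID to the provenance it defers to (anything not in the map is considered concrete):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical model: lazy provenance `a` defers to provenance `table[a]`.
// Walking the deferral chain with a visited set detects a cycle; because
// there are only finitely many provenances, the walk always terminates.
fn deref_would_cycle(table: &HashMap<u32, u32>, start: u32) -> bool {
    let mut visited = HashSet::new();
    let mut cur = start;
    while let Some(&next) = table.get(&cur) {
        if !visited.insert(cur) {
            return true; // revisited a node: dereferencing would be UB
        }
        cur = next;
    }
    false // chain ends in a concrete provenance
}

fn main() {
    // 0 -> 1 -> 2 (concrete): no cycle.
    let chain: HashMap<u32, u32> = [(0, 1), (1, 2)].into_iter().collect();
    assert!(!deref_would_cycle(&chain, 0));
    // 0 -> 1 -> 0: cycle.
    let cyc: HashMap<u32, u32> = [(0, 1), (1, 0)].into_iter().collect();
    assert!(deref_would_cycle(&cyc, 0));
    println!("ok");
}
```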
Alice Ryhl brings up the following data structure: a concurrent stack where all elements can only be popped at once.
Stack declaration:
Push:
Pop all:
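As a hedged sketch (not the original code), such a pop-all stack might look like the following; `Elem` and `STACK_TOP` follow the names used in the discussion, and `compare_exchange` stands in for the `cas` shorthand:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Elem {
    data: u64,
    next: *mut Elem,
}

static STACK_TOP: AtomicPtr<Elem> = AtomicPtr::new(ptr::null_mut());

fn push(data: u64) {
    let elem = Box::into_raw(Box::new(Elem { data, next: ptr::null_mut() }));
    let mut next = STACK_TOP.load(Ordering::Acquire);
    loop {
        // The ABA hazard discussed below lives here: between this write of
        // `next` and the compare_exchange, the stack may be popped, freed,
        // and repopulated at the same address.
        unsafe { (*elem).next = next };
        match STACK_TOP.compare_exchange(next, elem, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return,
            Err(n) => next = n,
        }
    }
}

fn pop_all() -> Vec<u64> {
    // Atomically detach the whole list, then walk it privately.
    let mut head = STACK_TOP.swap(ptr::null_mut(), Ordering::AcqRel);
    let mut out = Vec::new();
    while !head.is_null() {
        let elem = unsafe { Box::from_raw(head) };
        out.push(elem.data);
        head = elem.next;
    }
    out
}

fn main() {
    push(1);
    push(2);
    push(3);
    assert_eq!(pop_all(), vec![3, 2, 1]); // LIFO order
    println!("ok");
}
```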
This can run into ABA situations in push if, between the `load` and the `cas`, the stack gets popped, freed, new elements get pushed, and they re-use the same address. Then `STACK_TOP` will contain an `Elem` whose `next` points to the right element but still uses the provenance of the old element. Adding any code between the `load` and the `cas` cannot fix this, since the new element might be allocated after all of that code runs, and `from_exposed_addr` cannot "grab" a provenance that does not even exist yet.

IMO what should be done here is that `next` should be declared to have type `usize`, and one just passes the pointers through `exposed_addr`/`from_exposed_addr` when they enter/leave the stack. However, some people seem to consider it very important that the stack uses a normal pointer type and can be used in the rest of the code without being aware that there are concurrency shenanigans going on. I am not sure why that is so important, given in particular that I think this will at best be an utter nightmare to specify, but here are some options I have seen or thought of:

1. Somehow make `from_exposed_addr` actually work here. For instance, rather than angelically guessing the provenance when `from_exposed_addr` gets called, we could generate a fresh symbolic provenance and gather constraints on how that provenance is used, and then raise UB when the constraints become unsatisfiable. (This is basically the "udi" part of PNVI-ae-udi.) Unfortunately I have no great idea how to do this with a complex aliasing model.
2. Acknowledge that this is still a "strange stack", and indicate that to the compiler each time it is accessed.
3. Make loads through such a pointer behave like `|p| from_exposed_addr(p.exposed_addr())`, and the loaded provenance also recursively obtains this effect. However, now loads through such pointers cannot be DCE'd, since they have the side-effect of exposing some provenance, so this doesn't seem practical either.

So to me (2) is the only realistic option I have seen so far -- which conveniently would also get rid of angelic choice again, if we can make it work...