Skip to content

Feature request: Preventing dropping fields of a dropped struct for better C++ interoperability #3006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Volker-Weissmann opened this issue Oct 18, 2020 · 16 comments

Comments

@Volker-Weissmann
Copy link

I'm currently working on bindgen. The goal is to let rust code use C++ libraries as nicely as possible. There is one crucial element missing in Rust to achieve this:

Quote from the docs:

After drop is run, Rust will recursively try to drop all of the fields of self.

There is no stable way to prevent this behavior in Rust 1.0.

We need a way to prevent this and here is why:

Let's say your library looks something like this:

class Inner {
    public:
    ~Inner(){}
};
class Outer {
    public:
    Inner i;
    ~Outer(){}
};

Using godbolt, we find that gcc compiles this to

Outer::~Outer() [base object destructor]:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    Inner::~Inner() [complete object destructor]
        nop
        leave
        ret

The important part here is that calling ~Outer() calls ~Inner() and afaik there is no standard conform way to change this.

If we now want to use this library from Rust the naive way to do it would be

#[repr(C)]
struct Inner {	
}
#[repr(C)]
struct Outer {
	invar: Inner,
}
impl Drop for Inner {
	fn drop(&mut self) {
		extern_c_destructor_inner(&mut self);
	}
}
impl Drop for Outer {
	fn drop(&mut self) {
		extern_c_destructor_outer(&mut self);
	}
}

The problem here is that extern_c_destructor_outer(&mut self) calls extern_c_destructor_inner (as we saw in godbolts output), and the recursive field dropping will call extern_c_destructor_inner() again. Calling a destructor twice is UB.

How can we prevent this? One way would be not impl Drop and just call the destructors manually, but I think we all know how incredibly tedious that would be. The second obvious way would be to use Options, similarly to how the nomicon does it, but this does not for us, because we rely on the structure having the same size as the C++ equivalent.

Another way you might suggest is wrapping it in ManuallyDrop:

#[repr(C)]
struct Inner {	
}
#[repr(C)]
struct Outer {
	invar: ManuallyDrop::<Inner>,
}
impl Drop for Inner {
	fn drop(&mut self) {
		extern_c_destructor_inner(&mut self);
	}
}
impl Drop for Outer {
	fn drop(&mut self) {
		extern_c_destructor_outer(&mut self);
	}
}

But if for example, you want to reassign invar, you got a Problem, because this

let mut o = Outer{ invar:ManuallyDrop::new(Inner{})};
o.invar = ManuallyDrop::new(Inner{})

has not only a terrible syntax, but also leaks memory if Inner allocates memory. Everytime you want to assign you would need to drop it manually and this can't be automated nicely, because you cannot overload the assignment operator in Rust. The last possible way would be to not make member variables public and use getters and setters. This would theoretically, but if the goal is to make the Syntax of using a C++ library as nice as possible and as close to the C++ original as possible this is not a good option.

The only good option I could think of is to add a feature to Rustc that lets you disable the recursive dropping. Then you could write:

#[repr(C)]
#[no_field_dropping]
struct Inner {	
}
#[repr(C)]
#[no_field_dropping]
struct Outer {
	invar: Inner,
}
impl Drop for Inner {
	fn drop(&mut self) {
		extern_c_destructor_inner(&mut self);
	}
}
impl Drop for Outer {
	fn drop(&mut self) {
		extern_c_destructor_outer(&mut self);
	}
}

And the #[no_field_dropping] before struct Outer would tell rustc to not drop invar when dropping outer.

I think this should be zero-runtime memory and cycles cost and very limited compile time memory and cycles cost. If I'm not mistaken, Rustc currently generates the calls to all destructors when a variable goes out of scope. If #[no_field_dropping] is set Rustc needs to only generate the call to the destructor of the struct, but not the calls to the destructors of the fields.

@tesuji
Copy link
Contributor

tesuji commented Oct 18, 2020

@rustbot modify labels: -A-mir

@197g
Copy link

197g commented Oct 18, 2020

I don't think move assignment leaking is a good argument because it is incorrect in its own regard. To correctly assign a C++ value you must either invoke a move-assignment or a move-constructor for initialization, either of which might be defaulted or automatically generated (I don't know if there is still a difference in C++20). The Rust assignment allows you to effectively initialize with a memcpy which circumvents this. In C++ such an initialization is only sound for TriviallyCopyable types which requires that the type

Has a trivial non-deleted destructor

In other words, it can't have its destructor do anything.

Thus, exposing a value or &mut of a C++ object that is not trivially copyable is already unsound. And if no mutable reference is exposed then you obviously don't need to worry about assignment leaking anything and the ManuallyDrop wrapping of contained fields is okay.

@Volker-Weissmann
Copy link
Author

I don't think move assignment leaking is a good argument because it is incorrect in its own regard. To correctly assign a C++ value you must either invoke a move-assignment or a move-constructor for initialization, either of which might be defaulted or automatically generated (I don't know if there is still a difference in C++20). The Rust assignment allows you to effectively initialize with a memcpy which circumvents this. In C++ such an initialization is only sound for TriviallyCopyable types

Oh you're right, I forget that C++ can overload the move assignment constructor. But shouldn't
let obj.invar = Inner::new()
result in the destructor of invar being called and a new Object constructed in-place? That would be safe and sound.
Anyway, this might be less efficient, so we still need some way to call the C++ move assignment constructor.

Thus, exposing a value or &mut of a C++ object that is not trivially copyable is already unsound. And if no mutable reference is exposed then you obviously don't need to worry about assignment leaking anything and the ManuallyDrop wrapping of contained fields is okay.

What exactly do you mean, when you say "exposing a value or &mut of a C++ object"?

I'm going to try to write some macro magic to solve this problem by making mymacro!(obj.invar = Inner::new()) call the correct operators. I will come back in a few to to tell you if that works.

@197g
Copy link

197g commented Oct 18, 2020

For assignment it might result in the somewhat correct sequence, although strictly speaking in the C++ sense you would have to insert a call to std::launder if you intend to reuse the same storage after the destructor was called. But critically, if you have two &mut T then in Rust it is safe to core::mem::swap the two values, no user code called of any sorts. The only way for your API to stop me from doing this and bypassing assignment and construction in the process is by making it a safety invariant that I can't ever acquire mutable references in the first place.

Internally of course you can use a macro but then again you wouldn't have to strictly worry about the impact of being able to assign.

@197g
Copy link

197g commented Oct 18, 2020

Oh you're right, I forget that C++ can overload the move assignment constructor.

Even if there is no such overload, any assignment goes through the implicitly defined operator of the compiler. That's pretty crucial in the object model of C++ in the presence of templated code. The memory model assumes that 'objects' have locations and gives a very limited way to change those locations, this assignment being one of them. Isn't that the basis for reasoning to allow them to be elided in RVO?

@Volker-Weissmann
Copy link
Author

Volker-Weissmann commented Oct 18, 2020

For assignment it might result in the somewhat correct sequence, although strictly speaking in the C++ sense you would have to insert a call to std::launder if you intend to reuse the same storage after the destructor was called. But critically, if you have two &mut T then in Rust it is safe to core::mem::swap the two values, no user code called of any sorts. The only way for your API to stop me from doing this and bypassing assignment and construction in the process is by making it a safety invariant that I can't ever acquire mutable references in the first place.

Internally of course you can use a macro but then again you wouldn't have to strictly worry about the impact of being able to assign.

Ok, that means we have problem, a big problem.
In C++ you can have a Class that holds a pointer ptr to itself and a member function that will only work if ptr == this. As long as this Class has overloaded assignment operators that correctly set ptr, this is perfectly safe and sound and there exists real-life C++ code that does this.
But Rustc assumes that it can just move all types by using memmove, at least as long as no references to this object exists.
This conflicts.

The only solution I see for bindgen to not declare a

#[repr(C)]
struct MyClass {
...
}

, but to use heap allocation instead if MyClass has overloaded its assignment operator. If you then use
core::mem::swap, you just swap the pointer to the heap and not the actual data on the heap.
Heap allocations are slow, but I don't see another solution.
If we could somehow trick rustc into thinking a reference on the MyClass object exists this should also work, because rustc will never memmove an object if a reference exists. (And you cannot borrow mutable and do a swap if its already borrowed.) But I have no idea how that would work, so I think we have to do the heap allocation.

If a class did not overload its assignment operator, we can assume that it can be moved using memmove and generate the normal

#[repr(C)]
struct MyClass {
...
}

instead. So we still need the #[no_field_dropping] feature for those.

@197g
Copy link

197g commented Oct 18, 2020

The only solution I see for bindgen to not declare a [ .. ] but to use heap allocation instead if MyClass has overloaded its assignment operator.

Yes, basically. C++ is not a nicely embeddable language. You could use stack-local allocation as well, as in MaybeUninit<UnsafeCell<T>>, for which the caller has to (unsafely) provide if the place is already initialized or not—both cases are UB in C++ if the caller is wrong. (Note the UnsafeCell wrapper so that you can implement a custom swap of two values behind &_ shared references.) But you can never ever have a safe function return MyClass by value or mutable reference to Rust. Fortunately, at least the heap allocated way might be okay.. Although you can't expose the Box either as it allows mutable access. In the worst case, you can pretend that C++ values are only accessible by a trait object since dynamically sized types can not be moved as easily. Or wait for extern type which might be interpreted to be similar.

If a class did not overload its assignment operator, we can assume that it can be moved using memmove and generate the normal.

If it is TriviallyCopyable which is a concept for that reason. Otherwise you still run into super classes having custom moves, standard types with custom moves as compiler magic, and probably some other things.

struct NoCopyOverload {
    std::vector<uint8_t> vector;
    NoCopyOverload& operator=(const NoCopyOverload&) = default;
};

@Volker-Weissmann
Copy link
Author

Are you sure that it's TriviallyCopyable and not trivially relocatable?

@197g
Copy link

197g commented Oct 18, 2020

I do not know of the status of P1144 (whose number I only found after searching for your question) or another such attribute being accepted but as said I may not be up-to-date. In any case, there should be an inference where every trivially copyable type is trivially relocatable and all other types by user annotations. Providing safe access to &mut _ only for former types should (?) be backwards compatible if trivially relocatable is supported at a later date. And I reckon that in the general case of bindgen you will have to deal with the case of non-trivial relocatability anyways.

@Volker-Weissmann
Copy link
Author

Do you know good resources for learning details about what assumptions rust makes about memory?

@scottmcm
Copy link
Member

but also leaks memory if Inner allocates memory

ManuallyDrop is safely DerefMut, so instead of

o.invar = ManuallyDrop::new(Inner{});

you can instead do

*o.invar = Inner{};

which will run Drop on the previous value.

Now that doesn't completely solve "can't be automated nicely", but it's better at least.

@jonas-schievink jonas-schievink transferred this issue from rust-lang/rust Oct 20, 2020
@jonas-schievink
Copy link
Contributor

Hi! Additions or changes to the language need to go through our RFC process. Before an RFC is written, the feature can also be discussed in our internals forum to find other people interested in it.

I've transferred this issue to the RFC repository.

@Diggsey
Copy link
Contributor

Diggsey commented Oct 20, 2020

I think another option would be to create a pseudo-smart-pointer type, Cpp<T>.

If the C++ class is called Foo, then from Rust you would access it as Cpp<Foo>.

The Rust type Foo would itself have no destructors, and its fields would also have no destructors, but the Drop implementation for Cpp<Foo> will call the destructor defined in C++. Cpp<T> would deref to the inner type.

When passing around references, you can just use &Foo directly, but ownership would be passed with Cpp<Foo>.

@Volker-Weissmann
Copy link
Author

I'm currently doing exactly this (see rust-lang/rust-bindgen#1906).
The disadvantage is that you need one additional heap allocation.

@197g
Copy link

197g commented Oct 20, 2020

C++ values are a very special kind values that are always behind Pin<_>. As in, once you have constructed one into a particular memory location, you must destroy it at that location before reusing the memory. This is kind of neat as it explains quite a few properties. Firstly, that is more or less what Box always guaranteeds by owning a unique allocation (see Box::pin being safe). But secondly, it also has the right properties where exposing a &mut _ is not safe while &_ is always safe, but there should be unsafe ways of accessing it. And thirdly, most importantly, it hints that it should be possible to construct them on the stack by using a macro similar to pin_mut. That would also avoid the heap allocation. And lastly, you could properly expose methods as proper methods on the types by having them take Pin<&_> and Pin<&mut _> references as self parameters and additionally make Trivially Relocatable types implement the Unpin trait so that one can safely get a &mut _ for those.

@Volker-Weissmann
Copy link
Author

C++ values are a very special kind values that are always behind Pin<_>. As in, once you have constructed one into a particular memory location, you must destroy it at that location before reusing the memory. This is kind of neat as it explains quite a few properties. Firstly, that is more or less what Box always guaranteeds by owning a unique allocation (see Box::pin being safe). But secondly, it also has the right properties where exposing a &mut _ is not safe while &_ is always safe, but there should be unsafe ways of accessing it. And thirdly, most importantly, it hints that it should be possible to construct them on the stack by using a macro similar to pin_mut. That would also avoid the heap allocation. And lastly, you could properly expose methods as proper methods on the types by having them take Pin<&_> and Pin<&mut _> references as self parameters and additionally make Trivially Relocatable types implement the Unpin trait so that one can safely get a &mut _ for those.

I read about that one, but I still choose alloc and dealloc, because I don't know how how to construct the pinned object. We have to pin the data BEFORE calling the constructor, but before calling the constructor we only have uninitialized memory. We could construct a Box::pin::<MaybeUninit> type, but then we cannot convert that to Box::pin:: and we're stuck.

The advantage of alloc and dealloc is that I'm fairly confident that I know what it does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants