Remove repr(C) from Mutator #1318

wks · 2025-05-11T11:55:05Z

The Mutator struct was made #[repr(C)] so that bindings could embed Mutator into their own C/C++ structs in thread-local storage (TLS). This is supposed to improve performance by allowing inlined fast-path code to access important fields (such as BumpPointer) quickly from the TLS. However, embedding the entire Mutator is neither practical nor necessary.

It is impractical because:

- The Mutator struct is so big that it is beyond the addressable range of immediate addressing on some architectures, such as RISC-V. There is already a proposal to make Mutator smaller.
- The C/C++ part of the VM binding has to precisely replicate the Mutator struct. This is tedious and error-prone, but both the OpenJDK and the Julia bindings have done that. The presence of Rust-specific data types, such as &dyn T, makes it hard to get the object layout right.
- To make it repr(C), the Allocators sub-struct has to rely on fixed-size arrays [T; N]. As a result, the number of allocators of each kind is limited and unknown at run time.
  - Changing the maximum number of each allocator will alter the layout of Mutator, resulting in a series of cascading changes in VM bindings that depend on the precise layout, such as this PR and its associated changes in the OpenJDK and Julia bindings.
  - Because the actual number of each kind of allocator is determined by the concrete Plan, the Allocators struct has to use MaybeUninit (like [MaybeUninit<BumpAllocator<VM>>; MAX_BUMP_ALLOCATORS]) so that array elements can remain uninitialized if the concrete plan doesn't need that many allocators of that kind. This is also the reason why Allocators::get_allocator and Allocators::get_allocator_mut have to be unsafe.
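To make the last point concrete, here is a minimal sketch of why a fixed-size array of MaybeUninit forces an unsafe accessor. The types below are simplified stand-ins written for illustration (no VM type parameter, a plain index instead of a selector, and an assumed value for MAX_BUMP_ALLOCATORS); they are not the actual mmtk-core definitions.

```rust
use std::mem::MaybeUninit;

/// Simplified stand-in for a bump allocator (hypothetical, not the mmtk-core type).
pub struct BumpAllocator {
    pub cursor: usize,
    pub limit: usize,
}

/// Assumed value for this sketch only.
pub const MAX_BUMP_ALLOCATORS: usize = 6;

/// The array length, and therefore the struct layout, is fixed at compile time.
pub struct Allocators {
    bump_pointer: [MaybeUninit<BumpAllocator>; MAX_BUMP_ALLOCATORS],
}

impl Allocators {
    /// Initialize only the first `plan_needs` slots; the rest stay uninitialized.
    pub fn new(plan_needs: usize) -> Self {
        assert!(plan_needs <= MAX_BUMP_ALLOCATORS);
        let mut bump_pointer: [MaybeUninit<BumpAllocator>; MAX_BUMP_ALLOCATORS] =
            std::array::from_fn(|_| MaybeUninit::uninit());
        for slot in bump_pointer.iter_mut().take(plan_needs) {
            slot.write(BumpAllocator { cursor: 0, limit: 0 });
        }
        Allocators { bump_pointer }
    }

    /// # Safety
    /// `index` must refer to a slot that `new` initialized. The type system
    /// cannot check this, which is why the accessor has to be `unsafe`.
    pub unsafe fn get_allocator(&self, index: usize) -> &BumpAllocator {
        unsafe { self.bump_pointer[index].assume_init_ref() }
    }
}
```

Note that the struct always occupies MAX_BUMP_ALLOCATORS slots, regardless of how many the plan actually uses.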

And it is not necessary to embed the entire Mutator into the TLS. The allocation fast path only needs the BumpPointer struct, so the VM binding only needs to embed the BumpPointer struct. This practice is already documented in the Porting Guide, and there is an API function AllocatorInfo::new(selector) for getting the offset of the BumpPointer struct. The Ruby binding doesn't embed the BumpPointer, but just lets the TLS keep a pointer to the BumpPointer inside the Mutator without depending on the concrete structure of Mutator, and the ART binding manually synchronizes the BumpPointer in the Mutator with the thread-local storage.
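A minimal sketch of what "embed only the BumpPointer" could look like. The BumpPointer here is a simplified stand-in that uses plain usize addresses, and VmThreadLocal and alloc_fast are hypothetical names invented for this illustration; a real binding would typically write the fast path in C/C++ or in JIT-generated code.

```rust
/// Simplified stand-in for a bump pointer; addresses are plain `usize` here.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct BumpPointer {
    pub cursor: usize,
    pub limit: usize,
}

/// Hypothetical per-thread state of a VM binding. Only the small
/// `BumpPointer` is embedded, not the whole `Mutator`.
#[repr(C)]
pub struct VmThreadLocal {
    pub thread_id: usize,
    pub bump_pointer: BumpPointer,
}

/// Inlined fast path: bump the cursor if the request fits.
/// Returns `None` when the caller must fall back to the slow path.
pub fn alloc_fast(bp: &mut BumpPointer, size: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    // Round the cursor up to the requested alignment.
    let start = bp.cursor.checked_add(align - 1)? & !(align - 1);
    let end = start.checked_add(size)?;
    if end > bp.limit {
        return None;
    }
    bp.cursor = end;
    Some(start)
}
```

Because the fast path touches only two words, they stay within easy reach of the TLS base register no matter how large Mutator is.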

The proposal: Remove `repr(C)` from `Mutator`

We remove the #[repr(C)] annotation on Mutator. The C/C++ part of the VM binding can no longer assume the layout of Mutator. But we keep a bottom line that:

For every data structure that needs to be accessed in fast paths (such as BumpPointer), we provide an API function for returning the pointer to (or offset of) that structure so that the VM binding can get the pointer without knowing the layout of Mutator. Currently, this can be achieved with either mutator_address + AllocatorInfo::new(selector).bump_pointer_offset or &mutator.allocator_impl<ConcreteAllocator>(selector).bump_pointer.
We allow the related fast-path data structures (such as BumpPointer) to be taken out of the Mutator and put back later. This will allow the VM binding to keep the performance-sensitive parts in the TLS as long as possible because we assume they will be used by the inlined fast paths most of the time.
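The "take out and put back" idea in the second point might look roughly like the sketch below. take_bump_pointer and restore_bump_pointer are hypothetical method names, and the types are simplified stand-ins; no such API exists in mmtk-core as far as this proposal states.

```rust
/// Simplified stand-in for a bump pointer.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct BumpPointer {
    pub cursor: usize,
    pub limit: usize,
}

/// Simplified stand-in for an allocator that owns a bump pointer.
pub struct BumpAllocator {
    pub bump_pointer: BumpPointer,
}

impl BumpAllocator {
    /// Hypothetical: move the fast-path state out so the VM can keep it in TLS.
    /// The allocator is left with an empty region (cursor == limit == 0).
    pub fn take_bump_pointer(&mut self) -> BumpPointer {
        std::mem::take(&mut self.bump_pointer)
    }

    /// Hypothetical: hand the fast-path state back, e.g. before a slow path.
    pub fn restore_bump_pointer(&mut self, bp: BumpPointer) {
        self.bump_pointer = bp;
    }
}
```

With this shape, the allocator and the TLS never hold the live cursor at the same time, so there is no second mutable alias into the Mutator.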

The `Allocators` struct

The struct Allocators no longer needs to be repr(C), either. It was repr(C) because it was part of Mutator.

Because it doesn't need to be repr(C), its layout no longer needs to be exposed to the VM binding. Now we can use Vec<T> instead of [T; N], for example,

-    pub bump_pointer: [MaybeUninit<BumpAllocator<VM>>; MAX_BUMP_ALLOCATORS],
+    pub bump_pointer: Vec<BumpAllocator<VM>>,

The constants MAX_BUMP_ALLOCATORS, MAX_IMMIX_ALLOCATORS, etc. will be removed. VM bindings no longer need to ensure that their copies match the definitions in mmtk-core.

When creating a Mutator, we still need to populate the Allocators struct according to the plan-specific SpaceMapping. But this time it can be easier because we can just use Vec::push instead of MaybeUninit::write.

The methods Allocators::get_allocator and Allocators::get_allocator_mut can be made safe. They are currently unsafe because they use MaybeUninit::assume_init_ref() and MaybeUninit::assume_init_mut(), and neither of these can check whether a given MaybeUninit<T> is actually initialized. Once the lists of allocators become Vec, we can simply use the Vec::get() and Vec::get_mut() methods, which return Option<&T> and Option<&mut T>, respectively. This will remove the unsafe keyword from up to 15 call sites in allocators and mutator prepare/release functions. We'll discuss this in a separate issue.
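A minimal sketch of the Vec-based variant, under the same simplifying assumptions as before: BumpAllocator is a stand-in without the VM type parameter, AllocatorSelector is reduced to a single variant, and push_bump_allocator is a hypothetical helper name.

```rust
/// Simplified stand-in for a bump allocator.
pub struct BumpAllocator {
    pub cursor: usize,
    pub limit: usize,
}

/// Simplified stand-in for a selector; only one allocator kind is modelled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AllocatorSelector {
    BumpPointer(u8),
}

/// Allocators held in a `Vec`: no `MaybeUninit`, no compile-time maximum.
#[derive(Default)]
pub struct Allocators {
    bump_pointer: Vec<BumpAllocator>,
}

impl Allocators {
    /// Hypothetical helper: populate with `Vec::push` instead of `MaybeUninit::write`.
    pub fn push_bump_allocator(&mut self, allocator: BumpAllocator) -> AllocatorSelector {
        self.bump_pointer.push(allocator);
        AllocatorSelector::BumpPointer((self.bump_pointer.len() - 1) as u8)
    }

    /// Safe accessor: an out-of-range selector yields `None` instead of UB.
    pub fn get_allocator(&self, selector: AllocatorSelector) -> Option<&BumpAllocator> {
        match selector {
            AllocatorSelector::BumpPointer(i) => self.bump_pointer.get(i as usize),
        }
    }

    /// Safe mutable accessor with the same bounds check.
    pub fn get_allocator_mut(&mut self, selector: AllocatorSelector) -> Option<&mut BumpAllocator> {
        match selector {
            AllocatorSelector::BumpPointer(i) => self.bump_pointer.get_mut(i as usize),
        }
    }
}
```

Callers that know the selector is valid can still unwrap the Option; the difference is that a mistake becomes a panic or a handled None rather than a read of uninitialized memory.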

Other fields

Other fields need no changes. But we no longer need to label them as #[repr(C)]. Concretely,

The Mutator::barrier field is the only one that needs some discussion. It is currently a Box<dyn Barrier>, and it can remain this way. Because the concrete barrier to use depends on the Plan which is selected at start-up time, some form of dynamic dispatch is always needed.

We previously discussed optimizing for write-barrier slow paths by embedding some data structures in TLS. Currently we only have an object-remembering barrier (and a field-remembering barrier in the lxr branch). To implement those kinds of barriers, we only need a bump-pointer, too. In this case, it is a cursor: *mut ObjectReference and a limit: *const ObjectReference, which is basically a bump-pointer into the ModBuf. Because this is strictly an optimization over the status quo, we may consider it as a separate issue.
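As a rough illustration of the cursor/limit idea for a ModBuf, here is a sketch in which ObjectReference is a plain usize stand-in and ModBufCursor and try_remember are hypothetical names. It is not how the current barriers are implemented.

```rust
/// Stand-in for an object reference; a plain address in this sketch.
pub type ObjectReference = usize;

/// A bump pointer into a modified-object buffer, small enough to live in TLS.
pub struct ModBufCursor {
    pub cursor: *mut ObjectReference,
    pub limit: *const ObjectReference,
}

impl ModBufCursor {
    /// Point the cursor at the start of `buf` and the limit one past its end.
    pub fn new(buf: &mut [ObjectReference]) -> Self {
        let range = buf.as_mut_ptr_range();
        ModBufCursor {
            cursor: range.start,
            limit: range.end as *const ObjectReference,
        }
    }

    /// Fast path: append `obj` if there is room. Returns `false` when the
    /// buffer is full and the caller must take the slow path.
    ///
    /// # Safety
    /// The buffer passed to `new` must still be alive and must not be
    /// accessed through other references while the cursor is in use.
    pub unsafe fn try_remember(&mut self, obj: ObjectReference) -> bool {
        if self.cursor as *const ObjectReference == self.limit {
            return false;
        }
        unsafe {
            self.cursor.write(obj);
            self.cursor = self.cursor.add(1);
        }
        true
    }
}
```

The fast path is a compare, a store, and an increment, which is the same shape as bump-pointer allocation.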

Performance

The only thing that may affect performance is the Vec<BumpAllocator<VM>> and the other vectors of allocators in the Allocators struct. We consider the overhead of Vec reasonable because we will only access the complete allocators in slow paths. But we need to measure it anyway.

API changes

Because allocators are now held in Vec, the "offset from the start of Mutator to the concrete ImmixAllocator" and the "offset from the start of Mutator to a BumpPointer" no longer make sense. However, given a valid AllocatorSelector, we can still uniquely identify the address of the allocator. The Allocators::get_allocator method will still work, but AllocatorInfo won't. We may refactor AllocatorInfo so that it returns references to the underlying BumpPointer instances rather than offsets.

It is debatable whether we should still allow a mutator thread's TLS to keep a pointer to the BumpPointer. Currently we give the VM binding full control over the Mutator as if it were a C struct. After this refactoring, the Box<Mutator> returned from memory_manager::bind_mutator() will be the only owning reference to the Mutator, and Rust doesn't like having another mutable reference into it (which violates the "unique reference" semantics of Box<T> and &mut T). Currently this only affects the Ruby binding. The ART binding is unaffected because it moves the BumpPointer out of the allocator for fast-path allocation, and puts it back when falling back to the slow path.

Documentation

We should update our porting guide and remove the section about embedding the entire Mutator struct. The VM bindings should either not embed anything or embed only the BumpPointer struct.

wks · 2025-05-11T12:52:19Z

If Mutator no longer needs to be repr(C), we will no longer need to hold allocators in Vec. We can return to JikesRVM's traditional method of making Mutator a plan-specific structure. For example,

struct ImmixMutator {
    #[allocator]
    immix_allocator: ImmixAllocator,
    #[parent]
    common: CommonMutator,
}

impl MutatorContext for ImmixMutator {
    fn alloc(...) {
        match semantics {
            AllocatorSemantics::Default => immix_allocator.alloc(...),
            AllocatorSemantics::Los => common.los_allocator.alloc(...),
            ...
        }
    }
    fn prepare(...) {...}
    fn release(...) {...}
    fn on_destroy(...) {...}
}

The concept of AllocatorSelector seems to just go away. But I still think a declarative AllocatorMapping is more readable.

qinsoon · 2025-05-12T00:34:58Z

> If Mutator no longer needs to be repr(C), we will no longer need to hold allocators in Vec. We can return to JikesRVM's traditional method of making Mutator a plan-specific structure. For example,
>
> struct ImmixMutator {
>     #[allocator]
>     immix_allocator: ImmixAllocator,
>     #[parent]
>     common: CommonMutator,
> }
>
> impl MutatorContext for ImmixMutator {
>     fn alloc(...) {
>         match semantics {
>             AllocatorSemantics::Default => immix_allocator.alloc(...),
>             AllocatorSemantics::Los => common.los_allocator.alloc(...),
>             ...
>         }
>     }
>     fn prepare(...) {...}
>     fn release(...) {...}
>     fn on_destroy(...) {...}
> }
>
> The concept of AllocatorSelector seems to just go away. But I still think a declarative AllocatorMapping is more readable.

It was intended not to use the old approach. One of the design goals was that the mutator struct should have a fixed size for all the plans. This is essential to allow dynamic plan selection. It also makes things easier for the bindings, as the fixed size won't change often, and the bindings that embed the struct don't need to worry about the size most of the time.

qinsoon · 2025-05-12T00:40:55Z

The mutator struct is designed to be accessible by the native code in the bindings. A binding may embed the struct, use a pointer to the struct, or only store the fastpath data structures. As long as they need to access Mutator from native code in any of the above cases (which are all considered as valid uses of MMTk), we need to keep repr(C).

I don't think we can remove repr(C).

wks · 2025-05-12T01:45:57Z

> It was intended not to use the old approach. One of the design goals was that the mutator struct should have a fixed size for all the plans. This is essential to allow dynamic plan selection.

I don't see how a fixed-size Mutator struct is connected to dynamic plan selection. We can have BOTH opaque Mutator struct AND dynamic plan selection. All we need is creating the proper Mutator instance in bind_mutator.

But I remember that we did think so when we deviated from the JikesRVM approach. Maybe after years of engineering, we know that some of our assumptions were not true after all.
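A minimal sketch of how an opaque, plan-specific mutator could coexist with plan selection at start-up. Everything here is hypothetical and simplified for illustration: the MutatorContext trait is reduced to two methods, PlanSelector and the two mutator structs are invented, and bind_mutator takes a plain enum rather than an MMTk instance.

```rust
/// Hypothetical, reduced mutator interface.
pub trait MutatorContext {
    fn plan_name(&self) -> &'static str;
    fn alloc(&mut self, size: usize) -> Option<usize>;
}

/// Hypothetical plan choice made at start-up time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PlanSelector {
    SemiSpace,
    Immix,
}

struct Bump {
    cursor: usize,
    limit: usize,
}

impl Bump {
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let end = self.cursor.checked_add(size)?;
        if end > self.limit {
            return None;
        }
        let start = self.cursor;
        self.cursor = end;
        Some(start)
    }
}

// The two mutators deliberately have different sizes and fields.
struct SemiSpaceMutator {
    ss: Bump,
}

struct ImmixMutator {
    immix: Bump,
    large: Bump,
}

impl MutatorContext for SemiSpaceMutator {
    fn plan_name(&self) -> &'static str {
        "SemiSpace"
    }
    fn alloc(&mut self, size: usize) -> Option<usize> {
        self.ss.alloc(size)
    }
}

impl MutatorContext for ImmixMutator {
    fn plan_name(&self) -> &'static str {
        "Immix"
    }
    fn alloc(&mut self, size: usize) -> Option<usize> {
        // Arbitrary threshold, chosen only for this sketch.
        if size > 256 {
            self.large.alloc(size)
        } else {
            self.immix.alloc(size)
        }
    }
}

/// The caller only ever sees an opaque boxed mutator.
pub fn bind_mutator(plan: PlanSelector) -> Box<dyn MutatorContext> {
    match plan {
        PlanSelector::SemiSpace => Box::new(SemiSpaceMutator {
            ss: Bump { cursor: 0x1000, limit: 0x2000 },
        }),
        PlanSelector::Immix => Box::new(ImmixMutator {
            immix: Bump { cursor: 0x1000, limit: 0x2000 },
            large: Bump { cursor: 0x8000, limit: 0x9000 },
        }),
    }
}
```

The binding holds a pointer of the same size in both cases, so the size of the concrete mutator never leaks into the binding's own data structures.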

> It also makes things easier for the bindings, as the fixed size won't change often, and the bindings that embed the struct don't need to worry about the size most of the time.

This is based on the premise that the VM binding needs to embed the Mutator struct. But I think this is not true.

> The mutator struct is designed to be accessible by the native code in the bindings. A binding may embed the struct, use a pointer to the struct, or only store the fastpath data structures. As long as they need to access Mutator from native code in any of the above cases (which are all considered as valid uses of MMTk), we need to keep repr(C).
>
> I don't think we can remove repr(C).

This is my exact concern. I don't think the VM binding can use any part of Mutator other than the BumpPointer without calling into mmtk-core, or at least implementing part of the VM-specific stuff in Rust. Take the ImmixAllocator as an example:

#[repr(C)]
pub struct ImmixAllocator<VM: VMBinding> {
    /// [`VMThread`] associated with this allocator instance
    pub tls: VMThread,
    /// The fastpath bump pointer.
    pub bump_pointer: BumpPointer,
    /// [`Space`](src/policy/space/Space) instance associated with this allocator instance.
    space: &'static ImmixSpace<VM>,
    context: Arc<AllocatorContext<VM>>,
    /// *unused*
    hot: bool,
    /// Is this a copy allocator?
    copy: bool,
    /// Bump pointer for large objects
    pub(in crate::util::alloc) large_bump_pointer: BumpPointer,
    /// Is the current request for large or small?
    request_for_large: bool,
    /// Hole-searching cursor
    line: Option<Line>,
}

The bump_pointer: BumpPointer field is obviously useful to C/C++. But other than that, tls is provided by the VM binding itself. ImmixSpace cannot be accessed from C/C++. hot is unused. copy is only used in the allocation slow path (acquire_recyclable_block). large_bump_pointer is also a BumpPointer and could potentially be used by C/C++, but it is currently only used in the slow path. request_for_large is used by overflow allocation (currently in the slow path, too). line is an Option and is inaccessible from C/C++ (in fact it is not even repr(C)).

BumpAllocator and MarkCompactAllocator are simpler than ImmixAllocator because they only contain a BumpPointer. LargeObjectAllocator just holds a &'static LargeObjectSpace which is only accessible in Rust. The same is true for MallocAllocator. FreeListAllocator contains several BlockList pointers (Box). Currently we don't support free-list allocation fast paths, but there is an opportunity to simplify the block lists into something like BumpPointer so that they can be used by inlined fast paths.

And the Mutator struct also has a barrier field which is just a pointer. Other fields are opaque Rust structures.

So the status quo is that only BumpPointer can be meaningfully accessed from C/C++. Other parts are effectively opaque to C/C++ even though some of them are labelled as repr(C).

qinsoon · 2025-05-12T01:49:49Z

> A binding may embed the struct, use a pointer to the struct, or only store the fastpath data structures.

My assumption is that the above options are all valid uses of MMTk. We don't force users to use BumpPointer and fastpath data structures -- it is their choice.

Based on this, the mutator size needs to be constant, and the repr(C) is needed.

wks · 2025-05-12T02:50:16Z

> > A binding may embed the struct, use a pointer to the struct, or only store the fastpath data structures.
>
> My assumption is that the above options are all valid uses of MMTk. We don't force users to use BumpPointer and fastpath data structures -- it is their choice.
>
> Based on this, the mutator size needs to be constant, and the repr(C) is needed.

OK. I see where we disagree. I'd like to explicitly outlaw the practice of embedding the Mutator because there is no meaningful way to access most of its fields in C/C++, and it brings more trouble than benefit.

Even if users worry about one level of indirection, we may allow them to embed, but we should still make Mutator opaque to C/C++ code. The C/C++ parts should not try to replicate the Mutator struct. Instead, they may put a char mutator_place_holder[BIG_ENOUGH_TO_HOLD_MUTATOR] in the TLS. But the consequence is that all TLS fields after that field can no longer be addressed via immediate addressing.
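On the Rust side, the placeholder idea could be backed by a compile-time check like the sketch below. The Mutator here is an invented stand-in with arbitrary fields, the value 64 for BIG_ENOUGH_TO_HOLD_MUTATOR is an assumption for illustration, and MutatorPlaceholder and mutator_size are hypothetical names.

```rust
use std::mem::{align_of, size_of};

/// Invented stand-in for an opaque mutator; deliberately not `repr(C)`.
pub struct Mutator {
    allocators: Vec<usize>,
    barrier: Box<u8>,
    tls: usize,
}

/// Assumed placeholder size in bytes; the C/C++ side would use the same number.
pub const BIG_ENOUGH_TO_HOLD_MUTATOR: usize = 64;

/// Rust view of the opaque bytes reserved in the VM's thread-local storage.
#[repr(C, align(8))]
pub struct MutatorPlaceholder(pub [u8; BIG_ENOUGH_TO_HOLD_MUTATOR]);

// Fail the build, not the program, if the mutator outgrows the placeholder.
const _: () = assert!(size_of::<Mutator>() <= size_of::<MutatorPlaceholder>());
const _: () = assert!(align_of::<Mutator>() <= align_of::<MutatorPlaceholder>());

/// Hypothetical query a binding could call at start-up to double-check.
pub extern "C" fn mutator_size() -> usize {
    size_of::<Mutator>()
}
```

The C/C++ side never sees the field layout; it only needs the size and alignment to stay within the reserved space.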

I also think that turning [MaybeUninit<SomeAllocator>; MAX_SOME_ALLOCATOR] into Vec<SomeAllocator> may be an orthogonal issue. We can still allow VM bindings to embed Mutator while the actual allocators are held in Vec in the malloc heap (which means the Mutator struct becomes even less useful to C/C++). Using Vec still allows us to remove the unsafe usage as described in #1319.

wks · 2025-05-12T04:00:14Z

We discussed this today in our meeting. We should first refactor the OpenJDK binding, eliminating the practice of replicating and embedding the entire Mutator, and embed only the BumpPointer. This will show whether we can practically embed only the BumpPointer struct into TLS without sacrificing performance.

qinsoon · 2025-05-12T04:02:52Z

> I'd like to explicitly outlaw the practice of embedding the Mutator because there is no meaningful way to access most of its fields in C/C++, and it brings more trouble than benefit.

If we want to disallow embedding the mutator struct on the binding side, we need to see at least OpenJDK working with the proposed new model (using fastpath data structures). The OpenJDK binding supports all the plans in MMTk, so we need to make sure the idea works out well when all the plans are supported.

qinsoon · 2025-05-12T04:09:39Z

Theoretically, if we replace the current mutator (with allocators) with fastpath data structures, we would end up with a new mutator struct:

struct MutatorFastpath {
    allocators: AllocatorsFastpath,
    ...
}
struct AllocatorsFastpath {
    pub bump_pointer: [MaybeUninit<BumpPointer>; MAX_BUMP_ALLOCATORS],
    pub free_list: [MaybeUninit<FreeList<VM>>; MAX_FREE_LIST_ALLOCATORS],
    ...
}

It is just smaller than the old Mutator in size. All the other properties are the same:

- It needs to have a fixed size.
- It needs to be repr(C) (but this new type won't use as many other types as the old Mutator does).

wks mentioned this issue May 11, 2025

Remove unsafe code related to get_allocator #1319

Open
