NMatrix Developer Guide
The goal of this page is to give a general overview of NMatrix for those who may want to hack on the code, and to make it easier for new contributors to jump right in. This is obviously still a work in progress, but will hopefully fill out over time.
#### Background
From within C/C++ code, Ruby exposes objects as the typedef `VALUE`, which as of this writing is an `unsigned long`. Internally, a `VALUE` may hold the (slightly modified) value of a number itself (for immediate types such as Fixnum), or a pointer to a Ruby data structure cast to `VALUE`.
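To make the "slightly modified value of a number" concrete, here is a minimal sketch of Fixnum-style tagging. It assumes the scheme MRI uses on common builds: shift the integer left one bit and set the low bit as a tag, which works because object pointers are aligned and always have a low bit of 0. The names `toy_value`, `toy_int2fix`, `toy_fix2long` and `toy_fixnum_p` are hypothetical stand-ins for ruby.h's `VALUE`, `INT2FIX`, `FIX2LONG` and `FIXNUM_P`; real extension code should use the real macros.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for ruby.h's VALUE (an unsigned long on common platforms). */
typedef unsigned long toy_value;

/* Low tag bit marking an immediate integer, modeled on MRI's FIXNUM_FLAG. */
#define TOY_FIXNUM_FLAG 0x01UL

/* One bit is spent on the tag, so the representable range is halved. */
#define TOY_FIXNUM_MAX (LONG_MAX / 2)
#define TOY_FIXNUM_MIN (LONG_MIN / 2)

/* Encode: shift left one bit and set the tag bit (cf. INT2FIX). */
static toy_value toy_int2fix(long n) {
    assert(n >= TOY_FIXNUM_MIN && n <= TOY_FIXNUM_MAX); /* must still fit after tagging */
    return ((toy_value)n << 1) | TOY_FIXNUM_FLAG;
}

/* Decode: an arithmetic right shift drops the tag and restores the sign (cf. FIX2LONG).
   Like MRI, this relies on the compiler using arithmetic >> for negative values. */
static long toy_fix2long(toy_value v) {
    return (long)v >> 1;
}

/* Aligned object pointers have a 0 low bit, so they never look like tagged integers. */
static bool toy_fixnum_p(toy_value v) {
    return (v & TOY_FIXNUM_FLAG) != 0;
}
```

For example, `toy_int2fix(42)` yields 85 (binary `1010101`), and `toy_fix2long(85)` recovers 42. Because the number lives inside the `VALUE` itself, there is nothing for the garbage collector to free; only pointer-valued `VALUE`s matter for the discussion below.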
The Ruby garbage collector is a mark-and-sweep collector. For Ruby programs (ignoring C extensions for the moment), this means, in the simplest sense, that the interpreter keeps track of all objects in existence, and when garbage collection runs, a first pass marks all objects that are reachable from the code, and a second pass frees all the objects that were not marked. In a C extension, this can cause problems: the C code might use Ruby `VALUE`s that aren't reachable from the Ruby side (and thus wouldn't be marked), but still must not be freed. Several mechanisms exist to prevent this problem:
- Garbage collection only runs once C code has returned to Ruby, or during a call to a Ruby C API method. This means that you don't need to worry about something like a dedicated GC thread starting garbage collection at an arbitrary point in your function; only these defined points can be problematic.
- `Data_Wrap_Struct`: this is a Ruby C API macro that wraps a C struct in a Ruby `VALUE` (see Ruby's README.ext for more details). It accepts a marking function and a freeing function. When the garbage collector marks the wrapping `VALUE`, it calls the marking function. With an appropriate marking function, it's possible to mark `VALUE`s hidden in the C struct and prevent them from being garbage collected. For NMatrix, this mechanism is key to the implementation of object-dtype NMatrix objects.
- Ruby checks the stack for `VALUE`s and pointers to `VALUE`s still in use by your C code. This is pretty neat. For instance, if your code is:

  ```c
  VALUE x = ...;                /* some Ruby object */
  rb_call_some_c_api_method();  /* any Ruby C API call; GC may run here */
  return x;
  ```

  then Ruby should see `x` on the stack and make sure not to garbage collect it during that API call. The same is true if `x` is a `VALUE*` to some `VALUE`(s) on the heap.
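The mark-and-sweep process and the marking-function hook that `Data_Wrap_Struct` provides can be modeled in a few lines. This is a toy, not Ruby's collector: `ToyObject`, `toy_alloc`, `toy_gc_mark` (standing in for `rb_gc_mark`), `wrapper_mark` (standing in for the marking function passed to `Data_Wrap_Struct`) and `toy_gc` are all hypothetical, and unlike Ruby the toy does no stack scanning; it only treats the roots passed to it explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HEAP_SIZE 8

typedef struct ToyObject ToyObject;
typedef void (*toy_mark_fn)(ToyObject *self);

struct ToyObject {
    bool in_use;            /* slot currently holds a live object */
    bool marked;            /* set during the mark phase */
    toy_mark_fn mark;       /* optional hook, like Data_Wrap_Struct's marking function */
    ToyObject *children[2]; /* references hidden in the C struct; only `mark` knows them */
};

static ToyObject toy_heap[TOY_HEAP_SIZE];

/* Take the first free slot; the remaining fields start zeroed. */
static ToyObject *toy_alloc(toy_mark_fn mark) {
    for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
        if (!toy_heap[i].in_use) {
            toy_heap[i] = (ToyObject){ .in_use = true, .mark = mark };
            return &toy_heap[i];
        }
    }
    assert(0 && "toy heap exhausted");
    return NULL;
}

/* Analogue of rb_gc_mark: mark an object, then let its hook mark what it holds. */
static void toy_gc_mark(ToyObject *obj) {
    if (obj == NULL || obj->marked) return;
    obj->marked = true;
    if (obj->mark != NULL) obj->mark(obj);
}

/* Hook for wrapper objects: without it, the children would never be marked. */
static void wrapper_mark(ToyObject *self) {
    for (size_t i = 0; i < 2; i++) toy_gc_mark(self->children[i]);
}

/* Mark phase from the roots, then sweep every unmarked slot. Returns the number freed. */
static size_t toy_gc(ToyObject **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++) toy_gc_mark(roots[i]);
    size_t freed = 0;
    for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
        if (toy_heap[i].in_use && !toy_heap[i].marked) {
            toy_heap[i].in_use = false;
            freed++;
        }
        toy_heap[i].marked = false; /* reset for the next collection */
    }
    return freed;
}
```

If a wrapper allocated with `wrapper_mark` stores a second object in `children[0]` and only the wrapper is passed as a root, the stored object survives the collection while an unreferenced third object is freed. Allocating the wrapper with a `NULL` hook instead would let the stored object be freed too, which is exactly the failure the marking function exists to prevent.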
Two cases aren't sufficiently handled by these mechanisms:

- You have a pointer on the stack to some struct that internally contains `VALUE`s, but you don't have a pointer to those `VALUE`s (or the `VALUE`s themselves) on the stack, and you want to make a Ruby C API call. This could be solved simply by putting the `VALUE`s on the stack before the API call, if not for the second problem.
- Optimizing compilers. If you compile with any optimizations turned on, it's hard to guarantee that any particular `VALUE` is actually on the stack when you need it to be; for example, the compiler may drop a variable once its last use has passed. Since NMatrix is a scientific computing library, and such code is commonly CPU-bound, turning off optimizations is not ideal.
The typical solution to the optimizing-compiler problem is to declare `VALUE`s `volatile`, a keyword that (simplistically) tells the compiler that something it can't see (hardware, another thread, etc.) might read or modify the variable. As a result, the compiler generally won't optimize `volatile` variables away, since any access to them might have side effects.
To solve the problem using `volatile`:

- Find every call to a Ruby API method (or to an NMatrix method that calls a Ruby API method, and so on).
- Before each call, ensure that all `VALUE`s in use by the code (whether declared directly or stored in a struct, etc.) are stored in `volatile` variables on the stack.
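The two steps above might look like the following minimal sketch for a single call site. It assumes a struct that keeps its `VALUE`s in a heap buffer (the first problem case) and uses hypothetical names throughout: `HiddenValues`, `stub_ruby_api_call` and `ends_across_api_call` are illustrations, and `VALUE` is typedef'd locally only so the sketch compiles without ruby.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ruby.h's VALUE so the sketch compiles on its own. */
typedef unsigned long VALUE;

/* Hypothetical struct whose VALUEs live in a heap buffer, not on the stack. */
typedef struct {
    size_t count;
    VALUE *elements;
} HiddenValues;

/* Hypothetical placeholder for any Ruby C API call during which GC might run. */
static void stub_ruby_api_call(void) {}

/* Returns the first element and writes the last to *last_out, copying both
   onto the stack in volatile locals before the API call. */
static VALUE ends_across_api_call(const HiddenValues *hv, VALUE *last_out) {
    assert(hv->count > 0);

    /* Step 2: every VALUE still needed after the call goes into a volatile
       stack variable, so the compiler should not elide the stack copies. */
    volatile VALUE first = hv->elements[0];
    volatile VALUE last = hv->elements[hv->count - 1];

    stub_ruby_api_call(); /* Step 1: a call site where GC could run */

    /* Use the stack copies, not the struct, after the call. */
    *last_out = last;
    return first;
}
```

The cost of the pattern is that every such call site needs its own set of `volatile` copies, and whether `volatile` blocks every problematic optimization remains an open question.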
However, it's not completely clear whether this prevents every optimization that could cause problems with the garbage collector. Even if `volatile` does prevent all problematic optimizations, it may not be desirable from a performance perspective (more testing would be needed to determine this). A reasonable reading of recent C++ specifications is also that `volatile` is discouraged except for hardware interactions. Thus, simply marking all `VALUE`s `volatile` is probably not ideal.