Native support for AVR intrinsics #711
Thanks for filing this. I don't think we are blocked on LLVM having support for the calling convention. Instead, we should be able to use naked functions to translate the CC like we do for the aeabi intrinsics (lines 21 to 35 in cb06005). I'm not positive how this would interact with the clobbers, however.
Oh, sure, that seems like a good idea 🙂

Separately, you may be interested in getting AVR asm to stable (rust-lang/rust#93335). This isn't a blocker for this crate, but it may be useful in user code.

Also, do you know if there is a version of these symbols in assembly that doesn't come from libgcc? Some of these are probably easy enough that we could provide our own assembly implementations, but we need to be careful with licensing.

I'm aware only of the libgcc implementation, unfortunately.

After rust-lang/rust#131651 merges, it should be possible to fix the AVRTiny asm convention, at which point I think we will be okay to start using AVR assembly in this repo. The …
@Patryk27, do you happen to know which intrinsics we are actually missing? Testing with the below:

```rust
#![no_std]

// Widening 16x16 -> 32-bit multiplication (unsigned and signed).
#[unsafe(no_mangle)]
pub fn call_umulhisi3(a: &u16, b: &u16, r: &mut u32) {
    *r = *a as u32 * *b as u32;
}

#[unsafe(no_mangle)]
pub fn call_mulhisi3(a: &i16, b: &i16, r: &mut i32) {
    *r = *a as i32 * *b as i32;
}

// 8-bit division and remainder (unsigned and signed).
#[unsafe(no_mangle)]
pub fn call_udivmodqi4(a: &u8, b: &u8, r1: &mut u8, r2: &mut u8) {
    (*r1, *r2) = (*a / *b, *a % *b);
}

#[unsafe(no_mangle)]
pub fn call_divmodqi4(a: &i8, b: &i8, r1: &mut i8, r2: &mut i8) {
    (*r1, *r2) = (*a / *b, *a % *b);
}

// 16-bit division and remainder (unsigned and signed).
#[unsafe(no_mangle)]
pub fn call_udivmodhi4(a: &u16, b: &u16, r1: &mut u16, r2: &mut u16) {
    (*r1, *r2) = (*a / *b, *a % *b);
}

#[unsafe(no_mangle)]
pub fn call_divmodhi4(a: &i16, b: &i16, r1: &mut i16, r2: &mut i16) {
    (*r1, *r2) = (*a / *b, *a % *b);
}
```

On the atmega328p target CPU (unsure whether this makes a difference), the only intrinsics from the list at https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention that get used are the 8- and 16-bit integer division routines.
Status: I was a little bit busy last week; I'll take a look at those intrinsics and your example in a couple of days 🙂
You're right: it seems that the only Funny Intrinsics™ used at the moment are … I have tried to summon … with:

```rust
// Needs a nightly toolchain: `widening_mul` is an unstable method.
#[inline(never)]
pub fn fun(a: &u16, b: &u16, x: &mut u16, y: &mut u16) {
    // (low half, high half) of the full 32-bit product
    (*x, *y) = a.widening_mul(*b);
}
```

... but even this causes LLVM to go with a 16b->32b extension followed by a 32b x 32b multiplication (aka …). All of those observations match the intrinsics actually hardcoded into the codegen: … which confirms that we'd only have to provide …

Curiously enough, some time ago I thought that ABI conflicts were the reason why f32 is broken on AVR: … but since they don't have a special ABI, it must actually be a codegen bug (this particular investigation will continue in Rahix/avr-hal#641).
Alright, there is another kind of ABI mismatch happening, though! Context: …

tl;dr: intrinsics like this one (compiler-builtins/src/float/cmp.rs, line 114 in 7bec089) … should (most likely?) return … Edit: actually, it's not even …

This should be easy to implement with a regular cfg-guarded type alias.
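A minimal sketch of such a cfg-guarded alias follows. The alias name `CmpResult` and the function `le_f32` are hypothetical, and since the exact return types were lost from the comment above, the AVR width (`i8`) versus the default (`i32`) is an assumption chosen only for illustration. The comparison follows the usual libgcc `__lesf2` contract: negative if `a < b`, zero if equal, positive if `a > b`, and positive when either operand is NaN.

```rust
use core::cmp::Ordering;

// Hypothetical alias: the comparison result width differs per target.
// The AVR width here (i8) is an assumption for illustration.
#[cfg(target_arch = "avr")]
pub type CmpResult = i8;
#[cfg(not(target_arch = "avr"))]
pub type CmpResult = i32;

/// `__lesf2`-style comparison (hypothetical name): negative if a < b,
/// zero if a == b, positive if a > b, and positive (1) if unordered (NaN).
pub fn le_f32(a: f32, b: f32) -> CmpResult {
    match a.partial_cmp(&b) {
        Some(Ordering::Less) => -1,
        Some(Ordering::Equal) => 0,
        Some(Ordering::Greater) => 1,
        None => 1, // at least one NaN operand
    }
}
```

Only the alias changes between targets; every intrinsic that returns `CmpResult` picks up the right width without further cfgs.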
It seems there's an ABI difference with regard to 32-bit division and remainder as well. Instead of compiler-builtins' … … and that's also what came out when I ran avr-tester's random-math test suite on compiler-builtins without any …

Now that I have a deeper understanding of how the pieces come together, I see light at the end of the tunnel: I'm preparing a pull request that gets rid of all/most …
OK, it seems that implementing:

```rust
#[cfg(target_arch = "avr")]
intrinsics! {
    #[maybe_use_optimized_c_shim]
    pub extern "C" fn __udivmodsi4(n: u32, d: u32) -> u64 {
        // Quotient in the low 32 bits, remainder in the high 32 bits.
        let (div, rem) = u32_div_rem(n, d);
        ((rem as u64) << 32) | (div as u64)
    }
}
```

... yields a working intrinsic! Unfortunately, we can't go with the simpler …
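Why the packing works can be shown in portable Rust. The sketch below assumes the usual avr-gcc rule that an 8-byte return value occupies R18 to R25 with the least significant byte in R18; under that assumption the low 32 bits (quotient) land in R18 to R21 and the high 32 bits (remainder) in R22 to R25. `u32_div_rem` here is a hypothetical stand-in for the crate-internal helper, and `udivmodsi4_packed` and `unpack` are illustrative names, not part of compiler-builtins.

```rust
/// Hypothetical stand-in for the crate-internal helper of the same name.
fn u32_div_rem(n: u32, d: u32) -> (u32, u32) {
    (n / d, n % d)
}

/// Packs (quotient, remainder) into one u64, as in the snippet above:
/// quotient in bits 0..32, remainder in bits 32..64.
fn udivmodsi4_packed(n: u32, d: u32) -> u64 {
    let (div, rem) = u32_div_rem(n, d);
    ((rem as u64) << 32) | (div as u64)
}

/// Splits a packed value back into (quotient, remainder).
fn unpack(packed: u64) -> (u32, u32) {
    (packed as u32, (packed >> 32) as u32)
}
```

For example, `udivmodsi4_packed(100, 7)` holds 14 in its low half and 2 in its high half.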
I'm experimenting with …

I was thinking it would be fun to try implementing the algorithm in Rust, but I don't know how to express the mapping between the input/output parameters and the registers required by the calling convention. Does that limit us to an asm-only solution? I assume we'd need to implement the calling convention somewhere to allow us to …
Integer labels should work here though, right? So …
It is possible to add compiler support for this ABI and implement it that way. That probably doesn't have to happen, though: it's only needed in a couple of places in this crate and likely nowhere else in the ecosystem, so lang/compiler support would be pretty heavy-handed. One option is to "trampoline" ABIs by using … (compiler-builtins/compiler-builtins/src/arm.rs, lines 23 to 77 in 9978a8b).
So, feel free to submit a PR adding what you have there :) We don't need to add everything at once. Edit: you beat me to it.
I want to mention that I've known that the assembly for my division functions would blow up into an absolute monstrosity the more levels of functions there are and the smaller the registers are. I oriented around delegating to half-size divisions to build a larger division, but LLVM would inline the nested calls, which tends toward quadratic code size. The way I would do it for 8- and maybe 16-bit microcontrollers, given enough time, is to use a common set of byte-slice-based functions for addition, multiplication, etc. The same division function can be used for everything from …, built on `lhs += rhs << shift` and `lhs -= rhs << shift` functions for implementing multiplication and division. You can brute-force test smaller cases to make sure they are correct, and rely on the fuzzing framework to check the rest.
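A minimal sketch of that idea, assuming little-endian byte slices of equal length for both operands; the helper names (`shifted_byte`, `add_shifted`, `sub_shifted`, `mul`, `div_rem`) are hypothetical. Multiplication is shift-and-add, and division is restoring long division: try `rem -= d << shift`, and if that would go negative, add it back. The same two primitives serve every width, so nothing grows with the operand size except the loop counts. This is portable reference code, not AVR-tuned assembly.

```rust
/// Byte `i` (little-endian) of `rhs << shift`; `rhs` is little-endian.
fn shifted_byte(rhs: &[u8], shift: usize, i: usize) -> u8 {
    let (byte_shift, bit_shift) = (shift / 8, (shift % 8) as u32);
    let get = |k: isize| -> u8 {
        if k < 0 { 0 } else { rhs.get(k as usize).copied().unwrap_or(0) }
    };
    let j = i as isize - byte_shift as isize;
    if bit_shift == 0 {
        get(j)
    } else {
        (get(j) << bit_shift) | (get(j - 1) >> (8 - bit_shift))
    }
}

/// `lhs += rhs << shift`, wrapping modulo 2^(8 * lhs.len()).
fn add_shifted(lhs: &mut [u8], rhs: &[u8], shift: usize) {
    let mut carry = 0u16;
    for i in 0..lhs.len() {
        let sum = lhs[i] as u16 + shifted_byte(rhs, shift, i) as u16 + carry;
        lhs[i] = sum as u8;
        carry = sum >> 8;
    }
}

/// `lhs -= rhs << shift`, wrapping; returns true if `rhs << shift` was
/// larger than `lhs` (including bits shifted beyond the end of `lhs`).
fn sub_shifted(lhs: &mut [u8], rhs: &[u8], shift: usize) -> bool {
    let mut borrow = 0i16;
    for i in 0..lhs.len() {
        let diff = lhs[i] as i16 - shifted_byte(rhs, shift, i) as i16 - borrow;
        lhs[i] = diff as u8; // truncation gives the value modulo 256
        borrow = (diff < 0) as i16;
    }
    let top = rhs.len() + shift / 8 + 1;
    let overflow = (lhs.len()..top).any(|i| shifted_byte(rhs, shift, i) != 0);
    borrow != 0 || overflow
}

/// `a * b` modulo 2^(8 * a.len()), by shift-and-add.
fn mul(a: &[u8], b: &[u8]) -> Vec<u8> {
    let mut acc = vec![0u8; a.len()];
    for shift in 0..8 * b.len() {
        if ((b[shift / 8] >> (shift % 8)) & 1) == 1 {
            add_shifted(&mut acc, a, shift);
        }
    }
    acc
}

/// Unsigned (n / d, n % d) by restoring long division. Panics if d == 0.
fn div_rem(n: &[u8], d: &[u8]) -> (Vec<u8>, Vec<u8>) {
    assert!(d.iter().any(|&b| b != 0), "division by zero");
    let mut rem = n.to_vec();
    let mut quot = vec![0u8; n.len()];
    for shift in (0..8 * n.len()).rev() {
        if sub_shifted(&mut rem, d, shift) {
            add_shifted(&mut rem, d, shift); // subtrahend too big: restore
        } else {
            quot[shift / 8] |= 1 << (shift % 8);
        }
    }
    (quot, rem)
}
```

Because the byte width is just `n.len()`, the brute-force checks suggested above are easy: all 8-bit pairs exhaustively, and a sample of 16-bit pairs against native `/`, `%` and `wrapping_mul`.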
The unsigned division intrinsics have been merged! The signed equivalents are more complicated, as the semantics differ between languages. The book "Hacker's Delight" (2nd edition, chapter 9) describes three different implementations (truncating, modulus, and floor) and lists some languages that have made differing choices. From what I can find, the Rust reference says …

Does the intrinsic need to implement Rust's division semantics or something else? Also confusing is that Rust's Div trait says …
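For reference, the three variants can be compared side by side on small signed values. Rust's built-in `/` and `%` truncate (the quotient rounds toward zero and the remainder takes the dividend's sign), which is also what C does. The floored variant is written out by hand, and `div_euclid`/`rem_euclid` are used here as the always-non-negative-remainder ("modulus") variant; the function names are illustrative only.

```rust
/// Truncating division (Rust's `/` and `%`): quotient rounds toward zero,
/// remainder has the sign of the dividend.
fn trunc_div_rem(n: i8, d: i8) -> (i8, i8) {
    (n / d, n % d)
}

/// Floored division: quotient rounds toward negative infinity,
/// a nonzero remainder has the sign of the divisor.
fn floor_div_rem(n: i8, d: i8) -> (i8, i8) {
    let (q, r) = (n / d, n % d);
    if r != 0 && (r < 0) != (d < 0) {
        (q - 1, r + d)
    } else {
        (q, r)
    }
}

/// Euclidean ("modulus") division: the remainder is never negative.
fn euclid_div_rem(n: i8, d: i8) -> (i8, i8) {
    (n.div_euclid(d), n.rem_euclid(d))
}
```

All three agree when both operands are non-negative; they only diverge once a sign is negative, e.g. `-7 / 2` gives `(-3, -1)`, `(-4, 1)` and `(-4, 1)` respectively.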
Usually language-specific semantics are implemented in the standard library; the builtins are closer to emulating behavior on real hardware. As an example, the division routines are allowed to cause UB if the divisor is 0. The backends pick the actual semantics here, but LLVM and Cranelift almost always match GCC. It should act the same as …

The linked example is …
After staring at …, everything I've seen says to use an unsigned division and adjust the signs of the results based on the signs of the inputs. Mathematically, the GCC implementation is as straightforward as it gets.
At the cost of a few extra instructions, I've packed the remainder sign into R0 instead of the T flag. There's no alternative algorithm for negation (it's implemented in the ALU). I'm unsure how to proceed.

```rust
// AVR only: asm on AVR needs nightly and `#![feature(asm_experimental_arch)]`.
#[unsafe(naked)]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn __divmodqi4() {
    // Compute signed 8-bit `n / d` and `n % d`.
    //
    // Note: GCC implements a [non-standard calling convention](https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention) for this function.
    // Inputs:
    //     R24: dividend
    //     R22: divisor
    // Outputs:
    //     R24: quotient (dividend / divisor)
    //     R25: remainder (dividend % divisor)
    // Clobbers:
    //     R23: loop counter
    //     R0: sign bits
    //     T: unused
    core::arch::naked_asm!(
        // This assembly routine adjusts the inputs to perform an unsigned division.
        // The quotient is negative when the dividend and divisor signs differ.
        // The remainder has the same sign as the dividend.
        // `sbr`/`cbr` only accept R16-R31, so the signs reach R0 via the carry
        // flag instead (`mov` and `eor` leave C untouched).
        "mov R0, R24",       // R0 = dividend
        "lsl R0",            // C = dividend sign (= remainder sign)
        "mov R0, R22",       // R0 = divisor
        "eor R0, R24",       // R0.7 = quotient sign
        "ror R0",            // R0.7 = remainder sign, R0.6 = quotient sign
        "sbrc R24, 7",       // if dividend is negative
        "neg R24",           //   negate to a positive value
        "sbrc R22, 7",       // if divisor is negative
        "neg R22",           //   negate to a positive value
        "call __udivmodqi4", // perform unsigned division
        // R24 = quotient, R25 = remainder
        "sbrc R0, 6",        // apply quotient sign
        "neg R24",
        "sbrc R0, 7",        // apply remainder sign
        "neg R25",
        "ret",
    );
}
```
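The same data flow can be modeled in portable Rust and checked exhaustively on a host before touching AVR hardware. In this sketch, `udivmodqi4` is a hypothetical stand-in for the unsigned intrinsic, and `divmodqi4` mirrors the asm step by step: record the quotient sign (signs differ) and the remainder sign (dividend sign), divide the magnitudes, then negate where needed. `unsigned_abs` maps -128 to 128, just as `neg` leaves 0x80 as 0x80, which an unsigned division then treats as 128.

```rust
/// Hypothetical stand-in for the unsigned intrinsic: (n / d, n % d).
fn udivmodqi4(n: u8, d: u8) -> (u8, u8) {
    (n / d, n % d)
}

/// Signed 8-bit division built from unsigned division plus sign fixups.
fn divmodqi4(n: i8, d: i8) -> (i8, i8) {
    let quot_negative = (n < 0) != (d < 0); // signs differ
    let rem_negative = n < 0; // remainder follows the dividend
    let (q, r) = udivmodqi4(n.unsigned_abs(), d.unsigned_abs());
    let q = if quot_negative { q.wrapping_neg() } else { q };
    let r = if rem_negative { r.wrapping_neg() } else { r };
    (q as i8, r as i8)
}
```

The only input pair whose true quotient does not fit is `i8::MIN / -1`; this model wraps it to `(-128, 0)`, which matches `wrapping_div`/`wrapping_rem` rather than Rust's panicking `/`.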
I'm not a lawyer, but I don't think we have to be 100% different from GCC; it's mostly about "was the code copy-pasted verbatim" etc. I'd guess that what you propose (assuming the algorithm is alright; I haven't checked this one) is good.
IIUC, even beyond avoiding copy+paste, you can't take inspiration from the source, and that's the part that matters rather than how different it is. That's tough when you have already read the source; I don't know what the safe way out of this is. @joshtriplett is much more of an expert here than me.
It might also be feasible to put all of this into a new file that is GPL and say that on AVR only you will be using GPL sources (already the case for any users that are linking GCC's runtime libs), which is nice because it would let you take the exact implementation. The downside is, of course, how the licensing changes propagate everywhere; I'm not sure what that would do to rust-lang/rust.
The approach given is the only way I've ever known to do signed division in the many implementations I've made over the years. I don't think there is any licensing issue as long as you wrote the assembly from scratch. The only special cases are related to the …
At the moment, compiler-builtins isn't useful for AVR, because that platform uses a custom calling convention for intrinsics. Supporting AVR here would require: …

There are no ongoing plans to do so yet, but in rust-lang/rust#131651 it was pointed out that it would be nice to leave some kind of record of this as an issue, so here we go 🙂