Skip to content

Fix has_cpuid implementation #492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 25, 2018
72 changes: 41 additions & 31 deletions coresimd/x86/cpuid.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,34 +78,53 @@ pub unsafe fn __cpuid(leaf: u32) -> CpuidResult {
}

/// Does the host support the `cpuid` instruction?
#[inline]
#[inline(never)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

pub fn has_cpuid() -> bool {
#[cfg(target_arch = "x86_64")]
{
true
}
#[cfg(target_arch = "x86")]
{
use coresimd::x86::{__readeflags, __writeeflags};

// On `x86` the `cpuid` instruction is not always available.
// This follows the approach indicated in:
// http://wiki.osdev.org/CPUID#Checking_CPUID_availability
unsafe {
// Read EFLAGS:
let eflags: u32 = __readeflags();

// Invert the ID bit in EFLAGS:
let eflags_mod: u32 = eflags | 0x0020_0000;

// Store the modified EFLAGS (ID bit may or may not be inverted)
__writeeflags(eflags_mod);

// Read EFLAGS again:
let eflags_after: u32 = __readeflags();

// Check if the ID bit changed:
eflags_after != eflags
// On `x86` the `cpuid` instruction is not always available.
// This follows the approach indicated in:
// http://wiki.osdev.org/CPUID#Checking_CPUID_availability
// https://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets/
// which detects whether `cpuid` is available by checking whether the 21st bit of the EFLAGS register is modifiable or not.
// If it is, then `cpuid` is available.
let result: u32;
let _temp: u32;
unsafe {
asm!(r#"
# Read eflags into $0 and copy into $1:
pushfd
pop $0
mov $1, $0
# Flip 21st bit:
xor $0, 0x200000
# Set eflags:
push $0
popfd
# Read eflags again, if cpuid is available
# the 21st bit will be flipped, otherwise it
# it will have the same value as the original in $1:
pushfd
pop $0
# Xor'ing with the original eflags should have the
# 21st bit set to true if cpuid is available and zero
# otherwise. All other bits have not been modified and
# are zero:
xor $0, $1
# Store in $0 the value of the 21st bit
shr $0, 21

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was in @stevecheckoway's version too, but I wonder why keep shifting if you're going to check for non-zero anyway?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, the real reason that I left the shift is because I didn't want to read this table in sufficient detail to determine if there were any important interactions between the popfd and pushfd.

image

There's actually a good reason to do the shift (or use result & 0x20_0000 != 0). Of course, it's possible that an interrupt could occur between the popfd and the pushfd and the OS would be free to change flags like the AC flag or (even less likely) the IOPL bits. But a much better reason to do it is the trap flag can be set by attaching a debugger, breaking on the pushfd and single stepping it.

Here's an example program demonstrating this:

#![feature(asm)]

fn main() {
    let eflags1: u32;
    let eflags2: u32;
    unsafe {
        asm!(r#"
            pushfd
            mov     $0, dword ptr [esp]
            popfd
            pushfd
            pop     $1
            "#
            : "=r"(eflags1), "=r"(eflags2)
            :
            : "cc", "memory"
            : "intel");
    }
    println!("{:x}", eflags1);
    println!("{:x}", eflags2);
}

Running it outside the debugger gives

$ rustc -g --target i686-apple-darwin bad.rs
$ ./bad
286
286
$ ./bad
282
282

so you can see that the parity flag is changing between runs for some reason, but the two attempts reading the flags are consistent. Now let's do it with a debugger (edited to remove the unhelpful output)

$ otool -tv bad|grep -A1 popfl
0000279f	popfl
000027a0	pushfl
$ rust-lldb bad
[snip]
(lldb) b &bad::main
Breakpoint 1: where = bad`bad::main::hc90b428e8cbd006a at bad.rs:3, address = 0x00002780
(lldb) r
Process 3423 launched: '/Users/steve/bad' (i386)
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) br set -a 0x27a0
Breakpoint 2: where = bad`bad::main::hc90b428e8cbd006a + 32 at bad.rs:7, address = 0x000027a0
(lldb) c
Process 3423 resuming
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) si
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) c
Process 3423 resuming
286
386
Process 3423 exited with status = 0 (0x00000000)

So as you can see, the other flags really can change. Now I don't have access to an old-enough Intel processor that cpuid isn't supported to demonstrate the issue there. So either the shift or a mask is needed to select the appropriate bit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. If the other bits could change, it would surely be safer to do the bitwise and?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm really tempted to be minimally smart here. It seems best to just stick to the exact sequence of instructions Intel suggests in the document linked in the comment, they gotta know best, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think there’s any danger of Intel adding new flags to eflags and all of bits 22 through 31 (63 in rflags) are reserved and have value zero.

That said, not doing a shift but instead masking result and testing for nonzero in Rust code and not in assembly should compile to a test r32, imm32 and setne r8 rather than shr r32, imm8 test r32, r32 and setne r8. I can’t imagine that’s ever going to be important though.

Using Intel’s code is fine too. It’s just doing some unnecessary work

Copy link

@stevecheckoway stevecheckoway Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the 32-bit toolchains supported by rustup, only one potentially doesn't have the cpuid instruction:

i386-apple-ios
i586-pc-windows-msvc
i586-unknown-linux-gnu
i586-unknown-linux-musl
i686-apple-darwin (installed)
i686-linux-android
i686-pc-windows-gnu
i686-pc-windows-msvc
i686-unknown-freebsd
i686-unknown-linux-gnu
i686-unknown-linux-musl

I think i386-apple-ios is the iOS simulator. I don't know if it actually supports cpuid. All of the rest of them are targeting at least a Pentium which supports cpuid. Unless there are plans to support early 486s, this could just return true.

Copy link
Contributor Author

@gnzlbg gnzlbg Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM supports more targets than the ones rustc supports and these can be targeted using xargo which is being integrated into cargo.

I think it is a good idea to add optimizations of this feature for targets that are known to have the cpuid function. The easiest one might be to return true on x86 if the target supports mmx. That covers all those targets except i386-apple-ios.

EDIT: I've filled #497 , PRs welcome ;)

"#
: "=r"(result), "=r"(_temp)
:
: "cc", "memory"
: "intel");
}
result != 0
}
}
}
Expand Down Expand Up @@ -138,17 +157,8 @@ mod tests {
assert!(cpuid::has_cpuid());
}

#[cfg(target_arch = "x86")]
#[test]
fn test_has_cpuid() {
unsafe {
let before = __readeflags();

if cpuid::has_cpuid() {
assert!(before != __readeflags());
} else {
assert!(before == __readeflags());
}
}
fn test_has_cpuid_idempotent() {
assert_eq!(cpuid::has_cpuid(), cpuid::has_cpuid());
}
}