Commit f4debc8

Resolve DLL imports at CRT startup, not on demand
On Windows, libstd uses GetProcAddress to locate some DLL imports, so that libstd can run on older versions of Windows. If a given DLL import is not present, then libstd uses other behavior (such as fallback implementations).

This commit uses a feature of the Windows CRT to do these DLL imports during module initialization, before main() (or DllMain()) is called. This is the ideal time to resolve imports, because the module is effectively single-threaded at that point; no other threads can touch the data or code of the module that is being initialized.

This avoids several problems. First, it makes the cost of performing the DLL import lookups deterministic. Right now, the DLL imports are done on demand, which means that application threads _might_ have to do the DLL import during some time-sensitive operation. This is a small source of unpredictability. Second, since threads can race, more than one thread can currently end up running the same redundant DLL lookup.

This commit also stops using the heap to allocate strings during the DLL lookups.
1 parent b122908 commit f4debc8
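
The last point of the commit message (no heap allocation during the lookups) can be illustrated with a small, portable sketch. It only shows how `concat!` builds NUL-terminated names at compile time, which is what the new `init` function in the diff relies on. The macro `c_name!`, the helpers `as_c_ptr` and `read_back`, and the module name `"kernel32"` are hypothetical names chosen for illustration; they are not part of the commit.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: append a NUL terminator to a string literal at compile time.
macro_rules! c_name {
    ($s:literal) => {
        concat!($s, "\0")
    };
}

// Both strings are baked into the binary's static data; nothing is allocated at run time.
// "kernel32" is only an illustrative module name.
const MODULE: &str = c_name!("kernel32");
const SYMBOL: &str = concat!(stringify!(WaitOnAddress), "\0");

/// View a NUL-terminated static string as the pointer type a C API expects.
fn as_c_ptr(s: &'static str) -> *const c_char {
    s.as_ptr() as *const c_char
}

/// Read the C string back through the raw pointer, to check the terminator placement.
fn read_back(s: &'static str) -> &'static str {
    // SAFETY: `s` has static lifetime and ends in the NUL byte added by `concat!`.
    let c_str = unsafe { CStr::from_ptr(as_c_ptr(s)) };
    c_str.to_str().unwrap()
}

fn main() {
    println!("module = {:?}, symbol = {:?}", read_back(MODULE), read_back(SYMBOL));
}
```

By contrast, the removed `lookup` function built a `Vec<u16>` and a `CString` on every call, which is exactly the kind of dynamic allocation the new code avoids inside a static initializer.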

3 files changed, +95 -71 lines changed

library/std/src/sys/windows/c.rs

+1
@@ -975,6 +975,7 @@ extern "system" {
     pub fn freeaddrinfo(res: *mut ADDRINFOA);

     pub fn GetProcAddress(handle: HMODULE, name: LPCSTR) -> *mut c_void;
+    pub fn GetModuleHandleA(lpModuleName: LPCSTR) -> HMODULE;
     pub fn GetModuleHandleW(lpModuleName: LPCWSTR) -> HMODULE;

     pub fn GetSystemTimeAsFileTime(lpSystemTimeAsFileTime: LPFILETIME);

library/std/src/sys/windows/compat.rs

+88 -65
@@ -1,93 +1,116 @@
-//! A "compatibility layer" for spanning XP and Windows 7
+//! A "compatibility layer" for supporting older versions of Windows
 //!
-//! The standard library currently binds many functions that are not available
-//! on Windows XP, but we would also like to support building executables that
-//! run on XP. To do this we specify all non-XP APIs as having a fallback
-//! implementation to do something reasonable.
+//! The standard library uses some Windows API functions that are not present
+//! on older versions of Windows. (Note that the oldest version of Windows
+//! that Rust supports is Windows 7 (client) and Windows Server 2008 (server).)
+//! This module implements a form of delayed DLL import binding, using
+//! `GetModuleHandle` and `GetProcAddress` to look up DLL entry points at
+//! runtime.
 //!
-//! This dynamic runtime detection of whether a function is available is
-//! implemented with `GetModuleHandle` and `GetProcAddress` paired with a
-//! static-per-function which caches the result of the first check. In this
-//! manner we pay a semi-large one-time cost up front for detecting whether a
-//! function is available but afterwards it's just a load and a jump.
-
-use crate::ffi::CString;
-use crate::sys::c;
-
-pub fn lookup(module: &str, symbol: &str) -> Option<usize> {
-    let mut module: Vec<u16> = module.encode_utf16().collect();
-    module.push(0);
-    let symbol = CString::new(symbol).unwrap();
-    unsafe {
-        let handle = c::GetModuleHandleW(module.as_ptr());
-        match c::GetProcAddress(handle, symbol.as_ptr()) as usize {
-            0 => None,
-            n => Some(n),
-        }
-    }
-}
+//! This implementation uses a static initializer to look up the DLL entry
+//! points. The CRT (C runtime) executes static initializers before `main`
+//! is called (for binaries) and before `DllMain` is called (for DLLs).
+//! This is the ideal time to look up DLL imports, because we are guaranteed
+//! that no other threads will attempt to call these entry points. Thus,
+//! we can look up the imports and store them in `static mut` fields
+//! without any synchronization.
+//!
+//! This has an additional advantage: Because the DLL import lookup happens
+//! at module initialization, the cost of these lookups is deterministic,
+//! and is removed from the code paths that actually call the DLL imports.
+//! That is, there is no unpredictable "cache miss" that occurs when calling
+//! a DLL import. For applications that benefit from predictable delays,
+//! this is a benefit. This also eliminates the comparison-and-branch
+//! from the hot path.
+//!
+//! Currently, the standard library uses only a small number of dynamic
+//! DLL imports. If this number grows substantially, then the cost of
+//! performing all of the lookups at initialization time might become
+//! substantial.
+//!
+//! The mechanism of registering a static initializer with the CRT is
+//! documented in
+//! [CRT Initialization](https://docs.microsoft.com/en-us/cpp/c-runtime-library/crt-initialization?view=msvc-160).
+//! It works by contributing a global symbol to the `.CRT$XCU` section.
+//! The linker builds a table of all static initializer functions.
+//! The CRT startup code then iterates that table, calling each
+//! initializer function.
+//!
+//! # **WARNING!!*
+//! The environment that a static initializer function runs in is highly
+//! constrained. There are **many** restrictions on what static initializers
+//! can safely do. Static initializer functions **MUST NOT** do any of the
+//! following (this list is not comprehensive):
+//! * touch any other static field that is used by a different static
+//!   initializer, because the order that static initializers run in
+//!   is not defined.
+//! * call `LoadLibrary` or any other function that acquires the DLL
+//!   loader lock.
+//! * call any Rust function or CRT function that touches any static
+//!   (global) state.

 macro_rules! compat_fn {
     ($module:literal: $(
         $(#[$meta:meta])*
-        pub fn $symbol:ident($($argname:ident: $argtype:ty),*) -> $rettype:ty $body:block
+        pub fn $symbol:ident($($argname:ident: $argtype:ty),*) -> $rettype:ty $fallback_body:block
     )*) => ($(
         $(#[$meta])*
         pub mod $symbol {
             #[allow(unused_imports)]
             use super::*;
-            use crate::sync::atomic::{AtomicUsize, Ordering};
             use crate::mem;

             type F = unsafe extern "system" fn($($argtype),*) -> $rettype;

-            static PTR: AtomicUsize = AtomicUsize::new(0);
-
-            #[allow(unused_variables)]
-            unsafe extern "system" fn fallback($($argname: $argtype),*) -> $rettype $body
-
-            /// This address is stored in `PTR` to incidate an unavailable API.
-            ///
-            /// This way, call() will end up calling fallback() if it is unavailable.
-            ///
-            /// This is a `static` to avoid rustc duplicating `fn fallback()`
-            /// into both load() and is_available(), which would break
-            /// is_available()'s comparison. By using the same static variable
-            /// in both places, they'll refer to the same (copy of the)
-            /// function.
+            /// Points to the DLL import, or the fallback function.
             ///
-            /// LLVM merging the address of fallback with other functions
-            /// (because of unnamed_addr) is fine, since it's only compared to
-            /// an address from GetProcAddress from an external dll.
-            static FALLBACK: F = fallback;
+            /// This static can be an ordinary, unsynchronized, mutable static because
+            /// we guarantee that all of the writes finish during CRT initialization,
+            /// and all of the reads occur after CRT initialization.
+            static mut PTR: Option<F> = None;

-            #[cold]
-            fn load() -> usize {
-                // There is no locking here. It's okay if this is executed by multiple threads in
-                // parallel. `lookup` will result in the same value, and it's okay if they overwrite
-                // eachothers result as long as they do so atomically. We don't need any guarantees
-                // about memory ordering, as this involves just a single atomic variable which is
-                // not used to protect or order anything else.
-                let addr = crate::sys::compat::lookup($module, stringify!($symbol))
-                    .unwrap_or(FALLBACK as usize);
-                PTR.store(addr, Ordering::Relaxed);
-                addr
-            }
+            /// This symbol is what allows the CRT to find the `init` function and call it.
+            /// It is marked `#[used]` because otherwise Rust would assume that it was not
+            /// used, and would remove it.
+            #[used]
+            #[link_section = ".CRT$XCU"]
+            static INIT_TABLE_ENTRY: fn() = init;

-            fn addr() -> usize {
-                match PTR.load(Ordering::Relaxed) {
-                    0 => load(),
-                    addr => addr,
+            fn init() {
+                // There is no locking here. This code is executed before main() is entered, and
+                // is guaranteed to be single-threaded.
+                //
+                // DO NOT do anything interesting or complicated in this function! DO NOT call
+                // any Rust functions or CRT functions, if those functions touch any global state,
+                // because this function runs during global initialization. For example, DO NOT
+                // do any dynamic allocation, don't call LoadLibrary, etc.
+                unsafe {
+                    let module_name: *const u8 = concat!($module, "\0").as_ptr();
+                    let symbol_name: *const u8 = concat!(stringify!($symbol), "\0").as_ptr();
+                    let module_handle = $crate::sys::c::GetModuleHandleA(module_name as *const i8);
+                    if !module_handle.is_null() {
+                        match $crate::sys::c::GetProcAddress(module_handle, symbol_name as *const i8) as usize {
+                            0 => {}
+                            n => {
+                                PTR = Some(mem::transmute::<usize, F>(n));
+                            }
+                        }
+                    }
                 }
             }

             #[allow(dead_code)]
-            pub fn is_available() -> bool {
-                addr() != FALLBACK as usize
+            pub fn option() -> Option<F> {
+                unsafe { PTR }
             }

+            #[allow(dead_code)]
             pub unsafe fn call($($argname: $argtype),*) -> $rettype {
-                mem::transmute::<usize, F>(addr())($($argname),*)
+                if let Some(ptr) = PTR {
+                    ptr($($argname),*)
+                } else {
+                    $fallback_body
+                }
             }
         }

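The control flow that the new `compat_fn!` expansion sets up (resolve once at startup, then either call the import or run the fallback body) can be modelled in a portable sketch. This is not the commit's code: `resolve` is a hypothetical stand-in for `GetModuleHandleA` plus `GetProcAddress`, `native_double` plays the role of a DLL export, and `init` is called by hand here instead of being registered in `.CRT$XCU`, so the example runs on any platform.

```rust
/// Signature of the "imported" function in this sketch.
type F = fn(u32) -> i64;

/// Written during startup by `init`, only read afterwards
/// (mirrors `static mut PTR: Option<F>` in the diff).
static mut PTR: Option<F> = None;

/// Hypothetical function playing the role of a DLL export.
fn native_double(x: u32) -> i64 {
    i64::from(x) * 2
}

/// Hypothetical stand-in for `GetModuleHandleA` + `GetProcAddress`.
fn resolve(symbol: &str) -> Option<F> {
    match symbol {
        "double" => Some(native_double as F),
        _ => None,
    }
}

/// Stand-in for the CRT-invoked initializer. It must run before any other
/// thread exists, which is the guarantee the commit gets from the CRT.
fn init(symbol: &str) {
    // SAFETY: by assumption, nothing reads or writes `PTR` concurrently.
    unsafe { PTR = resolve(symbol) };
}

/// Counterpart of `option()` in the diff: a plain, unsynchronized copy.
fn option() -> Option<F> {
    // SAFETY: all writes to `PTR` finished in `init`.
    unsafe { PTR }
}

/// Counterpart of `call()`: use the import if present, else the fallback body.
fn call(x: u32) -> i64 {
    match option() {
        Some(f) => f(x),
        None => -1, // fallback body: report "not supported"
    }
}

fn main() {
    init("double");
    println!("call(21) = {}", call(21));
}
```

The design trade-off is visible here: `call` needs no atomics and no lazy-load branch into a lookup, but its correctness rests entirely on `init` having finished before any reader runs.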
library/std/src/sys/windows/thread_parker.rs

+6 -6
@@ -108,10 +108,10 @@ impl Parker {
             return;
         }

-        if c::WaitOnAddress::is_available() {
+        if let Some(wait_on_address) = c::WaitOnAddress::option() {
             loop {
                 // Wait for something to happen, assuming it's still set to PARKED.
-                c::WaitOnAddress(self.ptr(), &PARKED as *const _ as c::LPVOID, 1, c::INFINITE);
+                wait_on_address(self.ptr(), &PARKED as *const _ as c::LPVOID, 1, c::INFINITE);
                 // Change NOTIFIED=>EMPTY but leave PARKED alone.
                 if self.state.compare_exchange(NOTIFIED, EMPTY, Acquire, Acquire).is_ok() {
                     // Actually woken up by unpark().
@@ -140,9 +140,9 @@ impl Parker {
             return;
         }

-        if c::WaitOnAddress::is_available() {
+        if let Some(wait_on_address) = c::WaitOnAddress::option() {
             // Wait for something to happen, assuming it's still set to PARKED.
-            c::WaitOnAddress(self.ptr(), &PARKED as *const _ as c::LPVOID, 1, dur2timeout(timeout));
+            wait_on_address(self.ptr(), &PARKED as *const _ as c::LPVOID, 1, dur2timeout(timeout));
             // Set the state back to EMPTY (from either PARKED or NOTIFIED).
             // Note that we don't just write EMPTY, but use swap() to also
             // include an acquire-ordered read to synchronize with unpark()'s
@@ -192,9 +192,9 @@ impl Parker {
         // purpose, to make sure every unpark() has a release-acquire ordering
         // with park().
         if self.state.swap(NOTIFIED, Release) == PARKED {
-            if c::WakeByAddressSingle::is_available() {
+            if let Some(wake_by_address_single) = c::WakeByAddressSingle::option() {
                 unsafe {
-                    c::WakeByAddressSingle(self.ptr());
+                    wake_by_address_single(self.ptr());
                 }
             } else {
                 // If we run NtReleaseKeyedEvent before the waiting thread runs

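The call-site change in `thread_parker.rs` replaces "check availability, then call through the wrapper" with "bind the function pointer once, then call the local". A minimal portable sketch of that shape follows. The type `WakeFn`, the function `wake_by_address`, and the log strings are hypothetical; the real code calls `WakeByAddressSingle` or falls back to keyed events.

```rust
/// Hypothetical signature standing in for an optional OS wake primitive.
type WakeFn = fn(&mut Vec<&'static str>);

/// Hypothetical "fast path" primitive; it only records that it was called.
fn wake_by_address(log: &mut Vec<&'static str>) {
    log.push("wake-by-address");
}

/// Same shape as the new `unpark` branch: the `Option` is inspected once, and
/// the bound pointer is called directly instead of going through a second lookup.
fn unpark(wake: Option<WakeFn>, log: &mut Vec<&'static str>) {
    if let Some(wake_by_address_single) = wake {
        wake_by_address_single(log);
    } else {
        log.push("keyed-event fallback");
    }
}

fn main() {
    let mut log = Vec::new();
    unpark(Some(wake_by_address as WakeFn), &mut log);
    unpark(None, &mut log);
    println!("{:?}", log);
}
```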