Skip to content

Commit 506a66f

Browse files
committed
Revert "x86/apic: Ignore secondary threads if nosmt=force"
Dave Hansen reported, that it's outright dangerous to keep SMT siblings disabled completely so they are stuck in the BIOS and wait for SIPI. The reason is that Machine Check Exceptions are broadcasted to siblings and the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or reboots the machine. The MCE chapter in the SDM contains the following blurb: Because the logical processors within a physical package are tightly coupled with respect to shared hardware resources, both logical processors are notified of machine check errors that occur within a given physical processor. If machine-check exceptions are enabled when a fatal error is reported, all the logical processors within a physical package are dispatched to the machine-check exception handler. If machine-check exceptions are disabled, the logical processors enter the shutdown state and assert the IERR# signal. When enabling machine-check exceptions, the MCE flag in control register CR4 should be set for each logical processor. Reverting the commit which ignores siblings at enumeration time solves only half of the problem. The core cpuhotplug logic needs to be adjusted as well. This thoughtful engineered mechanism also turns the boot process on all Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU before the secondary CPUs are brought up. Depending on the number of physical cores the window in which this situation can happen is smaller or larger. On a HSW-EX it's about 750ms: MCE is enabled on the boot CPU: [ 0.244017] mce: CPU supports 22 MCE banks The corresponding sibling #72 boots: [ 1.008005] .... node #0, CPUs: #72 That means if an MCE hits on physical core 0 (logical CPUs 0 and 72) between these two points the machine is going to shutdown. At least it's a known safe state. It's obvious that the early boot can be hit by an MCE as well and then runs into the same situation because MCEs are not yet enabled on the boot CPU. But after enabling them on the boot CPU, it does not make any sense to prevent the kernel from recovering. Adjust the nosmt kernel parameter documentation as well. Reverts: 2207def ("x86/apic: Ignore secondary threads if nosmt=force") Reported-by: Dave Hansen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Tony Luck <[email protected]>
1 parent e14d7df commit 506a66f

File tree

4 files changed

+3
-29
lines changed

4 files changed

+3
-29
lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -2688,12 +2688,8 @@
26882688
Equivalent to smt=1.
26892689

26902690
[KNL,x86] Disable symmetric multithreading (SMT).
2691-
nosmt=force: Force disable SMT, similar to disabling
2692-
it in the BIOS except that some of the
2693-
resource partitioning effects which are
2694-
caused by having SMT enabled in the BIOS
2695-
cannot be undone. Depending on the CPU
2696-
type this might have a performance impact.
2691+
nosmt=force: Force disable SMT, cannot be undone
2692+
via the sysfs control file.
26972693

26982694
nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
26992695
(indirect branch prediction) vulnerability. System may

arch/x86/include/asm/apic.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -504,10 +504,8 @@ extern int default_check_phys_apicid_present(int phys_apicid);
504504

505505
#ifdef CONFIG_SMP
506506
bool apic_id_is_primary_thread(unsigned int id);
507-
bool apic_id_disabled(unsigned int id);
508507
#else
509508
static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
510-
static inline bool apic_id_disabled(unsigned int id) { return false; }
511509
#endif
512510

513511
extern void irq_enter(void);

arch/x86/kernel/acpi/boot.c

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -181,8 +181,7 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
181181
}
182182

183183
if (!enabled) {
184-
if (!apic_id_disabled(id))
185-
++disabled_cpus;
184+
++disabled_cpus;
186185
return -EINVAL;
187186
}
188187

arch/x86/kernel/apic/apic.c

Lines changed: 0 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -2204,16 +2204,6 @@ bool apic_id_is_primary_thread(unsigned int apicid)
22042204
return !(apicid & mask);
22052205
}
22062206

2207-
/**
2208-
* apic_id_disabled - Check whether APIC ID is disabled via SMT control
2209-
* @id: APIC ID to check
2210-
*/
2211-
bool apic_id_disabled(unsigned int id)
2212-
{
2213-
return (cpu_smt_control == CPU_SMT_FORCE_DISABLED &&
2214-
!apic_id_is_primary_thread(id));
2215-
}
2216-
22172207
/*
22182208
* Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
22192209
* and cpuid_to_apicid[] synchronized.
@@ -2309,15 +2299,6 @@ int generic_processor_info(int apicid, int version)
23092299
return -EINVAL;
23102300
}
23112301

2312-
/*
2313-
* If SMT is force disabled and the APIC ID belongs to
2314-
* a secondary thread, ignore it.
2315-
*/
2316-
if (apic_id_disabled(apicid)) {
2317-
pr_info_once("Ignoring secondary SMT threads\n");
2318-
return -EINVAL;
2319-
}
2320-
23212302
if (apicid == boot_cpu_physical_apicid) {
23222303
/*
23232304
* x86_bios_cpu_apicid is required to have processors listed

0 commit comments

Comments
 (0)