Commit 371764e

mpe authored and gregkh committed
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()

commit 7c6986a upstream.

In raise_backtrace_ipi() we iterate through the cpumask of CPUs, sending
each an IPI asking them to do a backtrace, but we don't wait for the
backtrace to happen.

We then iterate through the CPU mask again, and if any CPU hasn't done
the backtrace and cleared itself from the mask, we print a trace on its
behalf, noting that the trace may be "stale".

This works well enough when a CPU is not responding, because in that
case it doesn't receive the IPI and the sending CPU is left to print the
trace. But when all CPUs are responding we are left with a race between
the sending and receiving CPUs, if the sending CPU wins the race then it
will erroneously print a trace.

This leads to spurious "stale" traces from the sending CPU, which can
then be interleaved messily with the receiving CPU, note the CPU
numbers, eg:

  [ 1658.929157][ C7] rcu: Stack dump where RCU GP kthread last ran:
  [ 1658.929223][ C7] Sending NMI from CPU 7 to CPUs 1:
  [ 1658.929303][ C1] NMI backtrace for cpu 1
  [ 1658.929303][ C7] CPU 1 didn't respond to backtrace IPI, inspecting paca.
  [ 1658.929362][ C1] CPU: 1 PID: 325 Comm: kworker/1:1H Tainted: G W E 5.13.0-rc2+ #46
  [ 1658.929405][ C7] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 325 (kworker/1:1H)
  [ 1658.929465][ C1] Workqueue: events_highpri test_work_fn [test_lockup]
  [ 1658.929549][ C7] Back trace of paca->saved_r1 (0xc0000000057fb400) (possibly stale):
  [ 1658.929592][ C1] NIP: c00000000002cf50 LR: c008000000820178 CTR: c00000000002cfa0

To fix it, change the logic so that the sending CPU waits 5s for the
receiving CPU to print its trace. If the receiving CPU prints its trace
successfully then the sending CPU just continues, avoiding any spurious
"stale" trace.

This has the added benefit of allowing all CPUs to print their traces in
order and avoids any interleaving of their output.

Fixes: 5cc0591 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
Cc: [email protected] # v4.18+
Reported-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

1 parent 468e5a5 commit 371764e

1 file changed, +20 -6 lines changed

arch/powerpc/kernel/stacktrace.c

Lines changed: 20 additions & 6 deletions
@@ -172,17 +172,31 @@ static void handle_backtrace_ipi(struct pt_regs *regs)
 
 static void raise_backtrace_ipi(cpumask_t *mask)
 {
+	struct paca_struct *p;
 	unsigned int cpu;
+	u64 delay_us;
 
 	for_each_cpu(cpu, mask) {
-		if (cpu == smp_processor_id())
+		if (cpu == smp_processor_id()) {
 			handle_backtrace_ipi(NULL);
-		else
-			smp_send_safe_nmi_ipi(cpu, handle_backtrace_ipi, 5 * USEC_PER_SEC);
-	}
+			continue;
+		}
 
-	for_each_cpu(cpu, mask) {
-		struct paca_struct *p = paca_ptrs[cpu];
+		delay_us = 5 * USEC_PER_SEC;
+
+		if (smp_send_safe_nmi_ipi(cpu, handle_backtrace_ipi, delay_us)) {
+			// Now wait up to 5s for the other CPU to do its backtrace
+			while (cpumask_test_cpu(cpu, mask) && delay_us) {
+				udelay(1);
+				delay_us--;
+			}
+
+			// Other CPU cleared itself from the mask
+			if (delay_us)
+				continue;
+		}
+
+		p = paca_ptrs[cpu];
 
 		cpumask_clear_cpu(cpu, mask);
 
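The bounded wait at the heart of this change can be sketched in user-space C. This is only an illustration of the pattern, not kernel code: `target_t`, `tick_fn`, `wait_for_backtrace()`, `responsive_tick()` and `stuck_tick()` are hypothetical names invented for the sketch. The `pending` flag stands in for the target CPU's bit in the cpumask, and the `tick` callback stands in for `udelay(1)` so that the example runs deterministically without threads. Unlike the patch, which checks the remaining `delay_us`, the sketch decides success by re-reading the flag after the loop; that is a choice of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's bit in the cpumask:
 * true while the target has not yet printed its own backtrace. */
typedef struct {
	atomic_bool pending;
} target_t;

/* Hypothetical hook called once per polling step; it replaces udelay(1)
 * so the sketch can simulate time passing without real delays. */
typedef void (*tick_fn)(target_t *t, uint64_t elapsed_us);

/* Poll until the target clears its flag or the budget runs out.
 * Returns true if the target responded, false if the caller should
 * fall back to printing a (possibly stale) trace on its behalf. */
static bool wait_for_backtrace(target_t *t, uint64_t budget_us, tick_fn tick)
{
	uint64_t elapsed_us = 0;

	assert(t != NULL && tick != NULL);

	while (atomic_load(&t->pending) && elapsed_us < budget_us) {
		tick(t, elapsed_us);
		elapsed_us++;
	}

	/* Decide on the flag itself rather than on the leftover budget. */
	return !atomic_load(&t->pending);
}

/* Simulated responsive target: clears its flag on the third step. */
static void responsive_tick(target_t *t, uint64_t elapsed_us)
{
	if (elapsed_us == 2)
		atomic_store(&t->pending, false);
}

/* Simulated stuck target: never clears its flag. */
static void stuck_tick(target_t *t, uint64_t elapsed_us)
{
	(void)t;
	(void)elapsed_us;
}
```

With a responsive target the sender simply moves on and prints nothing, which is what removes the spurious "stale" trace; only a target that stays pending for the whole budget triggers the fallback path.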