
Commit 6fdda9a

timekeeping: Avoid possible deadlock from clock_was_set_delayed
As part of normal operations, the hrtimer subsystem frequently calls into the timekeeping code, creating a locking order of hrtimer locks -> timekeeping locks.

clock_was_set_delayed() was supposed to allow us to avoid deadlocks between the timekeeping and hrtimer subsystems, so that we could notify the hrtimer subsystem that the time had changed while holding the timekeeping locks. This was done by scheduling delayed work that would run later, once we were out of the timekeeping code.

But unfortunately the lock chains are complex enough that, in scheduling the delayed work, we eventually end up trying to grab an hrtimer lock.

Sasha Levin noticed this in testing when the new seqlock lockdep enablement triggered the following (somewhat abbreviated) message:

[ 251.100221] ======================================================
[ 251.100221] [ INFO: possible circular locking dependency detected ]
[ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[ 251.101967] -------------------------------------------------------
[ 251.101967] kworker/10:1/4506 is trying to acquire lock:
[ 251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[ 251.101967]
[ 251.101967] but task is already holding lock:
[ 251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] which lock already depends on the new lock.
[ 251.101967]
[ 251.101967] the existing dependency chain (in reverse order) is:
[ 251.101967] -> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
[ 251.101967] -> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
[ 251.101967] -> #3 (&rq->lock){-.-.-.}:
[snipped]
[ 251.101967] -> #2 (&p->pi_lock){-.-.-.}:
[snipped]
[ 251.101967] -> #1 (&(&pool->lock)->rlock){-.-...}:
[ 251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[ 251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[ 251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[ 251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[ 251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[ 251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
[ 251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[ 251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[ 251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[ 251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[ 251.101967] -> #0 (timekeeper_seq){----..}:
[snipped]
[ 251.101967] other info that might help us debug this:
[ 251.101967]
[ 251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
[ 251.101967] Possible unsafe locking scenario:
[ 251.101967]
[ 251.101967]        CPU0                    CPU1
[ 251.101967]        ----                    ----
[ 251.101967]   lock(hrtimer_bases.lock#11);
[ 251.101967]                                lock(&rt_b->rt_runtime_lock);
[ 251.101967]                                lock(hrtimer_bases.lock#11);
[ 251.101967]   lock(timekeeper_seq);
[ 251.101967]
[ 251.101967]  *** DEADLOCK ***
[ 251.101967]
[ 251.101967] 3 locks held by kworker/10:1/4506:
[ 251.101967]  #0: (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[ 251.101967]  #1: (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[ 251.101967]  #2: (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] stack backtrace:
[ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[ 251.101967] Workqueue: events clock_was_set_work
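Editorial aside, not part of the original commit message: once the intermediate &rt_b->rt_runtime_lock link in the chain is collapsed, the "Possible unsafe locking scenario" above is the classic two-lock inversion. A minimal userspace sketch in C, using hypothetical pthread mutexes lock_a and lock_b as loose stand-ins for hrtimer_bases.lock and timekeeper_seq (compile with -pthread):

/* Illustrative only: two threads taking the same pair of locks in
 * opposite orders can deadlock; lockdep flags exactly this shape
 * before it ever has to happen at runtime. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* think: hrtimer_bases.lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* think: timekeeper_seq */

static void *cpu0(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock_a);	/* lock(hrtimer_bases.lock) */
	pthread_mutex_lock(&lock_b);	/* lock(timekeeper_seq): may block forever */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *cpu1(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock_b);	/* reached via the queue_work() chain */
	pthread_mutex_lock(&lock_a);	/* lock(hrtimer_bases.lock): may block forever */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, cpu0, NULL);
	pthread_create(&t1, NULL, cpu1, NULL);
	pthread_join(t0, NULL);		/* may never return if the inversion hits */
	pthread_join(t1, NULL);
	puts("no deadlock this run");
	return 0;
}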
So the best solution is to avoid calling clock_was_set_delayed() while holding the timekeeping lock, and instead use a flag variable to decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where do_adjtimex() was the deadlock trigger point. Unfortunately, in update_wall_time() we still hold the jiffies lock, which would deadlock with the IPI triggered by clock_was_set(), preventing us from calling it even after we drop the timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: stable <[email protected]> #3.10+
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: John Stultz <[email protected]>
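Editorial aside, not part of the original commit: the shape of the do_adjtimex() fix below is the common "record a change flag under the lock, notify after unlocking" pattern. A minimal userspace sketch in C; the names state_lock, adjust_state and notify_listeners are hypothetical stand-ins for the timekeeping locks, do_adjtimex() and clock_was_set(), not kernel APIs:

/* Sketch of deferring a notification until the lock is dropped. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int tai_offset;

static void notify_listeners(void)
{
	/* Stand-in for clock_was_set(): it may take other locks or send
	 * IPIs, so it must never run while state_lock is held. */
	puts("notified");
}

static void adjust_state(int new_tai)
{
	bool changed = false;

	pthread_mutex_lock(&state_lock);
	if (new_tai != tai_offset) {
		tai_offset = new_tai;
		changed = true;		/* just remember; do not notify under the lock */
	}
	pthread_mutex_unlock(&state_lock);

	if (changed)
		notify_listeners();	/* safe: no locks held at this point */
}

int main(void)
{
	adjust_state(37);	/* change -> one notification */
	adjust_state(37);	/* no change -> no notification */
	return 0;
}

In the actual patch, the "flag" is simply re-checking tai != orig_tai after write_seqcount_end() and raw_spin_unlock_irqrestore() in do_adjtimex(), while update_wall_time() has to fall back to clock_was_set_delayed() because the jiffies lock is still held there.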
1 parent 5258d3f commit 6fdda9a

1 file changed: 16 additions, 2 deletions


kernel/time/timekeeping.c

Lines changed: 16 additions & 2 deletions
@@ -1278,7 +1278,6 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
 
 			__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
 
-			clock_was_set_delayed();
 			clock_set = TK_CLOCK_WAS_SET;
 		}
 	}
@@ -1442,6 +1441,19 @@ static void update_wall_time(void)
 	write_seqcount_end(&timekeeper_seq);
 out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	if (clock_was_set) {
+		/*
+		 * XXX - I'd rather we just call clock_was_set(), but
+		 * since we're currently holding the jiffies lock, calling
+		 * clock_was_set would trigger an ipi which would then grab
+		 * the jiffies lock and we'd deadlock. :(
+		 * The right solution should probably be droping
+		 * the jiffies lock before calling update_wall_time
+		 * but that requires some rework of the tick sched
+		 * code.
+		 */
+		clock_was_set_delayed();
+	}
 }
 
 /**
@@ -1702,11 +1714,13 @@ int do_adjtimex(struct timex *txc)
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
 		timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-		clock_was_set_delayed();
 	}
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
+	if (tai != orig_tai)
+		clock_was_set();
+
 	ntp_notify_cmos_timer();
 
 	return ret;
