
Commit 42288cb

wait: add wake_up_pollfree()
Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. That solution is for the queue to be cleared before it is freed, by calling 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.

However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone.

Considering the three non-blocking poll systems:

- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.

- aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later.

- epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile.

Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up.

Reported-by: Linus Torvalds <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
1 parent 0fcfb00 commit 42288cb
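
To make the intended calling pattern concrete, here is a minimal, hypothetical sketch (the struct and function names are illustrative, not part of this commit): a per-task context that exposes a waitqueue through ->poll() wakes all waiters with the new helper before tearing down, and RCU-delays the actual free, as the kernel-doc added below requires.

/* Hypothetical per-task context containing a polled waitqueue. */
struct my_task_ctx {
	struct wait_queue_head wq;	/* handed to ->poll() via poll_wait() */
	struct rcu_head rcu;
	/* ... */
};

static void my_task_ctx_free_rcu(struct rcu_head *rcu)
{
	kfree(container_of(rcu, struct my_task_ctx, rcu));
}

/* Called when the owning task tears the context down. */
static void my_task_ctx_destroy(struct my_task_ctx *ctx)
{
	/*
	 * The old pattern, wake_up_poll(&ctx->wq, EPOLLHUP | POLLFREE),
	 * only wakes one exclusive waiter.  The new helper wakes everyone
	 * and warns if the queue is not empty afterwards.
	 */
	wake_up_pollfree(&ctx->wq);

	/* The wait_queue_head must outlive any RCU readers that saw it. */
	call_rcu(&ctx->rcu, my_task_ctx_free_rcu);
}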

2 files changed: 33 additions, 0 deletions


include/linux/wait.h

Lines changed: 26 additions & 0 deletions
@@ -217,6 +217,7 @@ void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void
 void __wake_up_locked_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr);
 void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode);
+void __wake_up_pollfree(struct wait_queue_head *wq_head);
 
 #define wake_up(x)			__wake_up(x, TASK_NORMAL, 1, NULL)
 #define wake_up_nr(x, nr)		__wake_up(x, TASK_NORMAL, nr, NULL)
@@ -245,6 +246,31 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode);
 #define wake_up_interruptible_sync_poll_locked(x, m)			\
 	__wake_up_locked_sync_key((x), TASK_INTERRUPTIBLE, poll_to_key(m))
 
+/**
+ * wake_up_pollfree - signal that a polled waitqueue is going away
+ * @wq_head: the wait queue head
+ *
+ * In the very rare cases where a ->poll() implementation uses a waitqueue whose
+ * lifetime is tied to a task rather than to the 'struct file' being polled,
+ * this function must be called before the waitqueue is freed so that
+ * non-blocking polls (e.g. epoll) are notified that the queue is going away.
+ *
+ * The caller must also RCU-delay the freeing of the wait_queue_head, e.g. via
+ * an explicit synchronize_rcu() or call_rcu(), or via SLAB_TYPESAFE_BY_RCU.
+ */
+static inline void wake_up_pollfree(struct wait_queue_head *wq_head)
+{
+	/*
+	 * For performance reasons, we don't always take the queue lock here.
+	 * Therefore, we might race with someone removing the last entry from
+	 * the queue, and proceed while they still hold the queue lock.
+	 * However, rcu_read_lock() is required to be held in such cases, so we
+	 * can safely proceed with an RCU-delayed free.
+	 */
+	if (waitqueue_active(wq_head))
+		__wake_up_pollfree(wq_head);
+}
+
 #define ___wait_cond_timeout(condition)					\
 ({									\
 	bool __cond = (condition);					\
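
For context, the WARN_ON_ONCE() in the kernel/sched/wait.c hunk below relies on every waiter's wakeup callback dequeuing itself when it sees POLLFREE; passing nr_exclusive=0 to __wake_up() effectively removes the limit on exclusive wakeups, so every callback gets that chance. A rough, hypothetical sketch of that waiter-side convention follows (the function name is made up, and real users such as epoll and aio poll add further synchronization of their own):

static int my_poll_wake_fn(struct wait_queue_entry *wait, unsigned int mode,
			   int sync, void *key)
{
	__poll_t events = key_to_poll(key);

	if (events & POLLFREE) {
		/*
		 * The waitqueue is about to be freed: unlink this entry so
		 * the queue can actually become empty, and never touch the
		 * wait_queue_head again afterwards.  Wakeup callbacks run
		 * with the waitqueue lock held, so unlinking here is safe.
		 */
		list_del_init(&wait->entry);
		/* ... record EPOLLHUP for our own poll request ... */
		return 0;
	}

	/* ... normal wakeup handling ... */
	return 0;
}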

kernel/sched/wait.c

Lines changed: 7 additions & 0 deletions
@@ -238,6 +238,13 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode)
 }
 EXPORT_SYMBOL_GPL(__wake_up_sync);	/* For internal use only */
 
+void __wake_up_pollfree(struct wait_queue_head *wq_head)
+{
+	__wake_up(wq_head, TASK_NORMAL, 0, poll_to_key(EPOLLHUP | POLLFREE));
+	/* POLLFREE must have cleared the queue. */
+	WARN_ON_ONCE(waitqueue_active(wq_head));
+}
+
 /*
  * Note: we use "set_current_state()" _after_ the wait-queue add,
  * because we need a memory barrier there on SMP, so that any
