Skip to content

gh-114944: Fix race between _PyParkingLot_Park and _PyParkingLot_UnparkAll when handling interrupts #114945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 5, 2024

Conversation

mpage
Copy link
Contributor

@mpage mpage commented Feb 3, 2024

There is a potential race when _PyParkingLot_UnparkAll is executing in one thread and another thread is unblocked because of an interrupt in _PyParkingLot_Park. Consider the following scenario:

  1. Thread T0 is blocked1 in _PyParkingLot_Park on address A.
  2. Thread T1 executes _PyParkingLot_UnparkAll on address A. It finds the wait_entry for T0 and unlinks2 its list node.
  3. Immediately after (2), T0 is woken up due to an interrupt. It then segfaults trying to unlink3 the node that was previously unlinked in (2).

To fix this we mark each waiter as unparking before releasing the bucket lock. _PyParkingLot_Park will wait to handle the coming wakeup, and not attempt to unlink the node, when this field is set. _PyParkingLot_Unpark does this already, presumably to handle this case.

Footnotes

  1. https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L303

  2. https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L369

  3. https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L320

…hen handling interrupts

There is a potential race when `_PyParkingLot_UnparkAll` is executing in
one thread and another thread is unblocked because of an interrupt in
`_PyParkingLot_Park`. Consider the following scenario:

1. Thread T0 is blocked[^1] in `_PyParkingLot_Park` on address `A`.
2. Thread T1 executes `_PyParkingLot_UnparkAll` on address `A`. It
   finds the `wait_entry` for `T0` and unlinks[^2] its list node.
3. Immediately after (2), T0 is woken up due to an interrupt. It
   then segfaults trying to unlink[^3] the node that was previously
   unlinked in (2).

To fix this we mark each waiter as unparking before releasing the bucket
lock. `_PyParkingLot_Park` will wait to handle the coming wakeup, and not
attempt to unlink the node, when this field is set. `_PyParkingLot_Unpark`
does this already, presumably to handle this case.

[^1]: https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L303
[^2]: https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L369
[^3]: https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L320
struct llist_node head = LLIST_INIT(head);
Bucket *bucket = &buckets[((uintptr_t)addr) % NUM_BUCKETS];

_PyRawMutex_Lock(&bucket->mutex);
dequeue_all(bucket, addr, &head);
llist_for_each(node, &head) {
struct wait_entry *waiter = llist_data(node, struct wait_entry, node);
waiter->is_unparking = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct, but let's push the waiter->is_unparking assignment into dequeue_all and dequeue to avoid the extra loop here.

@mpage mpage marked this pull request as ready for review February 3, 2024 03:14
@mpage
Copy link
Contributor Author

mpage commented Feb 3, 2024

@colesbury - Can you take another look?

@mpage
Copy link
Contributor Author

mpage commented Feb 3, 2024

@DinoV or @carljm - Any chance either of you can take a look / merge this?

@colesbury colesbury requested a review from DinoV February 5, 2024 19:52
@DinoV DinoV merged commit c32bae5 into python:main Feb 5, 2024
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this pull request Feb 14, 2024
…gLot_UnparkAll` when handling interrupts (python#114945)

Fix race between `_PyParkingLot_Park` and `_PyParkingLot_UnparkAll` when handling interrupts

There is a potential race when `_PyParkingLot_UnparkAll` is executing in
one thread and another thread is unblocked because of an interrupt in
`_PyParkingLot_Park`. Consider the following scenario:

1. Thread T0 is blocked[^1] in `_PyParkingLot_Park` on address `A`.
2. Thread T1 executes `_PyParkingLot_UnparkAll` on address `A`. It
   finds the `wait_entry` for `T0` and unlinks[^2] its list node.
3. Immediately after (2), T0 is woken up due to an interrupt. It
   then segfaults trying to unlink[^3] the node that was previously
   unlinked in (2).

To fix this we mark each waiter as unparking before releasing the bucket
lock. `_PyParkingLot_Park` will wait to handle the coming wakeup, and not
attempt to unlink the node, when this field is set. `_PyParkingLot_Unpark`
does this already, presumably to handle this case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants