Data race between compare_generic and insert_combined_dict under free-threading #132641

vfdev-5 · 2025-04-17T15:35:17Z

Bug report

Bug description:

I built main branch and observed the following races under free-threading in cpython 3.14 (Python 3.14.0a7+ experimental free-threading build (heads/main:e42bda94411, Apr 17 2025, 14:08:39) [Clang 18.1.3 (1ubuntu1)])

Race 1

WARNING: ThreadSanitizer: data race (pid=33872)
  Read of size 8 at 0x7fffc22d7ad8 by thread T5 (mutexes: read M0):
    #0 compare_generic /project/cpython/Objects/dictobject.c:1107:13 (python3.14+0x26edbc) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #1 do_lookup /project/cpython/Objects/dictobject.c:1013:23 (python3.14+0x26edbc)
    #2 dictkeys_generic_lookup /project/cpython/Objects/dictobject.c:1132:12 (python3.14+0x26edbc)
    #3 _Py_dict_lookup /project/cpython/Objects/dictobject.c:1298:14 (python3.14+0x26edbc)
    #4 _PyDict_GetItemRef_KnownHash_LockHeld /project/cpython/Objects/dictobject.c:2330:21 (python3.14+0x272a14) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
   ...

  Previous atomic write of size 8 at 0x7fffc22d7ad8 by thread T6 (mutexes: read M0):
    #0 _Py_atomic_store_ptr_release /project/cpython/./Include/cpython/pyatomic_gcc.h:565:3 (python3.14+0x284d66) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #1 insert_combined_dict /project/cpython/Objects/dictobject.c:1748:9 (python3.14+0x284d66)
    #2 insertdict /project/cpython/Objects/dictobject.c:1854:13 (python3.14+0x274254) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #3 _PyDict_SetItem_KnownHash_LockHeld /project/cpython/Objects/dictobject.c:2656:12 (python3.14+0x273be9) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #4 _PyDict_SetItem_KnownHash /project/cpython/Objects/dictobject.c:2673:11 (python3.14+0x274635) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
  ...

Race 2

WARNING: ThreadSanitizer: data race (pid=33872)
  Read of size 8 at 0x7fffc22d7ad0 by thread T5 (mutexes: read M0):
    #0 compare_generic /project/cpython/Objects/dictobject.c:1110:13 (python3.14+0x26edd5) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #1 do_lookup /project/cpython/Objects/dictobject.c:1013:23 (python3.14+0x26edd5)
    #2 dictkeys_generic_lookup /project/cpython/Objects/dictobject.c:1132:12 (python3.14+0x26edd5)
    #3 _Py_dict_lookup /project/cpython/Objects/dictobject.c:1298:14 (python3.14+0x26edd5)
    #4 _PyDict_GetItemRef_KnownHash_LockHeld /project/cpython/Objects/dictobject.c:2330:21 (python3.14+0x272a14) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
   ...

  Previous atomic write of size 8 at 0x7fffc22d7ad0 by thread T6 (mutexes: read M0):
    #0 insert_combined_dict /project/cpython/Objects/dictobject.c (python3.14+0x284d8c) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #1 insertdict /project/cpython/Objects/dictobject.c:1854:13 (python3.14+0x274254) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #2 _PyDict_SetItem_KnownHash_LockHeld /project/cpython/Objects/dictobject.c:2656:12 (python3.14+0x273be9) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    #3 _PyDict_SetItem_KnownHash /project/cpython/Objects/dictobject.c:2673:11 (python3.14+0x274635) (BuildId: d491981d76fd0b67bcb5e01a978d07da019800b8)
    ...

Full report: https://gist.github.com/vfdev-5/cbb9189043737d023b755191b62951cf

cc @hawkinsp

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

The text was updated successfully, but these errors were encountered:

thomashammett · 2025-04-18T01:45:07Z

I'd like to work on this issue and submit a PR to fix it. Is anyone already working on this, or may I take it? Thank you.

vstinner · 2025-04-23T10:10:33Z

I built main branch and observed the following races under free-threading in cpython 3.14

Which code did you run to trigger these data races?

vfdev-5 · 2025-04-23T10:15:01Z

It is from JAX TSAN CI job running cpu tests suite. I can try to get a small reproducer.

hawkinsp · 2025-05-09T00:27:58Z

We are still seeing this frequently in our CI.

I'm trying to come up with a reproducer for this and so far haven't succeeded. However, reading the code, I'm curious about the following:

In bounded_lru_cache_get_lock_held we hold a critical section of self, i.e., the lru cache object itself:

cpython/Modules/_functoolsmodule.c

Line 1482 in 6d5a8c2

Py_BEGIN_CRITICAL_SECTION(self);

We then use that critical section to call _PyDict_GetItemRef_KnownHash_LockHeld with no other locking

cpython/Modules/_functoolsmodule.c

Line 1309 in 6d5a8c2

    
           int res = _PyDict_GetItemRef_KnownHash_LockHeld((PyDictObject *)self->cache, key_, hash_,

. In other words, the read part of the cache seems to assume that the critical section on the lru_cache object itself protects the dictionary state.

However on the write side of the cache, we start by acquiring a critical section on self as well (

cpython/Modules/_functoolsmodule.c

Line 1500 in 6d5a8c2

result = bounded_lru_cache_update_lock_held(self, result, key, hash);

). However in some code paths we call _PyDict_SetItem_KnownHash (e.g.,

cpython/Modules/_functoolsmodule.c

Line 1386 in 6d5a8c2

if (_PyDict_SetItem_KnownHash(self->cache, key, (PyObject *)link,

), which acquires a critical section on the dictionary object (

cpython/Objects/dictobject.c

Line 2693 in 6d5a8c2

Py_BEGIN_CRITICAL_SECTION(op);

), which I think we should think of as implicitly releasing the outer lock.

In other words, we are confused about which critical section is protecting the dictionary state. Is it the critical section on the lru cache itself, or it the critical section on the dict it contains?

…ection use. The bounded_lru_cache code was using a critical section on the lru cache object to protect dictionary accesses in some code paths, but using the critical section on the dictionary itself to protect accesses in other code paths. This led to races since not all threads agreed on which lock they needed to be holding. Instead, always use a critical section on the underlying dictionary, rather than the lru cache object itself. Fixes python#132641

Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict.

…GH-133787) Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict. (cherry picked from commit 9ad0c7b) Co-authored-by: Peter Hawkins <[email protected]>

…3787) (#133979) gh-132641: fix race in `lru_cache` under free-threading (GH-133787) Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict. (cherry picked from commit 9ad0c7b) Co-authored-by: Peter Hawkins <[email protected]>

vfdev-5 added the type-bug An unexpected behavior, bug, or error label Apr 17, 2025

picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading labels Apr 17, 2025

hawkinsp mentioned this issue May 9, 2025

gh-132641: fix race in lru_cache under free-threading #133787

Merged

kumaraditya303 closed this as completed in #133787 May 13, 2025

bedevere-app bot mentioned this issue May 13, 2025

[3.14] gh-132641: fix race in lru_cache under free-threading (GH-133787) #133979

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data race between compare_generic and insert_combined_dict under free-threading #132641

Data race between compare_generic and insert_combined_dict under free-threading #132641

vfdev-5 commented Apr 17, 2025 •

edited by bedevere-app bot

Loading

thomashammett commented Apr 18, 2025

vstinner commented Apr 23, 2025

vfdev-5 commented Apr 23, 2025 •

edited

Loading

hawkinsp commented May 9, 2025

Data race between compare_generic and insert_combined_dict under free-threading #132641

Data race between compare_generic and insert_combined_dict under free-threading #132641

Comments

vfdev-5 commented Apr 17, 2025 • edited by bedevere-app bot Loading

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

thomashammett commented Apr 18, 2025

vstinner commented Apr 23, 2025

vfdev-5 commented Apr 23, 2025 • edited Loading

hawkinsp commented May 9, 2025

vfdev-5 commented Apr 17, 2025 •

edited by bedevere-app bot

Loading

vfdev-5 commented Apr 23, 2025 •

edited

Loading