Data race between compare_generic and insert_combined_dict under free-threading #132641
Comments
I'd like to work on this issue and submit a PR to fix it. Is anyone already working on this, or may I take it? Thank you.
Which code did you run to trigger these data races?
It is from the JAX TSAN CI job running the CPU test suite. I can try to get a small reproducer.
We are still seeing this frequently in our CI. I'm trying to come up with a reproducer for this and so far haven't succeeded. However, reading the code, I'm curious about the following: In cpython/Modules/_functoolsmodule.c Line 1482 in 6d5a8c2
We then use that critical section to call _PyDict_GetItemRef_KnownHash_LockHeld with no other locking cpython/Modules/_functoolsmodule.c Line 1309 in 6d5a8c2
However on the write side of the cache, we start by acquiring a critical section on cpython/Modules/_functoolsmodule.c Line 1500 in 6d5a8c2
_PyDict_SetItem_KnownHash (e.g., cpython/Modules/_functoolsmodule.c Line 1386 in 6d5a8c2
Line 2693 in 6d5a8c2
In other words, we are confused about which critical section is protecting the dictionary state. Is it the critical section on the lru cache itself, or is it the critical section on the dict it contains?
…ection use. The bounded_lru_cache code was using a critical section on the lru cache object to protect dictionary accesses in some code paths, but using the critical section on the dictionary itself to protect accesses in other code paths. This led to races since not all threads agreed on which lock they needed to be holding. Instead, always use a critical section on the underlying dictionary, rather than the lru cache object itself. Fixes python#132641
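The mismatch described above can be illustrated with a minimal Python sketch. It is not the CPython code: `cache_lock` and `dict_lock` are hypothetical stand-ins for the critical section on the lru cache object and the critical section on its inner dict, and plain `threading.Lock` objects are assumed to behave similarly enough for the purpose of the illustration. The point is only that a thread holding one lock does not block a thread that takes the other, so both can be "inside" at once.

```python
import threading

# Hypothetical stand-ins, not CPython internals:
cache_lock = threading.Lock()  # plays the critical section on the lru cache object
dict_lock = threading.Lock()   # plays the critical section on the underlying dict

reader_inside = threading.Event()
writer_done = threading.Event()
overlap_observed = []


def reader():
    # Read path: believes cache_lock protects the dictionary.
    with cache_lock:
        reader_inside.set()
        # Still holding cache_lock, wait to see whether the writer can
        # complete its own "protected" section in the meantime.
        overlap_observed.append(writer_done.wait(timeout=5))


def writer():
    # Write path: believes dict_lock protects the dictionary.
    reader_inside.wait(timeout=5)
    with dict_lock:  # a different lock, so the reader does not block us
        writer_done.set()


threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# True means the writer ran its section while the reader still held its lock.
print(overlap_observed)
```

Because the two paths never contend for the same lock, nothing orders the read against the write, which is the kind of situation a tool such as TSAN reports as a data race.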
Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict.
…GH-133787) Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict. (cherry picked from commit 9ad0c7b) Co-authored-by: Peter Hawkins <[email protected]>
…3787) (#133979) gh-132641: fix race in `lru_cache` under free-threading (GH-133787) Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict. (cherry picked from commit 9ad0c7b) Co-authored-by: Peter Hawkins <[email protected]>
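The approach in the merged fix (one lock on the cache object, with "lock held" helpers that touch the dict) can be sketched in Python as follows. This is an illustration under stated assumptions, not CPython's implementation: `TinyCache`, `_get_lock_held`, and `_set_lock_held` are hypothetical names, the cache is unbounded, and calling the wrapped function outside the lock is a choice made for this sketch.

```python
import threading

_MISSING = object()  # sentinel so that None can be a cached value


class TinyCache:
    """Hypothetical cache wrapper: every dict access happens under self._lock."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()  # the single lock all paths agree on
        self._data = {}

    # The "_lock_held" helpers never lock themselves; callers must hold self._lock.
    def _get_lock_held(self, key):
        return self._data.get(key, _MISSING)

    def _set_lock_held(self, key, value):
        self._data[key] = value

    def __call__(self, key):
        with self._lock:
            hit = self._get_lock_held(key)
        if hit is not _MISSING:
            return hit
        value = self._func(key)  # computed without holding the lock
        with self._lock:
            self._set_lock_held(key, value)
        return value


square = TinyCache(lambda n: n * n)
results = [None] * 4


def worker(slot):
    # Each thread writes only to its own slot of the results list.
    results[slot] = [square(n) for n in range(10)]


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since reads and writes of `_data` both go through the same lock, no thread can observe the dictionary while another thread is modifying it. Two threads may still compute the same value concurrently, which is harmless here because the last store simply overwrites an equal value.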
Bug report
Bug description:
I built the main branch and observed the following races under free-threading in CPython 3.14 (Python 3.14.0a7+ experimental free-threading build (heads/main:e42bda94411, Apr 17 2025, 14:08:39) [Clang 18.1.3 (1ubuntu1)])
Race 1
Race 2
Full report: https://gist.github.com/vfdev-5/cbb9189043737d023b755191b62951cf
cc @hawkinsp
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs
gh-132641: fix race in `lru_cache` under free-threading #133787
gh-132641: fix race in `lru_cache` under free-threading (GH-133787) #133979