gh-113939: Frame clear, clear locals #113940
f_locals might still contain references to the local vars. Fix python#113939.
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Misc/NEWS.d/next/Core and Builtins/2024-01-12-16-40-07.gh-issue-113939.Yi3L-e.rst
Lib/test/test_frame.py
    except ZeroDivisionError as exc:
        support.gc_collect()
        self.assertIsNotNone(wr())
        print(exc.__traceback__.tb_frame.f_locals)
print?
I actually just want to access exc.__traceback__.tb_frame.f_locals here (trigger a getattr), which is important for the test. I just thought that also printing the result might be interesting for debugging purposes, but we could also leave out the print.
The change to DELETE_FAST seems odd to me.
Python/bytecodes.c
@@ -1508,6 +1508,7 @@ dummy_func(
             PyObject *v = GETLOCAL(oparg);
             ERROR_IF(v == NULL, unbound_local_error);
             SETLOCAL(oparg, NULL);
+            Py_CLEAR(LOCALS());
What's the reason for this? DELETE_FAST should delete a single "fast" local variable, which should have nothing to do with f_locals.
The rationale is explained in issue #113939. To summarize: I think you would expect the exception object (and its linked traceback, the frames, and the locals of those frames) to go out of scope when you leave the except block, and then to be deleted when there are no other references to it. The DELETE_FAST would then delete the object (its ref count becomes 0). However, if f_locals was accessed at some point, a DELETE_FAST would not trigger the delete, as another copy of the locals is in f_locals. The object would stay alive until you access f_locals again, which would resync the locals, or until the function returns.
My use case was a training loop, where the function runs for a very long time, and the locals of some frames where an exception might have occurred held a huge amount of memory. It was very unexpected to me that, after making sure every reference to the exception was deleted, the memory still was not freed. Even more unexpected was that the memory was freed after getting another exception. Only then did I find out that this f_locals dict was still holding a reference, and once I accessed f_locals again (in the next exception later), it was freed.
So, this change here is one possible simple fix, to get the expected behavior. But I understand that this is maybe not how you want it. The question is, how do you want it? Do you have a better suggestion? Or is this all expected behavior, and no change should be done here? (To me, it was unexpected behavior.)
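The lifetime effect described in this comment can be reproduced with a small script. This is a minimal sketch, not code from the PR: Payload and alive_after_except are hypothetical names, and the structure follows the PR's test_locals_cleared_after_exception_handled. Whether the object survives once f_locals has been read depends on the interpreter version, so that result is only printed, not asserted.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object held in a local variable."""


def alive_after_except(touch_f_locals):
    """Return True if the payload is still alive after the except block."""
    ref = None

    def inner():
        nonlocal ref
        payload = Payload()
        ref = weakref.ref(payload)
        1 / 0

    try:
        inner()
    except ZeroDivisionError as exc:
        if touch_f_locals:
            # Read f_locals of the handling frame, as the PR's test does.
            exc.__traceback__.tb_frame.f_locals
    # At this point the name `exc` has been deleted by the interpreter.
    gc.collect()
    return ref() is not None


print("alive without f_locals access:", alive_after_except(False))
# Version dependent: interpreters affected by gh-113939 are expected to
# report True here, because the f_locals dict still references `exc`.
print("alive after f_locals access:  ", alive_after_except(True))
```

The first call shows the behavior the comment expects (the payload is gone once the handler is left); the second call is the case under discussion.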
The test test_locals_cleared_after_exception_handled tests exactly this.
Since this is a particular problem related to the lifetime of a handled exception, I think we should think of this as a problem with exceptions, and try to fix it within the exception handling mechanism rather than change DELETE_FAST.
Can we make POP_EXCEPT clear the locals, and move it to after the
LOAD_CONST 0 (None)
STORE_FAST 0 (e)
DELETE_FAST 0 (e)
?
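The instruction sequence referred to here can be inspected with the standard dis module. A minimal sketch, assuming CPython; handler and opnames are hypothetical helper names, and the exact opcode layout can vary between versions, so only the presence of the three opcodes is checked.

```python
import dis


def handler():
    try:
        1 / 0
    except ZeroDivisionError as e:
        pass


def opnames(func):
    """Return the opcode names of func in bytecode order."""
    return [ins.opname for ins in dis.get_instructions(func)]


names = opnames(handler)
# Keep only the opcodes relevant to this discussion, in emitted order.
relevant = ("POP_EXCEPT", "STORE_FAST", "DELETE_FAST")
print([name for name in names if name in relevant])
```

The cleanup of the name bound by "as e" (store None, then delete) is what removes the handler's own reference to the exception.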
> Can we make POP_EXCEPT clear the locals
Yes, I also thought about that. However, then I thought, maybe there are other situations where the lifetime of objects is unexpected? Basically, whenever there is a DELETE_FAST on some local variable, and f_locals was accessed before, it would result in this maybe unexpected behavior that the object stays alive until f_locals is accessed again, or until the function finishes.
So, you say, this is more important now for leaving except blocks, and less important for other cases of DELETE_FAST?
The semantics of f_locals are somewhat broken, see PEP 667. Fiddling with DELETE_FAST is not going to fix that unfortunately, but is likely to break other stuff.
Which of the test cases fail with just the change to frame_clear to delete f_locals?
> Which of the test cases fail with just the change to frame_clear to delete f_locals?
test_locals_cleared_after_exception_handled still fails when only the frame_clear change is applied, without the DELETE_FAST change.
> The semantics of f_locals is somewhat broken, see PEP-667.
Ah, yes. I was thinking the same when looking through the code, that this f_locals vs fast locals (and those sync functions) potentially has a lot of issues.
> Fiddling with DELETE_FAST is not going to fix that unfortunately,
Well, it fixes the issue here, that there are still references to the object when there should not be.
> but is likely to break other stuff.
I'm not sure. Why? If DELETE_FAST executes but f_locals is not touched by the op, then in any case f_locals is in a wrong state (not in sync with the fast locals anymore). So this change seems safer to me than not doing it.
Or do you want to allow bytecode to lead to such an inconsistent state, and only require Python source code to give consistent behavior?
@albertz, I appreciate that you've come this far in tracking down your issue. But I don't think the solution is to change bytecodes. Surely if we change DELETE_FAST, which implements del x, we should also change STORE_FAST, which implements x = None and also loses a reference to whatever was originally in x. But we don't want to make STORE_FAST slower, not even by one memory read + a conditional jump.
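The point that both statements drop the reference can be checked directly. This is a minimal sketch assuming CPython's reference counting; Obj, freed_after_del, freed_after_rebind and opnames are hypothetical names used only for illustration.

```python
import dis
import weakref


class Obj:
    """Hypothetical object whose lifetime is observed through a weakref."""


def freed_after_del():
    x = Obj()
    ref = weakref.ref(x)
    del x  # compiled to DELETE_FAST
    return ref() is None


def freed_after_rebind():
    x = Obj()
    ref = weakref.ref(x)
    x = None  # a plain store to the local, no DELETE_FAST involved
    return ref() is None


def opnames(func):
    """Return the opcode names of func in bytecode order."""
    return [ins.opname for ins in dis.get_instructions(func)]


print("del x frees the object:     ", freed_after_del())
print("x = None frees the object:  ", freed_after_rebind())
print("DELETE_FAST in del version: ", "DELETE_FAST" in opnames(freed_after_del))
print("DELETE_FAST in rebind version:", "DELETE_FAST" in opnames(freed_after_rebind))
```

Both functions release the object, but only the first one goes through DELETE_FAST, which is why a fix placed in that opcode alone would cover just one of the two paths.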
I also don't want bytecodes to mess with f_locals because the latter logically belongs to a totally different part of the interpreter.
I really think that the way to fix this, eventually, is to change the way locals() and frame.f_locals are implemented. There are two PEPs about this, 667 and 558. My preference would be 667, but both are currently blocked, mostly on people's time.
Thanks for the response. Yes, I agree, PEP-667 would be much cleaner, and also solve this problem.
Instead of messing with bytecodes, do you have maybe another idea for a simple workaround until PEP-667 is implemented?
Or otherwise, for this PR here, should we remove the DELETE_FAST change, but still do the frame.clear() change? I think the change in frame.clear() should be fine, right?
In that case, should we also remove test_locals_cleared_after_exception_handled, which would then still fail, or disable this test somehow?
Yes, I think that's the way to go.
What about the test case? Should I also delete it? For reference, this one:

    class LocalsTest(unittest.TestCase):
        """
        Tests for locals.
        """
        def test_locals_cleared_after_exception_handled(self):
            # see gh-113939
            class C:
                pass
            wr = None
            def inner():
                nonlocal wr
                c = C()
                wr = weakref.ref(c)
                1/0
            try:
                inner()
            except ZeroDivisionError as exc:
                support.gc_collect()
                self.assertIsNotNone(wr())
                print(exc.__traceback__.tb_frame.f_locals)
            support.gc_collect()
            self.assertIsNone(wr())

The other test case (
I did that now. So, PR can be merged now?
Add whitespace, otherwise LGTM.
frame.f_locals might still contain references to the local vars. Fix #113939.
Note, for PyTorch and others, when you first do extended exception reporting which accesses f_locals in any way, this here fixes two arising problems. Related: traceback.clear_frames does not clear locals when there have been previous access to f_locals #113939
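The scenario from the related issue can be sketched as follows. This is an illustration under assumptions, not the PR's test: Payload and payload_alive_after_clear are hypothetical names. The case without f_locals access is expected to free the object on any CPython version; the case with prior f_locals access is the one this change targets, and its result depends on whether the interpreter includes the fix, so it is only printed.

```python
import gc
import traceback
import weakref


class Payload:
    """Hypothetical stand-in for a large local held by a failed frame."""


def payload_alive_after_clear(touch_f_locals):
    """Return True if the payload survives traceback.clear_frames()."""
    ref = None

    def inner():
        nonlocal ref
        payload = Payload()
        ref = weakref.ref(payload)
        1 / 0

    try:
        inner()
    except ZeroDivisionError as exc:
        tb = exc.__traceback__
        if touch_f_locals:
            # Simulate extended exception reporting reading the locals
            # of the frame in which the error was raised.
            tb.tb_next.tb_frame.f_locals
        # Clears every frame in the traceback that is no longer running.
        traceback.clear_frames(tb)
        gc.collect()
        return ref() is not None


print("alive, no f_locals access:   ", payload_alive_after_clear(False))
print("alive, after f_locals access:", payload_alive_after_clear(True))
```

The exception itself is still referenced when the check runs, so any release of the payload is due to the frame being cleared rather than the traceback being dropped.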