gh-132732: Automatically constant evaluate pure operations #132733
base: main
Conversation
Fidget-Spinner commented Apr 19, 2025 • edited by bedevere-app bot
- Issue: Constant evaluate/propagate pure ops automatically #132732
Misc/NEWS.d/next/Core_and_Builtins/2025-04-19-16-22-47.gh-issue-132732.jgqhlF.rst
This is really neat!
Other than two opcodes I found that shouldn't be marked pure, I just have one thought:
Rather than rewriting the bodies like this to use the symbols-manipulating functions (which seems error-prone), would we be able to just use stackrefs to do this?
For example, _BINARY_OP_ADD_INT is defined like this:
PyObject *left_o = PyStackRef_AsPyObjectBorrow(left);
PyObject *right_o = PyStackRef_AsPyObjectBorrow(right);
// ...
res = PyStackRef_FromPyObjectSteal(res_o);
Rather than rewriting uses of these functions, could it be easier to just do something like this, since we're guaranteed not to escape?
if (sym_is_const(ctx, stack_pointer[-2]) && sym_is_const(ctx, stack_pointer[-1])) {
// Generated code to turn constant symbols into stackrefs:
_PyStackRef left = PyStackRef_FromPyObjectBorrow(sym_get_const(ctx, stack_pointer[-2]));
_PyStackRef right = PyStackRef_FromPyObjectBorrow(sym_get_const(ctx, stack_pointer[-1]));
_PyStackRef res;
// Now the actual body, same as it appears in executor_cases.c.h:
PyObject *left_o = PyStackRef_AsPyObjectBorrow(left);
PyObject *right_o = PyStackRef_AsPyObjectBorrow(right);
// ...
res = PyStackRef_FromPyObjectSteal(res_o);
// Generated code to turn stackrefs into constant symbols:
stack_pointer[-1] = sym_new_const(ctx, PyStackRef_AsPyObjectSteal(res));
}
I'm not too familiar with the design of the cases generator though, so maybe this is way harder or something. Either way, I'm excited to see this get in!
Seems feasible. I could try to rewrite all occurrences of the variable with a stackref-producing const one. Let me try that.
I've verified no refleak on
There's a lot going on in this PR, probably too much for one PR. Could we start with a PR to fix up the
Could we have the default code generator generate a function for the body of the pure instruction and then call that from the three interpreters?
Hm, I think I'd prefer not to. Sounds like it could hurt performance, especially for the JIT (where things can't inline).
I think a good progression would be:
I thought about this and I think we can inline if we autogenerate a header file and include that directly. But then we're at the mercy of the compiler in both the normal interpreter and the JIT deciding to inline or not to inline the body again. Which I truly do not want.
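For illustration, here is a rough sketch of the alternative being discussed (one generated helper per pure uop, emitted into a shared header and called from the tier-1 interpreter, tier-2 interpreter, and JIT). The helper name is made up and the body is a simplified _BINARY_OP_ADD_INT that skips the specialized input closes and error macros; this is not what the PR actually generates.
#include "pycore_long.h"      // _PyLong_Add()
#include "pycore_stackref.h"  // _PyStackRef, PyStackRef_* helpers
// Hypothetical generated helper -- name and shape are illustrative only.
static inline _PyStackRef
_pure_body_BINARY_OP_ADD_INT(_PyStackRef left, _PyStackRef right)
{
    PyObject *left_o = PyStackRef_AsPyObjectBorrow(left);
    PyObject *right_o = PyStackRef_AsPyObjectBorrow(right);
    PyObject *res_o = _PyLong_Add((PyLongObject *)left_o,
                                  (PyLongObject *)right_o);
    if (res_o == NULL) {
        return PyStackRef_NULL;  // caller takes the error path
    }
    return PyStackRef_FromPyObjectSteal(res_o);
}
Whether the compiler then re-inlines this into the interpreter loops and the JIT stencils is exactly the uncertainty raised above.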
@brandtbucher @markshannon what can I do to get this PR moving? @tomasr8 if you'd like to review, here's a summary of the PR:
Thanks for the ping! I actually wanted to try/review this PR, I was just very busy this week with work :/ I'll have a look this weekend :)
Only had time to skim the PR, I'll do a more thorough review this weekend :)
Co-Authored-By: Tomas R. <[email protected]>
Unfortunately this approach has a critical flaw. It is possible for the optimizer to see values that the executing code never would. For example, through a combination of statistical branch profiling and global to constant conversion.
class Disaster:
    def __add__(self, other):
        halt_and_catch_fire()
We don't want to be evaluating Disaster() + 1 when optimizing BINARY_OP_ADD_INT.
Maybe consider the approach used for TO_BOOL, where we call optimize_to_bool for each family member, thus reducing the code duplication.
In addition, we could then optimize BINARY_OP. After all, 1 + 1 is always 2, not just for BINARY_OP_ADD_INT.
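For concreteness, a hedged sketch of that suggestion: one folding helper, analogous to optimize_to_bool, that every BINARY_OP family member (and BINARY_OP itself) could call from the abstract interpreter. The JitOptContext/JitOptSymbol type names and the assumption that sym_new_const takes its own reference are based on the snippets quoted elsewhere in this thread; returning NULL means "could not fold, keep the default symbol".
static JitOptSymbol *
fold_binary_add(JitOptContext *ctx, JitOptSymbol *left, JitOptSymbol *right)
{
    if (!sym_is_const(ctx, left) || !sym_is_const(ctx, right)) {
        return NULL;
    }
    PyObject *l = sym_get_const(ctx, left);
    PyObject *r = sym_get_const(ctx, right);
    // Only fold exact ints: a mis-speculated operand such as Disaster()
    // must never be evaluated at optimization time.
    if (!PyLong_CheckExact(l) || !PyLong_CheckExact(r)) {
        return NULL;
    }
    PyObject *temp = PyNumber_Add(l, r);
    if (temp == NULL) {
        PyErr_Clear();  // give up on folding; the runtime uop handles errors
        return NULL;
    }
    JitOptSymbol *res = sym_new_const(ctx, temp);  // assumed to take its own reference
    Py_DECREF(temp);
    return res;
}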
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again. And if you don't make the requested changes, you will be poked with soft cushions!
In the first place, that's not possible. We only optimize what we specialize in the interpreter. The interpreter will never specialize that to BINARY_OP_ADD_INT. Furthermore, even if it did specialize
If you want to be more assured, how about I merge #132968 to add type assertions to our optimizer? That should make things safer.
I forgot the guards don't actually guard at optimization time. My bad. Yeah, it seems we need some sort of check.
@tomasr8 sorry this is going to make your life harder with the removing
I have made the requested changes; please review again
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request.
Note: I mark the object as stackref immortal (but not real immortal!) to simplify the reference management. This means we have no refcounting in the optimizer when constant evaluating stuff, which makes things easier to reason about, as constants during the lifetime of the optimizer are effectively immortal anyway (the optimizer holds a single reference to all constants).
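To make that concrete, a small sketch of the scheme (reusing the names from the example earlier in this thread; not the generated code verbatim): the optimizer keeps its single strong reference inside the constant symbol, and the stackrefs handed to the copied body are borrowed, so the body can close them without touching the real refcounts.
PyObject *lhs = sym_get_const(ctx, stack_pointer[-2]);   // reference owned by the optimizer
PyObject *rhs = sym_get_const(ctx, stack_pointer[-1]);   // reference owned by the optimizer
_PyStackRef left = PyStackRef_FromPyObjectBorrow(lhs);   // no incref: "stackref immortal"
_PyStackRef right = PyStackRef_FromPyObjectBorrow(rhs);  // no incref
// ... the copied pure uop body runs here; closing the borrowed stackrefs
// inside it leaves the objects' refcounts unchanged ...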
Waiting for #134284 to be merged first, then I can use
I didn't really review the cases generator too closely (since I'm still not very familiar with it), but based on the code it generates, everything at least seems correct.
@@ -374,6 +374,7 @@ PyStackRef_FromPyObjectBorrow(PyObject *obj)
}
#define PyStackRef_FromPyObjectBorrow(obj) PyStackRef_FromPyObjectBorrow(_PyObject_CAST(obj))
return (typ == &PyLong_Type) ||
       (typ == &PyUnicode_Type) ||
       (typ == &PyFloat_Type) ||
       (typ == &PyDict_Type) ||
       (typ == &PyTuple_Type) ||
       (typ == &PyList_Type);
We shouldn't constant-evaluate anything involving mutable containers. (Even tuple scares me a tiny bit, since it can contain arbitrary objects, but I'm pretty sure it's okay.)
Suggested change:
-   return (typ == &PyLong_Type) ||
-          (typ == &PyUnicode_Type) ||
-          (typ == &PyFloat_Type) ||
-          (typ == &PyDict_Type) ||
-          (typ == &PyTuple_Type) ||
-          (typ == &PyList_Type);
+   return (typ == &_PyNone_Type) ||
+          (typ == &PyBool_Type) ||
+          (typ == &PyLong_Type) ||
+          (typ == &PyFloat_Type) ||
+          (typ == &PyUnicode_Type) ||
+          (typ == &PyTuple_Type);
@@ -75,7 +75,6 @@ def write_header(
"""
)
Add this back?
emitter.emit("/* Start of pure uop copied from bytecodes for constant evaluation */\n") | ||
emitter.emit_tokens(uop, storage, inst=None, emit_braces=False, is_abstract=True) | ||
out.start_line() | ||
emitter.emit("/* End of pure uop copied from bytecodes for constant evaluation */\n") |
Minor: maybe use // instead of /* */ for these comments, since they're not multi-line?
# All new stackrefs are created from new references.
# That's how the stackref contract works.
if not outp.peek:
    emitter.emit(f"{outp.name} = sym_new_const_steal(ctx, PyStackRef_AsPyObjectBorrow({outp.name}_stackref));\n")
It may just be a week of conference sleep schedule, but my brain hurts trying to reason about the refcounting here. Why are we stealing a borrow? Shouldn't we be stealing a steal, or borrowing a borrow? Currently:
- If the tag bit is unset on the stackref, stealing a borrow will leave the refcount on the object unchanged and the tag bit unset. When the symbol is cleared after optimizing, the refcount on the object will be one less, which is correct.
- If the tag bit is set on the stackref, stealing a borrow will leave the refcount on the object itself unchanged and the tag bit still set. When the symbol is cleared after optimizing, the refcount on the object will be one less, which seems incorrect.
(I haven't looked at the peek code yet.)
Another option, that I might like better, is making all of our constant symbols use stackrefs under-the-hood. Then we could avoid refcounting entirely. But that's a bigger change that could happen later if needed.
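Just to sketch the shape of that idea (none of these names exist; it is purely illustrative): a constant symbol would own a _PyStackRef rather than a PyObject *, so the same borrow/steal rules apply inside and outside the copied uop bodies and the optimizer needs no separate refcounting layer.
typedef struct {
    int tag;               // hypothetical, e.g. SYM_CONSTANT
    _PyStackRef constant;  // owned by the symbol arena, closed when the
                           // optimizer context is torn down
} const_symbol_sketch;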
This is also making me realize that we really should make it possible to detect refleaks/memory leaks on JIT builds soon. The problem is that new executors are allocated all over the place, leading to things like #120501.