| 1 | +PEP: 523 |
| 2 | +Title: Adding a frame evaluation API to CPython |
| 3 | +Version: $Revision$ |
| 4 | +Last-Modified: $Date$ |
| 5 | +Author: Brett Cannon < [email protected]>, |
| 6 | + |
| 7 | +Status: Draft |
| 8 | +Type: Standards Track |
| 9 | +Content-Type: text/x-rst |
| 10 | +Created: 16-May-2016 |
| 11 | +Post-History: 16-May-2016 |
| 12 | + |
| 13 | + |
| 14 | +Abstract |
| 15 | +======== |
| 16 | + |
| 17 | +This PEP proposes to expand CPython's C API [#c-api]_ to allow for |
| 18 | +the specification of a per-interpreter function pointer to handle the |
| 19 | +evaluation of frames [#pyeval_evalframeex]_. This proposal also |
| 20 | +suggests adding a new field to code objects [#pycodeobject]_ to store |
| 21 | +arbitrary data for use by the frame evaluation function. |
| 22 | + |
| 23 | + |
| 24 | +Rationale |
| 25 | +========= |
| 26 | + |
| 27 | +One place where flexibility has been lacking in Python is in the direct |
| 28 | +execution of Python code. While CPython's C API [#c-api]_ allows for |
| 29 | +constructing the data going into a frame object and then evaluating it |
| 30 | +via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the |
| 31 | +execution of Python code comes down to individual objects instead of a |
| 32 | +holistic control of execution at the frame level. |
| 33 | + |
| 34 | +While wanting to have influence over frame evaluation may seem a bit |
| 35 | +too low-level, it does open the possibility for things such as a |
| 36 | +method-level JIT to be introduced into CPython without CPython itself |
| 37 | +having to provide one. By allowing external C code to control frame |
| 38 | +evaluation, a JIT can participate in the execution of Python code at |
| 39 | +the key point where evaluation occurs. This then allows for a JIT to |
| 40 | +conditionally recompile Python bytecode to machine code as desired |
| 41 | +while still allowing for executing regular CPython bytecode when |
| 42 | +running the JIT is not desired. This can be accomplished by allowing |
| 43 | +interpreters to specify what function to call to evaluate a frame. And |
| 44 | +by placing the API at the frame evaluation level it allows for a |
| 45 | +complete view of the execution environment of the code for the JIT. |
| 46 | + |
| 47 | +This ability to specify a frame evaluation function also allows for |
| 48 | +other use-cases beyond just opening CPython up to a JIT. For instance, |
| 49 | +it would not be difficult to implement a tracing or profiling function |
| 50 | +at the call level with this API. While CPython does provide the |
| 51 | +ability to set a tracing or profiling function at the Python level, |
| 52 | +this would be able to match the data collection of the profiler and |
| 53 | +quite possibly be faster for tracing by simply skipping per-line |
| 54 | +tracing support. |
| 55 | + |
| 56 | +It also opens up the possibility of debugging where the frame |
| 57 | +evaluation function only performs special debugging work when it |
| 58 | +detects it is about to execute a specific code object. In that |
| 59 | +instance the bytecode could theoretically be rewritten in-place to |
| 60 | +inject a breakpoint function call at the proper point to help with |
| 61 | +debugging, without having to take the heavy-handed approach |
| 62 | +required by ``sys.settrace()``. |
| 63 | + |
| 64 | +To help facilitate these use-cases, we are also proposing the addition |
| 65 | +of a "scratch space" on code objects via a new field. This will allow |
| 66 | +per-code object data to be stored with the code object itself for easy |
| 67 | +retrieval by the frame evaluation function as necessary. The field |
| 68 | +itself will simply be a ``PyObject *`` type so that any data stored in |
| 69 | +the field will participate in normal object memory management. |
| 70 | + |
| 71 | + |
| 72 | +Proposal |
| 73 | +======== |
| 74 | + |
| 75 | +All proposed C API changes below will not be part of the stable ABI. |
| 76 | + |
| 77 | + |
| 78 | +Expanding ``PyCodeObject`` |
| 79 | +-------------------------- |
| 80 | + |
| 81 | +One field is to be added to the ``PyCodeObject`` struct |
| 82 | +[#pycodeobject]_:: |
| 83 | + |
| 84 | +  typedef struct { |
| 85 | +     ... |
| 86 | +     PyObject *co_extra;  /* "Scratch space" for the code object. */ |
| 87 | +  } PyCodeObject; |
| 88 | + |
| 89 | +The ``co_extra`` field will be ``NULL`` by default and will not be used |
| 90 | +by CPython itself. Third-party code is free to use the field as desired. |
| 91 | +Values stored in the field are expected to not be required in order |
| 92 | +for the code object to function, allowing the loss of the data of the |
| 93 | +field to be acceptable (this keeps the code object as immutable from |
| 94 | +a functionality point-of-view; this is slightly contentious and so is |
| 95 | +listed as an open issue in `Is co_extra needed?`_). The field will be |
| 96 | +freed like all other fields on ``PyCodeObject`` during deallocation |
| 97 | +using ``Py_XDECREF()``. |
| 98 | + |
| 99 | +It is not recommended that multiple users attempt to use the |
| 100 | +``co_extra`` field simultaneously. While a dictionary could |
| 101 | +theoretically be set to the field and various users could use a |
| 102 | +project-specific key, there is still the issue of key collisions as well as |
| 103 | +performance degradation from using a dictionary lookup on every frame |
| 104 | +evaluation. Users are expected to do a type check to make sure that |
| 105 | +the field has not been previously set by someone else. |
| 106 | + |
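The type check described above can be sketched in pure Python. This is only a model of the C-level field, not part of the proposed API: ``CodeObject`` stands in for ``PyCodeObject``, and ``JitData`` and ``get_or_claim_extra()`` are hypothetical names invented for the illustration.

```python
class JitData:
    """Hypothetical per-code-object payload owned by a single tool."""

    def __init__(self):
        self.exec_count = 0


class CodeObject:
    """Stand-in for PyCodeObject; only the proposed field is modeled."""

    def __init__(self):
        self.co_extra = None  # models the NULL default from the proposal


def get_or_claim_extra(code):
    """Return this tool's JitData for `code`, or None if the field is taken."""
    extra = code.co_extra
    if extra is None:
        # Nobody has used the field yet, so claim it.
        extra = code.co_extra = JitData()
        return extra
    if isinstance(extra, JitData):
        # The field already holds our own data.
        return extra
    # Some other user set the field first; leave their data untouched.
    return None
```

A tool following this pattern falls back to default evaluation whenever ``get_or_claim_extra()`` returns ``None``, which is safe because data in ``co_extra`` is never required for the code object to function.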
| 107 | + |
| 108 | +Expanding ``PyInterpreterState`` |
| 109 | +-------------------------------- |
| 110 | + |
| 111 | +The entrypoint for the frame evaluation function is per-interpreter:: |
| 112 | + |
| 113 | +  // Same type signature as PyEval_EvalFrameEx(). |
| 114 | +  typedef PyObject* (*PyFrameEvalFunction)(PyFrameObject*, int); |
| 115 | + |
| 116 | +  typedef struct { |
| 117 | +      ... |
| 118 | +      PyFrameEvalFunction eval_frame; |
| 119 | +  } PyInterpreterState; |
| 120 | + |
| 121 | +By default, the ``eval_frame`` field will be initialized to a function |
| 122 | +pointer that represents what ``PyEval_EvalFrameEx()`` currently is |
| 123 | +(called ``PyEval_EvalFrameDefault()``, discussed later in this PEP). |
| 124 | +Third-party code may then set its own frame evaluation function |
| 125 | +instead to control the execution of Python code. A pointer comparison |
| 126 | +can be used to detect if the field is set to |
| 127 | +``PyEval_EvalFrameDefault()`` and thus has not been mutated yet. |
| 128 | + |
| 129 | + |
| 130 | +Changes to ``Python/ceval.c`` |
| 131 | +----------------------------- |
| 132 | + |
| 133 | +``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it currently stands |
| 134 | +will be renamed to ``PyEval_EvalFrameDefault()``. The new |
| 135 | +``PyEval_EvalFrameEx()`` will then become:: |
| 136 | + |
| 137 | +  PyObject * |
| 138 | +  PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag) |
| 139 | +  { |
| 140 | +      PyThreadState *tstate = PyThreadState_GET(); |
| 141 | +      return tstate->interp->eval_frame(frame, throwflag); |
| 142 | +  } |
| 143 | + |
| 144 | +This allows third-party code to place itself directly in the path |
| 145 | +of Python code execution while being backwards-compatible with code |
| 146 | +already using the pre-existing C API. |
| 147 | + |
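The dispatch through the per-interpreter field, together with the pointer comparison against the default, can be modeled in pure Python. This is a sketch under stated assumptions rather than the C implementation: ``InterpreterState`` stands in for ``PyInterpreterState``, the two ``eval_frame_*`` functions stand in for ``PyEval_EvalFrameDefault()`` and the new ``PyEval_EvalFrameEx()``, and ``install_hook()`` and ``counting_hook()`` are hypothetical helpers.

```python
def eval_frame_default(frame, throwflag):
    """Stand-in for PyEval_EvalFrameDefault(); pretends to run the frame."""
    return ("default", frame)


class InterpreterState:
    """Stand-in for PyInterpreterState with the proposed field."""

    def __init__(self):
        self.eval_frame = eval_frame_default  # initialized to the default


def eval_frame_ex(interp, frame, throwflag=0):
    """Stand-in for the new PyEval_EvalFrameEx(): dispatch via the field."""
    return interp.eval_frame(frame, throwflag)


def install_hook(interp, hook):
    """Install `hook` only if the field still holds the default."""
    if interp.eval_frame is not eval_frame_default:  # models pointer comparison
        return False
    interp.eval_frame = hook
    return True


calls = []


def counting_hook(frame, throwflag):
    """Hypothetical hook: record the frame, then defer to the default."""
    calls.append(frame)
    return eval_frame_default(frame, throwflag)
```

Existing callers of ``eval_frame_ex()`` need no changes in this model, which mirrors the backwards-compatibility argument: they reach whichever function is currently stored in the field.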
| 148 | + |
| 149 | +Updating ``python-gdb.py`` |
| 150 | +-------------------------- |
| 151 | + |
| 152 | +The generated ``python-gdb.py`` file used for Python support in GDB |
| 153 | +makes some hard-coded assumptions about ``PyEval_EvalFrameEx()``, e.g. |
| 154 | +the names of local variables. It will need to be updated to work with |
| 155 | +the proposed changes. |
| 156 | + |
| 157 | + |
| 158 | +Performance impact |
| 159 | +================== |
| 160 | + |
| 161 | +As this PEP is proposing an API to add pluggability, performance |
| 162 | +impact is considered only in the case where no third-party code has |
| 163 | +made any changes. |
| 164 | + |
| 165 | +Several runs of pybench [#pybench]_ consistently showed no performance |
| 166 | +cost from the API change alone. |
| 167 | + |
| 168 | +A run of the Python benchmark suite [#py-benchmarks]_ showed no |
| 169 | +measurable cost in performance. |
| 170 | + |
| 171 | +In terms of memory impact, there are typically not many CPython |
| 172 | +interpreters executing in a single process, so the impact of |
| 173 | +``co_extra`` being added to ``PyCodeObject`` is the only worry. |
| 174 | +According to [#code-object-count]_, a run of the Python test suite |
| 175 | +results in about 72,395 code objects being created. On a 64-bit |
| 176 | +CPU that would result in 579,160 bytes of extra memory being used if |
| 177 | +all code objects were alive at once and had nothing set in their |
| 178 | +``co_extra`` fields. |
| 179 | + |
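The memory figure above follows from one extra pointer per code object. A short check of the arithmetic, assuming an 8-byte ``PyObject *`` on a 64-bit CPU (the constant names are ours):

```python
CODE_OBJECTS = 72_395    # code objects created during a test suite run
POINTER_SIZE_BYTES = 8   # assumed size of one PyObject * on a 64-bit CPU

extra_bytes = CODE_OBJECTS * POINTER_SIZE_BYTES
print(extra_bytes)                   # 579160
print(round(extra_bytes / 1024, 1))  # 565.6 (KiB)
```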
| 180 | + |
| 181 | +Example Usage |
| 182 | +============= |
| 183 | + |
| 184 | +A JIT for CPython |
| 185 | +----------------- |
| 186 | + |
| 187 | +Pyjion |
| 188 | +'''''' |
| 189 | + |
| 190 | +The Pyjion project [#pyjion]_ has used this proposed API to implement |
| 191 | +a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code |
| 192 | +object has its ``co_extra`` field set to a ``PyjionJittedCode`` object |
| 193 | +which stores four pieces of information: |
| 194 | + |
| 195 | +1. Execution count |
| 196 | +2. A boolean representing whether a previous attempt to JIT failed |
| 197 | +3. A function pointer to a trampoline (which can be type tracing or not) |
| 198 | +4. A void pointer to any JIT-compiled machine code |
| 199 | + |
| 200 | +The frame evaluation function has (roughly) the following algorithm:: |
| 201 | + |
| 202 | +  def eval_frame(frame, throw_flag): |
| 203 | +      pyjion_code = frame.code.co_extra |
| 204 | +      if not pyjion_code: |
| 205 | +          pyjion_code = frame.code.co_extra = PyjionJittedCode() |
| 206 | +      elif not pyjion_code.jit_failed: |
| 207 | +          if pyjion_code.jit_code: |
| 208 | +              return pyjion_code.eval(pyjion_code.jit_code, frame) |
| 209 | +          elif pyjion_code.exec_count > 20_000: |
| 210 | +              if jit_compile(frame): |
| 211 | +                  return pyjion_code.eval(pyjion_code.jit_code, frame) |
| 212 | +              else: |
| 213 | +                  pyjion_code.jit_failed = True |
| 214 | +      pyjion_code.exec_count += 1 |
| 215 | +      return PyEval_EvalFrameDefault(frame, throw_flag) |
| 216 | + |
| 217 | +The key point, though, is that all of this work and logic is separate |
| 218 | +from CPython and yet with the proposed API changes it is able to |
| 219 | +provide a JIT that is compliant with Python semantics (as of this |
| 220 | +writing, performance is almost equivalent to CPython without the new |
| 221 | +API). This means there's nothing technically preventing others from |
| 222 | +implementing their own JITs for CPython by utilizing the proposed API. |
| 223 | + |
| 224 | + |
| 225 | +Other JITs |
| 226 | +'''''''''' |
| 227 | + |
| 228 | +It should be mentioned that the Pyston team [#pyston]_ was consulted |
| 229 | +on an earlier version of this PEP that was more JIT-specific. They |
| 230 | +were not interested in utilizing the proposed changes because they |
| 231 | +want control over memory layout and had no interest in directly |
| 232 | +supporting CPython itself. An informal discussion with a developer on |
| 233 | +the PyPy team [#pypy]_ led to a similar comment. |
| 234 | + |
| 235 | +Numba [#numba]_, on the other hand, suggested that they would be |
| 236 | +interested in the proposed change in a post-1.0 future for |
| 237 | +themselves [#numba-interest]_. |
| 238 | + |
| 239 | +The experimental Coconut JIT [#coconut]_ could have benefitted from |
| 240 | +this PEP. In private conversations with Coconut's creator we were told |
| 241 | +that our API was probably superior to the one they developed for |
| 242 | +Coconut to add JIT support to CPython. |
| 243 | + |
| 244 | + |
| 245 | +Debugging |
| 246 | +--------- |
| 247 | + |
| 248 | +In conversations with the Python Tools for Visual Studio team (PTVS) |
| 249 | +[#ptvs]_, they thought they would find these API changes useful for |
| 250 | +implementing more performant debugging. As mentioned in the Rationale_ |
| 251 | +section, this API would allow for switching on debugging functionality |
| 252 | +only in frames where it is needed. This could allow for either |
| 253 | +skipping information that ``sys.settrace()`` normally provides or |
| 254 | +even going as far as dynamically rewriting bytecode prior to |
| 255 | +execution to inject e.g. breakpoints in the bytecode. |
| 256 | + |
| 257 | +It also turns out that Google has provided a very similar API |
| 258 | +internally for years. It has been used for performant debugging |
| 259 | +purposes. |
| 260 | + |
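The idea of doing debugging work only in selected frames can be illustrated with a small pure-Python model. It is a sketch, not the PTVS or Google design: frames are modeled as dictionaries, and ``watched``, ``hits``, and ``debug_eval_frame()`` are hypothetical names.

```python
watched = set()  # code objects (modeled here by name) that have breakpoints
hits = []        # record of watched code objects that were executed


def eval_frame_default(frame, throwflag):
    """Stand-in for PyEval_EvalFrameDefault(); returns the frame's result."""
    return frame["result"]


def debug_eval_frame(frame, throwflag):
    """Do debugging work only when the frame runs a watched code object."""
    if frame["code"] in watched:
        hits.append(frame["code"])  # a real debugger would break here
    # Every frame, watched or not, is still evaluated normally.
    return eval_frame_default(frame, throwflag)
```

Frames whose code object is not watched pay only for one set lookup in this model, which is the contrast with the per-line callbacks of ``sys.settrace()`` that the section draws.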
| 261 | + |
| 262 | +Implementation |
| 263 | +============== |
| 264 | + |
| 265 | +A set of patches implementing the proposed API is available through |
| 266 | +the Pyjion project [#pyjion]_. In its current form it has more |
| 267 | +changes to CPython than just this proposed API, but that is for ease |
| 268 | +of development instead of strict requirements to accomplish its goals. |
| 269 | + |
| 270 | + |
| 271 | +Open Issues |
| 272 | +=========== |
| 273 | + |
| 274 | +Allow ``eval_frame`` to be ``NULL`` |
| 275 | +----------------------------------- |
| 276 | + |
| 277 | +Currently the frame evaluation function is expected to always be set. |
| 278 | +It could just as easily default to ``NULL`` instead, which would |
| 279 | +signal to use ``PyEval_EvalFrameDefault()``. The current proposal of |
| 280 | +not special-casing the field seemed the most straight-forward, but it |
| 281 | +does require that the field not accidentally be cleared, else a crash |
| 282 | +may occur. |
| 283 | + |
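The ``NULL``-default alternative would move the special case into the dispatch function. A pure-Python model of that variant, with ``None`` standing in for ``NULL`` and all names being stand-ins rather than proposed API:

```python
def eval_frame_default(frame, throwflag):
    """Stand-in for PyEval_EvalFrameDefault()."""
    return ("default", frame)


class InterpreterState:
    """Stand-in for PyInterpreterState under the NULL-default variant."""

    def __init__(self):
        self.eval_frame = None  # models a NULL function pointer


def eval_frame_ex(interp, frame, throwflag=0):
    """Dispatch that treats an unset field as a request for the default."""
    hook = interp.eval_frame
    if hook is None:
        return eval_frame_default(frame, throwflag)
    return hook(frame, throwflag)
```

In this variant, clearing the field restores default evaluation instead of crashing, at the cost of one extra check on every frame evaluation.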
| 284 | + |
| 285 | +Is co_extra needed? |
| 286 | +------------------- |
| 287 | + |
| 288 | +While discussing this PEP at PyCon US 2016, some core developers |
| 289 | +expressed their worry of the ``co_extra`` field making code objects |
| 290 | +mutable. The thinking seemed to be that having a field that was |
| 291 | +mutated after the creation of the code object made the object seem |
| 292 | +mutable, even though no other aspect of code objects changed. |
| 293 | + |
| 294 | +The view of this PEP is that the ``co_extra`` field doesn't change the |
| 295 | +fact that code objects are immutable. The field is specified in this |
| 296 | +PEP so as not to contain information required to make the code object |
| 297 | +usable, making it more of a caching field. It could be viewed as |
| 298 | +similar to the UTF-8 cache that string objects have internally; |
| 299 | +strings are still considered immutable even though they have a field |
| 300 | +that is conditionally set. |
| 301 | + |
| 302 | +The field is also not strictly necessary. While the field greatly |
| 303 | +simplifies attaching extra information to code objects, other options |
| 304 | +are possible, such as keeping a mapping of code object memory |
| 305 | +addresses to what would have been kept in ``co_extra``, or perhaps |
| 306 | +using a weak reference of the data on the code object and then |
| 307 | +iterating through the weak references until the attached data is |
| 308 | +found. But obviously none of these solutions is as simple or |
| 309 | +performant as adding the ``co_extra`` field. |
| 310 | + |
| 311 | + |
| 312 | +Rejected Ideas |
| 313 | +============== |
| 314 | + |
| 315 | +A JIT-specific C API |
| 316 | +-------------------- |
| 317 | + |
| 318 | +Originally this PEP was going to propose a much larger API change |
| 319 | +which was more JIT-specific. After soliciting feedback from the Numba |
| 320 | +team [#numba]_, though, it became clear that the API was unnecessarily |
| 321 | +large. The realization was made that all that was truly needed was the |
| 322 | +opportunity to provide a trampoline function to handle execution of |
| 323 | +Python code that had been JIT-compiled and a way to attach that |
| 324 | +compiled machine code along with other critical data to the |
| 325 | +corresponding Python code object. Once it was shown that there was no |
| 326 | +loss in functionality or in performance while minimizing the API |
| 327 | +changes required, the proposal was changed to its current form. |
| 328 | + |
| 329 | + |
| 330 | +References |
| 331 | +========== |
| 332 | + |
| 333 | +.. [#pyjion] Pyjion project |
| 334 | + (https://github.com/microsoft/pyjion) |
| 335 | + |
| 336 | +.. [#c-api] CPython's C API |
| 337 | + (https://docs.python.org/3/c-api/index.html) |
| 338 | + |
| 339 | +.. [#pycodeobject] ``PyCodeObject`` |
| 340 | + (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) |
| 341 | + |
| 342 | +.. [#coreclr] .NET Core Runtime (CoreCLR) |
| 343 | + (https://github.com/dotnet/coreclr) |
| 344 | + |
| 345 | +.. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` |
| 346 | + (https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx) |
| 347 | + |
| 351 | +.. [#numba] Numba |
| 352 | + (http://numba.pydata.org/) |
| 353 | + |
| 354 | +.. [#numba-interest] numba-users mailing list: |
| 355 | + "Would the C API for a JIT entrypoint being proposed by Pyjion help out Numba?" |
| 356 | + (https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g) |
| 357 | + |
| 358 | +.. [#code-object-count] [Python-Dev] Opcode cache in ceval loop |
| 359 | + (https://mail.python.org/pipermail/python-dev/2016-February/143025.html) |
| 360 | + |
| 361 | +.. [#py-benchmarks] Python benchmark suite |
| 362 | + (https://hg.python.org/benchmarks) |
| 363 | + |
| 364 | +.. [#pyston] Pyston |
| 365 | + (http://pyston.org) |
| 366 | + |
| 367 | +.. [#pypy] PyPy |
| 368 | + (http://pypy.org/) |
| 369 | + |
| 370 | +.. [#ptvs] Python Tools for Visual Studio |
| 371 | + (http://microsoft.github.io/PTVS/) |
| 372 | + |
| 373 | +.. [#coconut] Coconut |
| 374 | + (https://github.com/davidmalcolm/coconut) |
| 375 | + |
| 376 | + |
| 377 | +Copyright |
| 378 | +========= |
| 379 | + |
| 380 | +This document has been placed in the public domain. |
| 381 | + |
| 382 | + |
| 383 | + |
| 384 | +.. |
| 385 | + Local Variables: |
| 386 | + mode: indented-text |
| 387 | + indent-tabs-mode: nil |
| 388 | + sentence-end-double-space: t |
| 389 | + fill-column: 70 |
| 390 | + coding: utf-8 |
| 391 | + End: |