|
1 | 1 |
|
2 |
| -Code objects |
3 |
| -============ |
| 2 | +# Code objects |
4 | 3 |
|
5 |
| -Coming soon. |
| 4 | +A `CodeObject` is a builtin Python type that represents a compiled executable, |
| 5 | +such as a compiled function or class. |
| 6 | +It contains a sequence of bytecode instructions along with its associated |
| 7 | +metadata: data which is necessary to execute the bytecode instructions (such |
| 8 | +as the values of the constants they access) or context information such as |
| 9 | +the source code location, which is useful for debuggers and other tools. |
| 10 | + |
| 11 | +Since 3.11, the final field of the `PyCodeObject` C struct is an array |
| 12 | +of indeterminate length containing the bytecode, `code->co_code_adaptive`. |
| 13 | +(In older versions the bytecode was stored in a separate
| 14 | +[`bytes`](https://docs.python.org/dev/library/stdtypes.html#bytes) |
| 15 | +object, `code->co_code`; this was changed to save an allocation and to |
| 16 | +allow it to be mutated.) |
| 17 | + |
| 18 | +Code objects are typically produced by the bytecode [compiler](compiler.md), |
| 19 | +although they are often written to disk by one process and read back in by another. |
| 20 | +The disk version of a code object is serialized using the |
| 21 | +[marshal](https://docs.python.org/dev/library/marshal.html) protocol. |
| 22 | + |
| 23 | +Code objects are nominally immutable. |
| 24 | +Some fields (including `co_code_adaptive` and fields for runtime |
| 25 | +information such as `_co_monitoring`) are mutable, but mutable fields are |
| 26 | +not included when code objects are hashed or compared. |
| 27 | + |
| 28 | +## Source code locations |
| 29 | + |
| 30 | +Whenever an exception occurs, the interpreter adds a traceback entry to |
| 31 | +the exception for the current frame, as well as each frame on the stack that |
| 32 | +it unwinds. |
| 33 | +The `tb_lineno` field of a traceback entry is (lazily) set to the line |
| 34 | +number of the instruction that was executing in the frame at the time of |
| 35 | +the exception. |
| 36 | +This field is computed from the locations table, `co_linetable`, by the function |
| 37 | +[`PyCode_Addr2Line`](https://docs.python.org/dev/c-api/code.html#c.PyCode_Addr2Line). |
| 38 | +Despite its name, `co_linetable` includes more than line numbers; it represents |
| 39 | +a 4-number source location for every instruction, indicating the precise line |
| 40 | +and column at which it begins and ends. This is a significant amount of data, |
| 41 | +so a compact format is very important. |
| 42 | + |
| 43 | +Note that traceback objects don't store all this information -- they store the start line |
| 44 | +number, for backward compatibility, and the "last instruction" value. |
| 45 | +The rest can be computed from the last instruction (`tb_lasti`) with the help of the |
| 46 | +locations table. For Python code, there is a convenience method |
| 47 | +[`codeobject.co_positions`](https://docs.python.org/dev/reference/datamodel.html#codeobject.co_positions)
| 48 | +which returns an iterator of `({line}, {endline}, {column}, {endcolumn})` tuples, |
| 49 | +one per instruction. |
| 50 | +There is also `co_lines()` which returns an iterator of `({start}, {end}, {line})` tuples, |
| 51 | +where `{start}` and `{end}` are bytecode offsets. |
| 52 | +The latter is described by [`PEP 626`](https://peps.python.org/pep-0626/); it is more |
| 53 | +compact, but doesn't return end line numbers or column offsets. |
| 54 | +From C code, you need to call |
| 55 | +[`PyCode_Addr2Location`](https://docs.python.org/dev/c-api/code.html#c.PyCode_Addr2Location). |
| 56 | + |
| 57 | +As the locations table is only consulted when displaying a traceback and when |
| 58 | +tracing (to pass the line number to the tracing function), lookup is not |
| 59 | +performance critical. |
| 60 | +In order to reduce the overhead during tracing, the mapping from instruction offset to |
| 61 | +line number is cached in the `_co_linearray` field.
| 62 | + |
| 63 | +### Format of the locations table |
| 64 | + |
| 65 | +The `co_linetable` bytes object of code objects contains a compact |
| 66 | +representation of the source code positions of instructions, which are |
| 67 | +returned by the `co_positions()` iterator. |
| 68 | + |
| 69 | +> [!NOTE] |
| 70 | +> `co_linetable` is not to be confused with `co_lnotab`. |
| 71 | +> For backwards compatibility, `co_lnotab` exposes the format |
| 72 | +> as it existed in Python 3.10 and lower: this older format |
| 73 | +> stores only the start line for each instruction. |
| 74 | +> It is lazily created from `co_linetable` when accessed. |
| 75 | +> See [`Objects/lnotab_notes.txt`](../Objects/lnotab_notes.txt) for more details. |
| 76 | +
|
| 77 | +`co_linetable` consists of a sequence of location entries. |
| 78 | +Each entry starts with a byte with the most significant bit set, followed by zero or more bytes with the most significant bit unset. |
| 79 | + |
| 80 | +Each entry contains the following information: |
| 81 | +* The number of code units covered by this entry (length) |
| 82 | +* The start line |
| 83 | +* The end line |
| 84 | +* The start column |
| 85 | +* The end column |
| 86 | + |
| 87 | +The first byte has the following format: |
| 88 | + |
| 89 | +Bit 7 | Bits 3-6 | Bits 0-2 |
| 90 | + ---- | ---- | ---- |
| 91 | + 1 | Code | Length (in code units) - 1 |
| 92 | + |
| 93 | +The codes are enumerated in the `_PyCodeLocationInfoKind` enum. |
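As a rough sketch of the first-byte layout in the table above, the code and length can be unpacked with shifts and masks. The helper name `parse_first_byte` is hypothetical and is not part of CPython.

```python
def parse_first_byte(first_byte):
    """Split the first byte of a location entry into (code, length)."""
    if not first_byte & 0x80:
        raise ValueError("bit 7 is unset: not the first byte of an entry")
    code = (first_byte >> 3) & 0x0F    # bits 3-6
    length = (first_byte & 0x07) + 1   # bits 0-2 hold length - 1
    return code, length
```

For instance, `parse_first_byte(0xD2)` gives code 10 with a length of 3 code units.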
| 94 | + |
| 95 | +## Variable-length integer encodings |
| 96 | + |
| 97 | +Integers are often encoded using a variable-length integer encoding.
| 98 | + |
| 99 | +### Unsigned integers (`varint`) |
| 100 | + |
| 101 | +Unsigned integers are encoded in 6-bit chunks, least significant first. |
| 102 | +Each chunk but the last has bit 6 set. |
| 103 | +For example: |
| 104 | + |
| 105 | +* 63 is encoded as `0x3f` |
| 106 | +* 200 is encoded as `0x48`, `0x03` |
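The chunking rule can be illustrated with a small encoder and decoder. This is a sketch under the rules stated above; `encode_varint` and `decode_varint` are hypothetical helper names, not CPython functions.

```python
def encode_varint(value):
    """Encode a non-negative integer in 6-bit chunks, least significant first."""
    if value < 0:
        raise ValueError("varint encodes unsigned integers only")
    out = bytearray()
    while value >= 64:
        out.append(0x40 | (value & 0x3F))  # bit 6 set: more chunks follow
        value >>= 6
    out.append(value)                      # last chunk: bit 6 clear
    return bytes(out)


def decode_varint(data, pos=0):
    """Decode a varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x3F) << shift
        shift += 6
        if not byte & 0x40:
            return value, pos
```

Every byte produced has bit 7 clear, which is what lets a reader tell these bytes apart from the first byte of the next location entry.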
| 107 | + |
| 108 | +### Signed integers (`svarint`) |
| 109 | + |
| 110 | +Signed integers are encoded by converting them to unsigned integers, using the following function: |
| 111 | +```Python |
| 112 | +def convert(s): |
| 113 | + if s < 0: |
| 114 | + return ((-s)<<1) | 1 |
| 115 | + else: |
| 116 | + return (s<<1) |
| 117 | +``` |
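Decoding reverses this mapping: the low bit carries the sign and the remaining bits carry the magnitude. The sketch below pairs the conversion with its inverse; `to_unsigned` and `to_signed` are hypothetical names chosen for illustration.

```python
def to_unsigned(s):
    """Same mapping as convert(): the sign goes into the low bit."""
    if s < 0:
        return ((-s) << 1) | 1
    return s << 1


def to_signed(u):
    """Inverse mapping: low bit set means the value is negative."""
    if u & 1:
        return -(u >> 1)
    return u >> 1
```

The resulting unsigned value is then written using the `varint` encoding.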
| 118 | + |
| 119 | +*Location entries* |
| 120 | + |
| 121 | +The meanings of the codes, and of the bytes that follow them, are:
| 122 | + |
| 123 | +Code | Meaning | Start line | End line | Start column | End column |
| 124 | + ---- | ---- | ---- | ---- | ---- | ---- |
| 125 | + 0-9 | Short form | Δ 0 | Δ 0 | See below | See below |
| 126 | + 10-12 | One line form | Δ (code - 10) | Δ 0 | unsigned byte | unsigned byte |
| 127 | + 13 | No column info | Δ svarint | Δ 0 | None | None |
| 128 | + 14 | Long form | Δ svarint | Δ varint | varint | varint |
| 129 | + 15 | No location | None | None | None | None |
| 130 | + |
| 131 | +The Δ means the value is encoded as a delta from another value: |
| 132 | +* Start line: Delta from the previous start line, or `co_firstlineno` for the first entry. |
| 133 | +* End line: Delta from the start line.
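To make the delta rules concrete, here is a minimal sketch that turns a sequence of decoded deltas into absolute line numbers. The function `resolve_lines` is hypothetical, and entries with no location (code 15) are left out of this sketch.

```python
def resolve_lines(first_lineno, deltas):
    """Yield (start_line, end_line) for each (start_delta, end_delta) pair.

    first_lineno plays the role of co_firstlineno.
    """
    previous_start = first_lineno
    for start_delta, end_delta in deltas:
        start = previous_start + start_delta  # relative to previous start line
        end = start + end_delta               # relative to this start line
        yield start, end
        previous_start = start
```

With `first_lineno` equal to 10, the deltas `[(0, 0), (1, 0), (-1, 2)]` resolve to lines 10-10, 11-11 and 10-12.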
| 134 | + |
| 135 | +*The short forms* |
| 136 | + |
| 137 | +Codes 0-9 are the short forms. The short form consists of two bytes, the second byte holding additional column information. The code is the start column divided by 8 (and rounded down). |
| 138 | +* Start column: `(code*8) + ((second_byte>>4)&7)` |
| 139 | +* End column: `start_column + (second_byte&15)` |
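The two formulas above can be combined into a small decoder for a short-form entry. This is an illustrative sketch; `decode_short_form` is a hypothetical name, and it takes the two raw bytes of the entry as input.

```python
def decode_short_form(first_byte, second_byte):
    """Return (start_column, end_column) for a short-form entry."""
    code = (first_byte >> 3) & 0x0F
    if code > 9:
        raise ValueError("codes 10-15 are not short forms")
    start_column = (code * 8) + ((second_byte >> 4) & 7)
    end_column = start_column + (second_byte & 15)
    return start_column, end_column
```

For example, the bytes `0x88`, `0x35` carry code 1, so the start column is 8 + 3 = 11 and the end column is 11 + 5 = 16.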