Commit 4ff09eb

iritkatriel authored and ebonnal committed
pythongh-119786: add code object doc, inline locations.md into it (python#126832)
1 parent 3c8fce6 commit 4ff09eb

6 files changed

+143
-82
lines changed


InternalDocs/README.md

+1-3
@@ -24,9 +24,7 @@ Compiling Python Source Code
2424
Runtime Objects
2525
---
2626

27-
- [Code Objects (coming soon)](code_objects.md)
28-
29-
- [The Source Code Locations Table](locations.md)
27+
- [Code Objects](code_objects.md)
3028

3129
- [Generators (coming soon)](generators.md)
3230

InternalDocs/code_objects.md

+137-3
@@ -1,5 +1,139 @@
11

2-
Code objects
3-
============
2+
# Code objects
43

5-
Coming soon.
4+
A `CodeObject` is a builtin Python type that represents a compiled executable,
5+
such as a compiled function or class.
6+
It contains a sequence of bytecode instructions along with its associated
7+
metadata: data which is necessary to execute the bytecode instructions (such
8+
as the values of the constants they access) or context information such as
9+
the source code location, which is useful for debuggers and other tools.
10+
11+
Since 3.11, the final field of the `PyCodeObject` C struct is an array
12+
of indeterminate length containing the bytecode, `code->co_code_adaptive`.
13+
(In older versions the bytecode was a
14+
[`bytes`](https://docs.python.org/dev/library/stdtypes.html#bytes)
15+
object, `code->co_code`; this was changed to save an allocation and to
16+
allow it to be mutated.)
17+
18+
Code objects are typically produced by the bytecode [compiler](compiler.md),
19+
although they are often written to disk by one process and read back in by another.
20+
The disk version of a code object is serialized using the
21+
[marshal](https://docs.python.org/dev/library/marshal.html) protocol.
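
The write-then-read-back cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how the import system is implemented: the source string, the `<example>` filename, and the variable names are all made up for the example, and it only assumes the documented `compile()`, `marshal.dumps()` and `marshal.loads()` functions.

```python
import marshal
import types

# Compile a tiny module-level snippet into a code object.
code = compile("x = 6 * 7", "<example>", "exec")

# Serialize it with marshal, then load it back, as a second process would
# after reading the bytes from disk.
payload = marshal.dumps(code)
loaded = marshal.loads(payload)

# Run the reloaded code object in a fresh namespace.
namespace = {}
exec(loaded, namespace)
print(type(loaded).__name__, namespace["x"])
```

The marshal format is specific to a Python version, so a payload like this should only be read back by the same version that wrote it.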
22+
23+
Code objects are nominally immutable.
24+
Some fields (including `co_code_adaptive` and fields for runtime
25+
information such as `_co_monitoring`) are mutable, but mutable fields are
26+
not included when code objects are hashed or compared.
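
From Python code, the "nominally immutable" behaviour can be observed roughly as follows. This is a sketch with a hypothetical `greet` function; the mutable fields named above (`co_code_adaptive`, `_co_monitoring`) are internal to the C struct and are not touched here.

```python
def greet():
    return "hello"

code = greet.__code__

# Attributes exposed to Python are read-only.
try:
    code.co_name = "other"
except AttributeError:
    assignment_rejected = True
else:
    assignment_rejected = False

# replace() builds a new code object and leaves the original untouched.
renamed = code.replace(co_name="other")

# Two code objects compiled separately from identical source are distinct
# objects, but they compare equal and hash equal.
first = compile("x = 1", "<example>", "exec")
second = compile("x = 1", "<example>", "exec")
print(assignment_rejected, renamed.co_name, code.co_name, first == second)
```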
27+
28+
## Source code locations
29+
30+
Whenever an exception occurs, the interpreter adds a traceback entry to
31+
the exception for the current frame, as well as each frame on the stack that
32+
it unwinds.
33+
The `tb_lineno` field of a traceback entry is (lazily) set to the line
34+
number of the instruction that was executing in the frame at the time of
35+
the exception.
36+
This field is computed from the locations table, `co_linetable`, by the function
37+
[`PyCode_Addr2Line`](https://docs.python.org/dev/c-api/code.html#c.PyCode_Addr2Line).
38+
Despite its name, `co_linetable` includes more than line numbers; it represents
39+
a 4-number source location for every instruction, indicating the precise line
40+
and column at which it begins and ends. This is a significant amount of data,
41+
so a compact format is very important.
42+
43+
Note that traceback objects don't store all this information -- they store the start line
44+
number, for backward compatibility, and the "last instruction" value.
45+
The rest can be computed from the last instruction (`tb_lasti`) with the help of the
46+
locations table. For Python code, there is a convenience method
47+
[`codeobject.co_positions`](https://docs.python.org/dev/reference/datamodel.html#codeobject.co_positions)
48+
which returns an iterator of `({line}, {endline}, {column}, {endcolumn})` tuples,
49+
one per instruction.
50+
There is also `co_lines()` which returns an iterator of `({start}, {end}, {line})` tuples,
51+
where `{start}` and `{end}` are bytecode offsets.
52+
The latter is described by [`PEP 626`](https://peps.python.org/pep-0626/); it is more
53+
compact, but doesn't return end line numbers or column offsets.
54+
From C code, you need to call
55+
[`PyCode_Addr2Location`](https://docs.python.org/dev/c-api/code.html#c.PyCode_Addr2Location).
56+
57+
As the locations table is only consulted when displaying a traceback and when
58+
tracing (to pass the line number to the tracing function), lookup is not
59+
performance critical.
60+
In order to reduce the overhead during tracing, the mapping from instruction offset to
61+
line number is cached in the `_co_linearray` field.
62+
63+
### Format of the locations table
64+
65+
The `co_linetable` bytes object of code objects contains a compact
66+
representation of the source code positions of instructions, which are
67+
returned by the `co_positions()` iterator.
68+
69+
> [!NOTE]
70+
> `co_linetable` is not to be confused with `co_lnotab`.
71+
> For backwards compatibility, `co_lnotab` exposes the format
72+
> as it existed in Python 3.10 and lower: this older format
73+
> stores only the start line for each instruction.
74+
> It is lazily created from `co_linetable` when accessed.
75+
> See [`Objects/lnotab_notes.txt`](../Objects/lnotab_notes.txt) for more details.
76+
77+
`co_linetable` consists of a sequence of location entries.
78+
Each entry starts with a byte with the most significant bit set, followed by zero or more bytes with the most significant bit unset.
79+
80+
Each entry contains the following information:
81+
* The number of code units covered by this entry (length)
82+
* The start line
83+
* The end line
84+
* The start column
85+
* The end column
86+
87+
The first byte has the following format:
88+
89+
Bit 7 | Bits 3-6 | Bits 0-2
 ---- | ---- | ----
 1 | Code | Length (in code units) - 1
92+
93+
The codes are enumerated in the `_PyCodeLocationInfoKind` enum.
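
The entry framing and the first-byte layout can be sketched in Python as below. `split_entries` is a hypothetical helper written for this illustration, not a CPython function, and the input is a hand-built table rather than one taken from a real code object.

```python
def split_entries(linetable):
    """Split a locations table into (code, length, payload) triples."""
    entries = []
    for byte in linetable:
        if byte & 0x80:
            # Most significant bit set: this byte starts a new entry.
            kind = (byte >> 3) & 0x0F   # bits 3-6
            length = (byte & 0x07) + 1  # bits 0-2 store length - 1
            entries.append((kind, length, bytearray()))
        else:
            # Continuation byte: belongs to the most recently opened entry.
            entries[-1][2].append(byte)
    return [(kind, length, bytes(payload)) for kind, length, payload in entries]

# Hand-built table: an entry with code 1 covering 3 code units and carrying
# one payload byte, followed by an entry with code 15 covering 1 code unit.
table = bytes([0x80 | (1 << 3) | 2, 0x23, 0x80 | (15 << 3) | 0])
entries = split_entries(table)
print(entries)
```

On Python 3.11 or later the same helper can be pointed at a real `co_linetable`; the lengths of all entries would then be expected to add up to the number of code units in the code object.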
94+
95+
## Variable-length integer encodings
96+
97+
Integers are often encoded using a variable-length integer encoding.
98+
99+
### Unsigned integers (`varint`)
100+
101+
Unsigned integers are encoded in 6-bit chunks, least significant first.
102+
Each chunk but the last has bit 6 set.
103+
For example:
104+
105+
* 63 is encoded as `0x3f`
106+
* 200 is encoded as `0x48`, `0x03`
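
A minimal Python sketch of this encoding follows. The function names `encode_varint` and `decode_varint` are chosen for the illustration and do not correspond to CPython's internal helpers.

```python
def encode_varint(value):
    """Encode a non-negative int as 6-bit chunks, least significant first."""
    if value < 0:
        raise ValueError("varint encodes unsigned integers only")
    out = bytearray()
    while value >= 64:
        out.append(0x40 | (value & 0x3F))  # bit 6 set: more chunks follow
        value >>= 6
    out.append(value)  # last chunk, bit 6 clear
    return bytes(out)

def decode_varint(data):
    """Decode one varint from the start of data; return (value, bytes_used)."""
    value = 0
    shift = 0
    for used, byte in enumerate(data, start=1):
        value |= (byte & 0x3F) << shift
        if not byte & 0x40:
            return value, used
        shift += 6
    raise ValueError("truncated varint")

print(encode_varint(63).hex(), encode_varint(200).hex())
```

Both functions reproduce the two examples above: 63 fits in a single chunk, while 200 needs a low chunk of 8 with bit 6 set (`0x48`) followed by a final chunk of 3.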
107+
108+
### Signed integers (`svarint`)
109+
110+
Signed integers are encoded by converting them to unsigned integers, using the following function:
111+
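```Python
def convert(s):
    if s < 0:
        # Negative: store the magnitude shifted left, with bit 0 set as the sign.
        return ((-s)<<1) | 1
    else:
        # Non-negative: store the value shifted left, with bit 0 clear.
        return (s<<1)
```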
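
For completeness, here is a sketch of the conversion above together with its inverse. `to_unsigned` restates the function shown in the document so that the block is self-contained; `to_signed` is a hypothetical name for the reverse mapping a reader of the table would need.

```python
def to_unsigned(s):
    # Same mapping as convert(): magnitude shifted left, sign kept in bit 0.
    return ((-s) << 1) | 1 if s < 0 else s << 1

def to_signed(u):
    # Bit 0 is the sign; the remaining bits are the magnitude.
    magnitude = u >> 1
    return -magnitude if u & 1 else magnitude

print(to_unsigned(-3), to_unsigned(3), to_signed(7), to_signed(6))
```

The resulting unsigned value is then written with the `varint` encoding described earlier.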
118+
119+
*Location entries*
120+
121+
The meanings of the codes, and of the bytes that follow them, are as follows:
122+
123+
Code | Meaning | Start line | End line | Start column | End column
 ---- | ---- | ---- | ---- | ---- | ----
 0-9 | Short form | Δ 0 | Δ 0 | See below | See below
 10-12 | One line form | Δ (code - 10) | Δ 0 | unsigned byte | unsigned byte
 13 | No column info | Δ svarint | Δ 0 | None | None
 14 | Long form | Δ svarint | Δ varint | varint | varint
 15 | No location | None | None | None | None
130+
131+
The Δ means the value is encoded as a delta from another value:
132+
* Start line: Delta from the previous start line, or `co_firstlineno` for the first entry.
133+
* End line: Delta from the start line
134+
135+
*The short forms*
136+
137+
Codes 0-9 are the short forms. The short form consists of two bytes, the second byte holding additional column information. The code is the start column divided by 8 (and rounded down).
138+
* Start column: `(code*8) + ((second_byte>>4)&7)`
139+
* End column: `start_column + (second_byte&15)`
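
The two formulas can be applied to a pair of bytes as in the sketch below. `decode_short_form` is a hypothetical helper for illustration only, and the byte values are hand-picked rather than taken from a real table.

```python
def decode_short_form(first_byte, second_byte):
    """Return (length, start_column, end_column) for a short-form entry."""
    code = (first_byte >> 3) & 0x0F
    if code > 9:
        raise ValueError("not a short-form entry")
    length = (first_byte & 0x07) + 1
    start_column = (code * 8) + ((second_byte >> 4) & 7)
    end_column = start_column + (second_byte & 15)
    return length, start_column, end_column

# First byte 0x8A: bit 7 set, code 1, length field 2 (3 code units).
# Second byte 0x23: high bits give +2 on the start column, low bits a width of 3.
result = decode_short_form(0x8A, 0x23)
print(result)
```

Here the code contributes a base column of 8, so the entry covers columns 10 to 13 on the same line as the previous entry.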

InternalDocs/compiler.md

+3-5
@@ -443,14 +443,12 @@ reference to the source code (filename, etc). All of this is implemented by
443443
Code objects
444444
============
445445

446-
The result of `PyAST_CompileObject()` is a `PyCodeObject` which is defined in
446+
The result of `_PyAST_Compile()` is a `PyCodeObject` which is defined in
447447
[Include/cpython/code.h](../Include/cpython/code.h).
448448
And with that you now have executable Python bytecode!
449449

450-
The code objects (byte code) are executed in [Python/ceval.c](../Python/ceval.c).
451-
This file will also need a new case statement for the new opcode in the big switch
452-
statement in `_PyEval_EvalFrameDefault()`.
453-
450+
The code objects (byte code) are executed in `_PyEval_EvalFrameDefault()`
451+
in [Python/ceval.c](../Python/ceval.c).
454452

455453
Important files
456454
===============

InternalDocs/interpreter.md

+1-1
@@ -16,7 +16,7 @@ from the instruction definitions in [Python/bytecodes.c](../Python/bytecodes.c)
1616
which are written in [a DSL](../Tools/cases_generator/interpreter_definition.md)
1717
developed for this purpose.
1818

19-
Recall that the [Python Compiler](compiler.md) produces a [`CodeObject`](code_object.md),
19+
Recall that the [Python Compiler](compiler.md) produces a [`CodeObject`](code_objects.md),
2020
which contains the bytecode instructions along with static data that is required to execute them,
2121
such as the consts list, variable names,
2222
[exception table](exception_handling.md#format-of-the-exception-table), and so on.

InternalDocs/locations.md

-69
This file was deleted.

Objects/lnotab_notes.txt

+1-1
@@ -1,7 +1,7 @@
11
Description of the internal format of the line number table in Python 3.10
22
and earlier.
33

4-
(For 3.11 onwards, see Objects/locations.md)
4+
(For 3.11 onwards, see InternalDocs/code_objects.md)
55

66
Conceptually, the line number table consists of a sequence of triples:
77
start-offset (inclusive), end-offset (exclusive), line-number.
