# Code objects

A `CodeObject` is a builtin Python type that represents a compiled executable,
@@ -43,7 +42,7 @@ so a compact format is very important.
Note that traceback objects don't store all this information -- they store the start line
number, for backward compatibility, and the "last instruction" value.
The rest can be computed from the last instruction (`tb_lasti`) with the help of the
- locations table. For Python code, there is a convenience method
+ locations table. For Python code, there is a convenience method
[`codeobject.co_positions`](https://docs.python.org/dev/reference/datamodel.html#codeobject.co_positions)
which returns an iterator of `({line}, {endline}, {column}, {endcolumn})` tuples,
one per instruction.
@@ -75,9 +74,11 @@ returned by the `co_positions()` iterator.
> See [`Objects/lnotab_notes.txt`](../Objects/lnotab_notes.txt) for more details.

`co_linetable` consists of a sequence of location entries.
- Each entry starts with a byte with the most significant bit set, followed by zero or more bytes with the most significant bit unset.
+ Each entry starts with a byte with the most significant bit set, followed by
+ zero or more bytes with the most significant bit unset.

Each entry contains the following information:
+
* The number of code units covered by this entry (length)
* The start line
* The end line
@@ -86,54 +87,88 @@ Each entry contains the following information:

The first byte has the following format:

- Bit 7 | Bits 3-6 | Bits 0-2
- ---- | ---- | ----
- 1 | Code | Length (in code units) - 1
+ | Bit 7 | Bits 3-6 | Bits 0-2                   |
+ | ----- | -------- | -------------------------- |
+ | 1     | Code     | Length (in code units) - 1 |

The codes are enumerated in the `_PyCodeLocationInfoKind` enum.
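The layout of the first byte can be illustrated with a short sketch. The helper name `parse_first_byte` is hypothetical and not part of CPython; it only applies the bit layout from the table above.

```python
def parse_first_byte(byte):
    """Split the first byte of a location entry into (code, length)."""
    if not byte & 0x80:
        raise ValueError("the first byte of an entry must have bit 7 set")
    code = (byte >> 3) & 0x0F     # bits 3-6 hold the code
    length = (byte & 0x07) + 1    # bits 0-2 hold the length minus one
    return code, length


print(parse_first_byte(0xF8))  # code 15 (no location), length 1
print(parse_first_byte(0xD2))  # code 10 (one line form), length 3
```

Because the length field is three bits wide, a single entry covers between 1 and 8 code units.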

- ## Variable-length integer encodings
+ ### Variable-length integer encodings

- Integers are often encoded using a variable-length integer encoding
+ Integers are often encoded using a variable-length integer encoding.

- ### Unsigned integers (`varint`)
+ #### Unsigned integers (`varint`)

Unsigned integers are encoded in 6-bit chunks, least significant first.
Each chunk but the last has bit 6 set.
For example:

* 63 is encoded as `0x3f`
- * 200 is encoded as `0x48`, `0x03`
+ * 200 is encoded as `0x48`, `0x03` since ``200 = (0x03 << 6) | (0x48 & 0x3f)``; bit 6 of `0x48` only flags that another chunk follows.
+
+ The following helper can be used to convert an integer into a `varint`:
+
+ ```py
+ def encode_varint(s):
+     ret = []
+     while s >= 64:
+         # Emit the low 6 bits with bit 6 set to flag that more chunks follow.
+         ret.append((s & 0x3F) | 0x40)
+         s >>= 6
+     # The final chunk holds the remaining bits and leaves bit 6 clear.
+     ret.append(s & 0x3F)
+     return bytes(ret)
+ ```
+
+ To convert a `varint` into an unsigned integer:
+
+ ```py
+ def decode_varint(chunks):
+     ret = 0
+     for chunk in reversed(chunks):
+         # Mask off the continuation flag (bit 6) before accumulating.
+         ret = (ret << 6) | (chunk & 0x3F)
+     return ret
+ ```
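When walking a table of entries, the chunks of a `varint` are not delimited in advance, so a reader has to stop at the first chunk with bit 6 clear. The following is a minimal sketch of that loop; the name `read_varint` and its `(value, next_pos)` return convention are assumptions made for illustration.

```python
def read_varint(data, pos):
    """Read one varint from data starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        chunk = data[pos]
        pos += 1
        # Each chunk contributes 6 bits, least significant chunk first.
        value |= (chunk & 0x3F) << shift
        shift += 6
        if not chunk & 0x40:
            # Bit 6 is clear, so this was the last chunk.
            return value, pos


data = bytes([0x48, 0x03, 0x3F])
first, pos = read_varint(data, 0)     # 200, stops after two bytes
second, pos = read_varint(data, pos)  # 63, stops after one byte
print(first, second, pos)
```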

- ### Signed integers (`svarint`)
+ #### Signed integers (`svarint`)

Signed integers are encoded by converting them to unsigned integers, using the following function:
- ```Python
- def convert(s):
+
+ ```py
+ def svarint_to_varint(s):
      if s < 0:
-         return ((-s)<<1) | 1
+         return ((-s) << 1) | 1
      else:
-         return (s<<1)
+         return s << 1
+ ```
+
+ To convert a `varint` into a signed integer:
+
+ ```py
+ def varint_to_svarint(uval):
+     return -(uval >> 1) if uval & 1 else (uval >> 1)
  ```
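As a quick check of the signed mapping, the sketch below restates the two conversions compactly and round-trips a few sample deltas. The sample values are arbitrary; the point is that the sign ends up in the lowest bit and the magnitude in the remaining bits.

```python
def svarint_to_varint(s):
    # Magnitude shifted left by one, with the sign stored in bit 0.
    return ((-s) << 1) | 1 if s < 0 else s << 1


def varint_to_svarint(uval):
    # Bit 0 is the sign; the remaining bits are the magnitude.
    return -(uval >> 1) if uval & 1 else uval >> 1


for delta in (0, 1, -1, 5, -3):
    uval = svarint_to_varint(delta)
    assert varint_to_svarint(uval) == delta
    print(delta, "->", uval)
```

With these inputs the unsigned values are 0, 2, 3, 10 and 7 respectively.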

- *Location entries*
+ ### Location entries

The meaning of the codes and the following bytes is as follows:

- Code | Meaning | Start line | End line | Start column | End column
- ---- | ---- | ---- | ---- | ---- | ----
- 0-9 | Short form | Δ 0 | Δ 0 | See below | See below
- 10-12 | One line form | Δ (code - 10) | Δ 0 | unsigned byte | unsigned byte
- 13 | No column info | Δ svarint | Δ 0 | None | None
- 14 | Long form | Δ svarint | Δ varint | varint | varint
- 15 | No location | None | None | None | None
+ | Code  | Meaning        | Start line    | End line | Start column  | End column    |
+ | ----- | -------------- | ------------- | -------- | ------------- | ------------- |
+ | 0-9   | Short form     | Δ 0           | Δ 0      | See below     | See below     |
+ | 10-12 | One line form  | Δ (code - 10) | Δ 0      | unsigned byte | unsigned byte |
+ | 13    | No column info | Δ svarint     | Δ 0      | None          | None          |
+ | 14    | Long form      | Δ svarint     | Δ varint | varint        | varint        |
+ | 15    | No location    | None          | None     | None          | None          |

The Δ means the value is encoded as a delta from another value:
+
* Start line: Delta from the previous start line, or `co_firstlineno` for the first entry.
- * End line: Delta from the start line
+ * End line: Delta from the start line.
+
+ ### The short forms

- *The short forms*
+ Codes 0-9 are the short forms. The short form consists of two bytes,
+ the second byte holding additional column information. The code is the
+ start column divided by 8 (and rounded down).

- Codes 0-9 are the short forms. The short form consists of two bytes, the second byte holding additional column information. The code is the start column divided by 8 (and rounded down).

* Start column: `(code*8) + ((second_byte>>4)&7)`
* End column: `start_column + (second_byte&15)`
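The two column formulas can be sketched in code as follows. Both helper names are hypothetical; `encode_short_form` is simply the inverse derived from the formulas above, not a function taken from CPython, and it assumes the columns fit the short form (start column below 80, width below 16).

```python
def decode_short_form(code, second_byte):
    """Return (start_column, end_column) for a short-form entry."""
    if not 0 <= code <= 9:
        raise ValueError("short forms use codes 0-9")
    start_column = (code * 8) + ((second_byte >> 4) & 7)
    end_column = start_column + (second_byte & 15)
    return start_column, end_column


def encode_short_form(start_column, end_column):
    """Return (code, second_byte); assumes the columns fit the short form."""
    width = end_column - start_column
    if not (0 <= start_column < 80 and 0 <= width < 16):
        raise ValueError("columns do not fit the short form")
    code = start_column // 8
    second_byte = ((start_column % 8) << 4) | width
    return code, second_byte


print(decode_short_form(1, 0x23))  # start column 10, end column 13
print(encode_short_form(10, 13))   # code 1, second byte 0x23
```

The second byte keeps its most significant bit clear, which matches the rule that only the first byte of an entry has that bit set.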