| 1 | +Description of exception handling |
| 2 | +--------------------------------- |
| 3 | + |
| 4 | +Python uses a technique known as "zero-cost" exception handling, which |
| 5 | +minimizes the cost of supporting exceptions. In the common case (where |
| 6 | +no exception is raised) the cost is reduced to zero (or close to zero). |
| 7 | +The cost of raising an exception is increased, but not by much. |
| 8 | + |
| 9 | +The following code: |
| 10 | + |
| 11 | +``` |
| 12 | +try:
| 13 | +    g(0)
| 14 | +except:
| 15 | +    res = "fail"
| 16 | +
| 17 | +``` |
| 18 | + |
| 19 | +compiles into intermediate code like the following: |
| 20 | + |
| 21 | +``` |
| 22 | +              RESUME                   0
| 23 | +
| 24 | +  1           SETUP_FINALLY            8 (to L1)
| 25 | +
| 26 | +  2           LOAD_NAME                0 (g)
| 27 | +              PUSH_NULL
| 28 | +              LOAD_CONST               0 (0)
| 29 | +              CALL                     1
| 30 | +              POP_TOP
| 31 | +              POP_BLOCK
| 32 | +
| 33 | +  --   L1:    PUSH_EXC_INFO
| 34 | +
| 35 | +  3           POP_TOP
| 36 | +
| 37 | +  4           LOAD_CONST               1 ('fail')
| 38 | +              STORE_NAME               1 (res)
| 39 | +``` |
| 40 | + |
| 41 | +`SETUP_FINALLY` and `POP_BLOCK` are pseudo-instructions. This means |
| 42 | +that they can appear in intermediate code but they are not bytecode |
| 43 | +instructions. `SETUP_FINALLY` specifies that henceforth, exceptions |
| 44 | +are handled by the code at label L1. The `POP_BLOCK` instruction |
| 45 | +reverses the effect of the last `SETUP` instruction, so that the |
| 46 | +active exception handler reverts to what it was before. |
| 47 | + |
| 48 | +`SETUP_FINALLY` and `POP_BLOCK` have no effect when no exceptions |
| 49 | +are raised. The idea of zero-cost exception handling is to replace
| 50 | +these pseudo-instructions with metadata that is stored alongside the
| 51 | +bytecode, and which is inspected only when an exception occurs. |
| 52 | +This metadata is the exception table, and it is stored in the code |
| 53 | +object's `co_exceptiontable` field. |
| 54 | + |
| 55 | +When the pseudo-instructions are translated into bytecode, |
| 56 | +`SETUP_FINALLY` and `POP_BLOCK` are removed, and the exception |
| 57 | +table is constructed, mapping each instruction to the exception |
| 58 | +handler that covers it, if any. Instructions that are not
| 59 | +covered by any exception handler within the same code object's
| 60 | +bytecode do not appear in the exception table at all.
| 61 | + |
| 62 | +For the code object in our example above, the table has a single |
| 63 | +entry specifying that all instructions that were between the |
| 64 | +`SETUP_FINALLY` and the `POP_BLOCK` are covered by the exception |
| 65 | +handler located at label `L1`. |
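
The table can be inspected directly from Python. The following is a minimal sketch, assuming a CPython version that exposes `co_exceptiontable` (3.11 or later); the `getattr` fallback and the `<example>` filename are illustrative choices, not part of the design. A real compiler may emit more entries than the simplified listing above suggests, for instance to cover cleanup code inside the handler.

```python
# Compile the example and look at the raw exception table bytes.
source = '''
try:
    g(0)
except:
    res = "fail"
'''

code = compile(source, "<example>", "exec")

# Assumption: co_exceptiontable exists on CPython 3.11+.
# Fall back to empty bytes on interpreters that lack the attribute.
table = getattr(code, "co_exceptiontable", b"")

print(len(table), list(table))
```

The printed bytes are version dependent, so no particular output should be expected.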
| 66 | + |
| 67 | +Handling Exceptions |
| 68 | +------------------- |
| 69 | + |
| 70 | +At runtime, when an exception occurs, the interpreter looks up |
| 71 | +the offset of the current instruction in the exception table. If |
| 72 | +it finds a handler, control flow transfers to it. Otherwise, the |
| 73 | +exception bubbles up to the caller, and the caller's frame is |
| 74 | +checked for a handler covering the `CALL` instruction. This |
| 75 | +repeats until a handler is found or the topmost frame is reached. |
| 76 | +If no handler is found, the program terminates. During unwinding, |
| 77 | +the traceback is constructed as each frame is added to it. |
| 78 | + |
| 79 | +Along with the location of an exception handler, each entry of the |
| 80 | +exception table also contains the stack depth of the `try` statement
| 81 | +and a boolean `lasti` value, which indicates whether the instruction |
| 82 | +offset of the raising instruction should be pushed to the stack. |
| 83 | + |
| 84 | +Handling an exception, once an exception table entry is found, consists |
| 85 | +of the following steps: |
| 86 | + |
| 87 | + 1. pop values from the stack until it matches the stack depth for the handler. |
| 88 | + 2. if `lasti` is true, then push the offset that the exception was raised at. |
| 89 | + 3. push the exception to the stack. |
| 90 | + 4. jump to the target offset and resume execution. |
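
The four steps above can be sketched in plain Python. This is an illustrative model, not the interpreter's code: the evaluation stack is modelled as a list, the table entry as a tuple, and `handle_exception` and `raised_at` are hypothetical names introduced only for this sketch.

```python
def handle_exception(stack, entry, exc, raised_at):
    """Apply the four handling steps to `stack`; return the offset to jump to."""
    start, end, target, depth, push_lasti = entry
    del stack[depth:]            # 1. pop values down to the handler's stack depth
    if push_lasti:
        stack.append(raised_at)  # 2. push the offset the exception was raised at
    stack.append(exc)            # 3. push the exception itself
    return target                # 4. the caller resumes execution at the target


stack = ["a", "b", "c", "d"]
entry = (20, 28, 100, 1, True)   # start, end, target, depth, push-lasti
exc = ValueError("boom")

next_offset = handle_exception(stack, entry, exc, raised_at=24)
print(next_offset, stack)
```

After the call the stack holds one surviving value, the saved offset 24, and the exception, and execution would continue at offset 100.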
| 91 | + |
| 92 | + |
| 93 | +Reraising Exceptions and `lasti` |
| 94 | +-------------------------------- |
| 95 | + |
| 96 | +The purpose of pushing `lasti` to the stack is for cases where an exception |
| 97 | +needs to be re-raised, and be associated with the original instruction that |
| 98 | +raised it. This happens, for example, at the end of a `finally` block, when |
| 99 | +any in-flight exception needs to be propagated on. As the frame's instruction |
| 100 | +pointer now points into the finally block, a `RERAISE` instruction |
| 101 | +(with `oparg > 0`) sets it to the `lasti` value from the stack. |
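
One observable effect of this can be checked from Python: when an exception propagates out of a `finally` block, the traceback still reports the line that originally raised it, not a line inside the `finally` block. The sketch below only demonstrates that effect; `failing` and `innermost_lineno` are hypothetical helper names, and the sketch does not show how the interpreter implements it.

```python
def failing():
    log = []
    try:
        1 / 0                     # the instruction that originally raises
    finally:
        log.append("cleanup")     # runs before the exception propagates on


def innermost_lineno(func):
    """Call func and return the line number of the innermost traceback entry."""
    try:
        func()
    except ZeroDivisionError as exc:
        tb = exc.__traceback__
        while tb.tb_next is not None:
            tb = tb.tb_next
        return tb.tb_lineno
    return None


# `1 / 0` sits three lines below the `def failing():` line.
raising_line = failing.__code__.co_firstlineno + 3
reported = innermost_lineno(failing)
print(reported == raising_line)
```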
| 102 | + |
| 103 | +Format of the exception table |
| 104 | +----------------------------- |
| 105 | + |
| 106 | +Conceptually, the exception table consists of a sequence of 5-tuples: |
| 107 | +``` |
| 108 | + 1. `start-offset` (inclusive) |
| 109 | + 2. `end-offset` (exclusive) |
| 110 | + 3. `target` |
| 111 | + 4. `stack-depth` |
| 112 | + 5. `push-lasti` (boolean) |
| 113 | +``` |
| 114 | + |
| 115 | +All offsets and lengths are in code units, not bytes. |
| 116 | + |
| 117 | +We want the format to be compact, but quickly searchable. |
| 118 | +For it to be compact, it needs to have variable sized entries so that we can store common (small) offsets compactly, but handle large offsets if needed. |
| 119 | +For it to be searchable quickly, we need to support binary search, giving us O(log n) performance in all cases.
| 120 | +Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry. |
| 121 | + |
| 122 | +It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as: |
| 123 | + `start, size, target, depth, push-lasti`. |
| 124 | + |
| 125 | +Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes. |
| 126 | +It also happens that depth is generally quite small. |
| 127 | + |
| 128 | +So, we need to encode: |
| 129 | +``` |
| 130 | + `start` (up to 30 bits) |
| 131 | + `size` (up to 30 bits) |
| 132 | + `target` (up to 30 bits) |
| 133 | + `depth` (up to ~8 bits) |
| 134 | + `lasti` (1 bit) |
| 135 | +``` |
| 136 | + |
| 137 | +We need a marker for the start of the entry, so the first byte of each entry will have the most significant bit set.
| 138 | +Since the most significant bit is reserved for marking the start of an entry, we have 7 bits per byte to encode offsets.
| 139 | +Encoding uses a standard varint encoding, but with only 6 data bits per byte instead of the usual 7.
| 140 | +The 8 bits of a byte are (msb left) SXdddddd, where S is the start bit and X is the extend bit, meaning that the next byte is required to extend the offset.
| 141 | + |
| 142 | +In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding. |
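
The start bit is what makes binary search workable over variable-sized entries: from any byte position, scanning backwards to the nearest byte with the most significant bit set lands on the beginning of an entry. The following is a minimal sketch of that idea, not the interpreter's lookup routine. It assumes entries are sorted by start offset and do not overlap, and the names `decode_varint`, `entry_start` and `find_handler` are hypothetical.

```python
def decode_varint(table, pos):
    """Decode one 6-bit-per-byte varint at pos; return (value, next_pos)."""
    b = table[pos]
    val = b & 63                   # low 6 bits are data; drops the start bit too
    while b & 64:                  # extend bit: another byte follows
        pos += 1
        b = table[pos]
        val = (val << 6) | (b & 63)
    return val, pos + 1


def entry_start(table, pos):
    """Scan backwards to the nearest byte whose start bit (MSB) is set."""
    while not table[pos] & 128:
        pos -= 1
    return pos


def find_handler(table, offset):
    """Binary-search the raw table; return (target, depth, lasti) or None."""
    lo, hi = 0, len(table)         # both always sit on entry boundaries
    while lo < hi:
        mid = entry_start(table, (lo + hi) // 2)
        start, pos = decode_varint(table, mid)
        size, pos = decode_varint(table, pos)
        target, pos = decode_varint(table, pos)
        dl, pos = decode_varint(table, pos)
        if offset < start:
            hi = mid               # continue in the entries before this one
        elif offset >= start + size:
            lo = pos               # continue in the entries after this one
        else:
            return target, dl >> 1, bool(dl & 1)
    return None


# Two hand-encoded entries: [20, 28) -> 100 and [40, 44) -> 120.
table = bytes([148, 8, 65, 36, 6, 168, 4, 65, 56, 3])
print(find_handler(table, 22))     # (100, 3, False)
print(find_handler(table, 41))     # (120, 1, True)
print(find_handler(table, 30))     # None
```

Offsets here are in code units, matching the table format.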
| 143 | + |
| 144 | +For example, the exception entry: |
| 145 | +``` |
| 146 | + `start`: 20 |
| 147 | + `end`: 28 |
| 148 | + `target`: 100 |
| 149 | + `depth`: 3 |
| 150 | + `lasti`: False |
| 151 | +``` |
| 152 | + |
| 153 | +is encoded by first converting to the more compact four-value form:
| 154 | +```
| 155 | + `start`: 20
| 156 | + `size`: 8
| 157 | + `target`: 100
| 158 | + `(depth<<1)+lasti`: 6
| 159 | +``` |
| 160 | + |
| 161 | +which is then encoded as: |
| 162 | +``` |
| 163 | + 148 (MSB + 20 for start) |
| 164 | + 8 (size) |
| 165 | + 65 (Extend bit + 1) |
| 166 | + 36 (Remainder of target, 100 == (1<<6)+36) |
| 167 | + 6 (depth and lasti, (3<<1)+0)
| 168 | +``` |
| 169 | + |
| 170 | +for a total of five bytes. |
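
The worked example can be reproduced with a small encoder. This is a sketch written for illustration from the format described above, not the compiler's own encoder; `encode_varint` and `encode_entry` are hypothetical names.

```python
def encode_varint(value, start_of_entry=False):
    """Encode value in 6-bit groups, most significant group first."""
    groups = [value & 63]
    value >>= 6
    while value:
        groups.append(value & 63)
        value >>= 6
    groups.reverse()

    out = bytearray()
    for i, group in enumerate(groups):
        byte = group
        if i < len(groups) - 1:
            byte |= 64            # extend bit: more bytes follow
        out.append(byte)
    if start_of_entry:
        out[0] |= 128             # start bit on the first byte of an entry
    return bytes(out)


def encode_entry(start, end, target, depth, lasti):
    """Encode one entry as start, size, target, (depth<<1)+lasti."""
    size = end - start
    dl = (depth << 1) + int(lasti)
    return (encode_varint(start, start_of_entry=True)
            + encode_varint(size)
            + encode_varint(target)
            + encode_varint(dl))


encoded = encode_entry(20, 28, 100, 3, False)
print(list(encoded))              # [148, 8, 65, 36, 6]
```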
| 171 | + |
| 172 | + |
| 173 | +Script to parse the exception table |
| 174 | +----------------------------------- |
| 175 | + |
| 176 | +``` |
| 177 | +def parse_varint(iterator):
| 178 | +    b = next(iterator)
| 179 | +    val = b & 63  # low 6 bits carry data; this also drops the start bit
| 180 | +    while b & 64:  # extend bit set: another byte follows
| 181 | +        val <<= 6
| 182 | +        b = next(iterator)
| 183 | +        val |= b & 63
| 184 | +    return val
| 185 | +``` |
| 186 | +``` |
| 187 | +def parse_exception_table(code):
| 188 | +    iterator = iter(code.co_exceptiontable)
| 189 | +    try:
| 190 | +        while True:
| 191 | +            start = parse_varint(iterator)*2  # code units to bytes
| 192 | +            length = parse_varint(iterator)*2
| 193 | +            end = start + length - 2  # Present as inclusive, not exclusive
| 194 | +            target = parse_varint(iterator)*2
| 195 | +            dl = parse_varint(iterator)  # combined (depth<<1)+lasti
| 196 | +            depth = dl >> 1
| 197 | +            lasti = bool(dl & 1)
| 198 | +            yield start, end, target, depth, lasti
| 199 | +    except StopIteration:  # table exhausted
| 200 | +        return
| 201 | +``` |