| 1 | +List of Bugs uncovered in Rust via arithmetic overflow checking |
| 2 | +=============================================================== |
| 3 | +This document is a list of bugs that were uncovered during the |
| 4 | +implementation and deployment of arithmetic overflow checking. |
| 5 | +This list is restricted solely to *legitimate* bugs. Cases |
| 6 | +where the overflow was benign (e.g. the computed value is |
| 7 | +unused), transient (e.g. the computed wrapped value is |
| 8 | +guaranteed to be brought back into the original range, such as |
| 9 | +in `unsigned - 1 + provably_positive`), or silly (random |
| 10 | +non-functional code in the tests or documentation) are not |
| 11 | +included in the list. |
| 12 | +However, extremely rare or obscure corner cases are considered |
| 13 | +legitimate bugs. (We begin with such a case.) |
| 14 | + |
| 15 | + 1. `impl core::iter::RandomAccessIter for core::iter::Rev` |
| 16 | + |
| 17 | + If one calls `iter.idx(index)` with `index >= amt`, |
| 18 | + then it calls the wrapped inner iterator with a wrapped |
| 19 | + around value. The contract for `idx` does say that it |
| 20 | + does need to handle out-of-bounds inputs, so this |
| 21 | + appeared benign at first, but there is the corner case |
| 22 | + of an iterator that actually covers the whole range |
| 23 | + of indices, which would then return `Some(_)` here when |
| 24 | + (pnkfelix thinks) `None` should be expected. |
| 25 | + |
| 26 | + reference: |
| 27 | + https://github.com/rust-lang/rust/pull/22532#issuecomment-75168901 |
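The corner case in item 1 can be sketched as follows. This is a minimal illustration, assuming the reversed index is computed as `amt - index - 1`, where `amt` is the number of indexable elements (an assumption about the code, not something quoted in this list). The helper names `rev_index` and `rev_index_wrapping` are hypothetical.

```rust
/// Hypothetical helper: map `index` in a reversed view of `amt` elements
/// onto the underlying sequence, returning `None` when out of bounds.
fn rev_index(amt: usize, index: usize) -> Option<usize> {
    // checked_sub yields None instead of wrapping below zero.
    amt.checked_sub(index)?.checked_sub(1)
}

/// The same computation with explicit wrapping, modelling what
/// unchecked arithmetic would produce.
fn rev_index_wrapping(amt: usize, index: usize) -> usize {
    amt.wrapping_sub(index).wrapping_sub(1)
}
```

With `amt = 5` and `index = 5`, the wrapping version produces `usize::MAX`. An inner iterator that really covers the whole index range would treat that as a valid index, whereas the checked version reports `None`.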
| 28 | + |
| 29 | + 2. `std::sys::windows::time::SteadyTime` |
| 30 | + |
| 31 | + `fn ns` was converting a tick count `t` to nanoseconds |
| 32 | + via the computation `t * 1_000_000_000 / frequency()`; |
| 33 | + but the multiplication there can overflow, thus losing |
| 34 | + the high-order bits. |
| 35 | + |
| 36 | + Full disclosure: This bug was known prior to landing |
| 37 | + arithmetic overflow checks, and filed as: |
| 38 | + |
| 39 | + https://github.com/rust-lang/rust/issues/17845 |
| 40 | + |
| 41 | + Although it had been filed, it was left unfixed for months, |
| 42 | + even though the overflow would start |
| 43 | + occurring after 2 hours of machine uptime, according to: |
| 44 | + |
| 45 | + https://github.com/rust-lang/rust/pull/22788 |
| 46 | + |
| 47 | + pnkfelix included it on this list because having arithmetic |
| 48 | + overflow checks forces such bugs to be fixed in some manner |
| 49 | + rather than ignored. |
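The overflow in item 2 can be sketched as below. This is an illustration only, assuming 64-bit unsigned tick counts and a frequency passed in as a parameter; the function names are hypothetical, and the split form is one common way to avoid the overflow, not necessarily the fix applied in the linked pull request.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Direct form of the computation from the text; returns None when
/// `ticks * 1_000_000_000` does not fit in a u64.
fn ticks_to_ns_naive(ticks: u64, frequency: u64) -> Option<u64> {
    ticks.checked_mul(NANOS_PER_SEC).map(|n| n / frequency)
}

/// Split form: convert whole seconds and the sub-second remainder separately.
/// The remainder product fits as long as `frequency` is well below
/// u64::MAX / NANOS_PER_SEC; the seconds product only overflows for
/// tick counts representing centuries of uptime.
fn ticks_to_ns_split(ticks: u64, frequency: u64) -> u64 {
    let secs = ticks / frequency;
    let rem = ticks % frequency;
    secs * NANOS_PER_SEC + rem * NANOS_PER_SEC / frequency
}
```

For example, with an assumed 10 MHz counter, `ticks_to_ns_naive(20_000_000_123, 10_000_000)` reports the overflow as `None`, while `ticks_to_ns_split` with the same arguments yields the expected nanosecond count.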
| 50 | + |
| 51 | + 3. `std::rt::lang_start` |
| 52 | + The runtime startup uses a fairly loose computation to |
| 53 | + determine the stack extent to pass to |
| 54 | + `record_os_managed_stack_bounds` (which sets up guard |
| 55 | + pages and fault handlers to deal with call stack over- |
| 56 | + or underflows). |
| 57 | + |
| 58 | + In this case, the arithmetic involved was actually |
| 59 | + *overflowing*, in this calculation: |
| 60 | + |
| 61 | + ``` |
| 62 | + let top_plus_20k = my_stack_top + 20000; |
| 63 | + ``` |
| 64 | + |
| 65 | + pnkfelix assumes that in practice this would lead to us |
| 66 | + attempting to install a guard page starting from some |
| 67 | + random location, rather than the actual desired |
| 68 | + address range. While the lack of a guard page in the |
| 69 | + right spot is probably of no consequence here (assuming |
| 70 | + that the OS is already going to stop us from actually |
| 71 | + attempting to write to stack locations resulting from |
| 72 | + overflow if that ever occurs), attempting to install a |
| 73 | + guard page on a random unrelated address range seems |
| 74 | + completely bogus. |
| 75 | + pnkfelix only observed this bug when building a 32-bit |
| 76 | + Rust on a 64-bit Linux host via cross-compilation. |
| 77 | + |
| 78 | + So this probably qualifies as a rare bug. |
| 79 | + reference: |
| 80 | + |
| 81 | + https://github.com/rust-lang/rust/pull/22532#issuecomment-76927295 |
| 82 | + |
| 83 | + UPDATE: In hindsight, one might argue this should be |
| 84 | + reclassified as a transient overflow, because the whole |
| 85 | + computation in context is: |
| 86 | + |
| 87 | + ``` |
| 88 | + let my_stack_bottom = |
| 89 | + my_stack_top + 20000 - OS_DEFAULT_STACK_ESTIMATE; |
| 90 | + ``` |
| 91 | +
|
| 92 | + where OS_DEFAULT_STACK_ESTIMATE is a large value |
| 93 | + (> 1mb). |
| 94 | +
|
| 95 | + However, my claim is that this code is playing guessing |
| 96 | + games; do we really know that the stack is sufficiently |
| 97 | + large that the computation above does not *underflow*? |
| 98 | +
|
| 99 | + So pnkfelix is going to leave it on this list, at least |
| 100 | + for now. (pnkfelix subsequently changed the code to use |
| 101 | + saturated arithmetic in both cases, though obviously |
| 102 | + that could be tweaked a bit.) |
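The difference between the overflowing computation in item 3 and a saturating one can be sketched as follows. This is a rough model, not the runtime's actual code: it uses `u32` to stand in for a 32-bit `usize`, the value of `OS_DEFAULT_STACK_ESTIMATE` is an assumed 2 MiB (the text says only that it is larger than 1 MB), and the function names are hypothetical.

```rust
/// Assumed value for illustration only.
const OS_DEFAULT_STACK_ESTIMATE: u32 = 2 * 1024 * 1024;

/// Checked form: None if either step leaves the u32 range.
fn stack_bottom_checked(my_stack_top: u32) -> Option<u32> {
    my_stack_top
        .checked_add(20_000)?
        .checked_sub(OS_DEFAULT_STACK_ESTIMATE)
}

/// Saturating form: clamps to u32::MAX on overflow and to 0 on underflow.
fn stack_bottom_saturating(my_stack_top: u32) -> u32 {
    my_stack_top
        .saturating_add(20_000)
        .saturating_sub(OS_DEFAULT_STACK_ESTIMATE)
}
```

A stack top near the end of the 32-bit address space (say `0xFFFF_F000`) makes the addition overflow, and a very small stack top makes the subtraction underflow. The checked form flags both cases, while the saturating form quietly clamps, which avoids a wild address but still leaves the estimate a guess.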
| 103 | + 4. struct order of evaluation |
| 104 | + |
| 105 | + There is an explanatory story here: |
| 106 | + |
| 107 | + https://github.com/rust-lang/rust/issues/23112 |
| 108 | + |
| 109 | + In short, one of our tests was quite weak and not |
| 110 | + actually checking the computed values. But |
| 111 | + arithmetic-overflow checking immediately pointed |
| 112 | + out an attempt to reserve a ridiculous amount |
| 113 | + of space within a `Vec`. (This was on an experimental |
| 114 | + branch of the codebase where we would fill with |
| 115 | + a series of 0xC1 bytes when a value was dropped, rather |
| 116 | + than filling with 0x00 bytes.) |
| 117 | + |
| 118 | + It is actually quite likely that this test would still |
| 119 | + have failed without the arithmetic overflow checking, |
| 120 | + but it probably would have been much harder to diagnose |
| 121 | + since the panic would have happened at some arbitrary |
| 122 | + point later in the control flow. |