Heuristic improvement: reg-scan offset by inst location.

cfallin · cfallin · commit 33ac6cb41d9b · 2021-04-13T23:31:34.000-07:00
We currently use a heuristic in which the scan for an available PReg
starts at an index into the register list that rotates with the bundle
index. This is a simple way to distribute contention more evenly across
the whole register file, and it avoids repeating the same
unlikely-to-succeed reg-map probes of lower-numbered registers for
every bundle.

After some experimentation with different options (queue that
dynamically puts registers at end after allocating, various
ways of mixing/hashing indices, etc.), adding the *instruction offset*
(of the start of the first range in the bundle) as well gave the best
results. This is very simple and gives us a likely better-than-random
conflict avoidance because ranges tend to be local, so rotating
through registers as we scan down the list of instructions seems like
a very natural strategy.
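
As a rough illustration of the probe order this produces, here is a
standalone sketch. It is not the allocator's code: `probe_order` and
`scan_offset` are hypothetical helpers, plain `u32` values stand in for
PRegs, and the extra loop iteration when a hint is present is an
assumption based on the surrounding context in the diff.

```rust
/// Hypothetical helper: the scan offset is the instruction index at the
/// start of the bundle's first range plus the bundle index.
fn scan_offset(first_inst_index: usize, bundle_index: usize) -> usize {
    first_inst_index + bundle_index
}

/// Hypothetical helper: returns the order in which registers would be
/// probed. `regs` stands in for the per-class register list and `hint`
/// for an optional preferred register.
fn probe_order(regs: &[u32], scan_offset: usize, hint: Option<u32>) -> Vec<u32> {
    let n_regs = regs.len();
    if n_regs == 0 {
        // Nothing to rotate through; only the hint (if any) can be tried.
        return hint.into_iter().collect();
    }
    // Assumption: one extra iteration when a hint is present, since the
    // hint is tried first and then skipped during the rotated scan.
    let loop_count = if hint.is_some() { n_regs + 1 } else { n_regs };
    let mut order = Vec::with_capacity(loop_count);
    for i in 0..loop_count {
        let reg = match (i, hint) {
            // Try the hint before anything else.
            (0, Some(h)) => h,
            // Then scan the list starting at the rotated offset,
            // skipping the hint because it was already tried.
            (i, Some(h)) => {
                let reg = regs[(i - 1 + scan_offset) % n_regs];
                if reg == h {
                    continue;
                }
                reg
            }
            // No hint: plain rotated scan.
            (i, None) => regs[(i + scan_offset) % n_regs],
        };
        order.push(reg);
    }
    order
}

fn main() {
    let regs = [0, 1, 2, 3];
    // Bundle 1 whose first range starts at instruction 5: offset 6.
    let offset = scan_offset(5, 1);
    println!("{:?}", probe_order(&regs, offset, None)); // [2, 3, 0, 1]
    println!("{:?}", probe_order(&regs, offset, Some(3))); // [3, 2, 0, 1]
}
```

Bundles that start at nearby instructions begin their scans at
different points in the list, which is the spreading effect described
above.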

On the tests used by our `cargo bench` benchmark, this reduces regfile
probes for the largest (459 instruction) benchmark from 1538 to 829,
i.e., approximately by half, and results in an 11% allocation speedup.
diff --git a/README.md b/README.md
@@ -111,27 +111,21 @@ benches/0               time:   [365.68 us 367.36 us 369.04 us]
 ```
 
 I then measured three different fuzztest-SSA-generator test cases in
-this allocator, `regalloc2`, measuring between 1.05M and 2.3M
+this allocator, `regalloc2`, measuring between 1.1M and 2.3M
 instructions per second (closer to the former for larger functions):
 
 ```plain
 ==== 459 instructions
-benches/0               time:   [424.46 us 425.65 us 426.59 us]
-                        thrpt:  [1.0760 Melem/s 1.0784 Melem/s 1.0814 Melem/s]
+benches/0               time:   [377.91 us 378.09 us 378.27 us]
+                        thrpt:  [1.2134 Melem/s 1.2140 Melem/s 1.2146 Melem/s]
 
 ==== 225 instructions
-benches/1               time:   [213.05 us 213.28 us 213.54 us]
-                        thrpt:  [1.0537 Melem/s 1.0549 Melem/s 1.0561 Melem/s]
+benches/1               time:   [202.03 us 202.14 us 202.27 us]
+                        thrpt:  [1.1124 Melem/s 1.1131 Melem/s 1.1137 Melem/s]
 
-Found 1 outliers among 100 measurements (1.00%)
-  1 (1.00%) high mild
 ==== 21 instructions
-benches/2               time:   [9.0495 us 9.0571 us 9.0641 us]
-                        thrpt:  [2.3168 Melem/s 2.3186 Melem/s 2.3206 Melem/s]
-
-Found 4 outliers among 100 measurements (4.00%)
-  2 (2.00%) high mild
-  2 (2.00%) high severe
+benches/2               time:   [9.5605 us 9.5655 us 9.5702 us]
+                        thrpt:  [2.1943 Melem/s 2.1954 Melem/s 2.1965 Melem/s]
 ```
 
 Though not apples-to-apples (SSA vs. non-SSA, completely different
diff --git a/src/ion/mod.rs b/src/ion/mod.rs
@@ -2570,6 +2570,17 @@ impl<'a, F: Function> Env<'a, F> {
                     } else {
                         n_regs
                     };
+                    // Heuristic: start the scan for an available
+                    // register at an offset influenced both by our
+                    // location in the code and by the bundle we're
+                    // considering. This has the effect of spreading
+                    // demand more evenly across registers.
+                    let scan_offset = self.ranges[self.bundles[bundle.index()].first_range.index()]
+                        .range
+                        .from
+                        .inst
+                        .index()
+                        + bundle.index();
                     for i in 0..loop_count {
                         // The order in which we try registers is somewhat complex:
                         // - First, if there is a hint, we try that.
@@ -2587,15 +2598,15 @@ impl<'a, F: Function> Env<'a, F> {
                             (0, Some(hint_reg)) => hint_reg,
                             (i, Some(hint_reg)) => {
                                 let reg = self.env.regs_by_class[class as u8 as usize]
-                                    [(i - 1 + bundle.index()) % n_regs];
+                                    [(i - 1 + scan_offset) % n_regs];
                                 if reg == hint_reg {
                                     continue;
                                 }
                                 reg
                             }
                             (i, None) => {
                                 self.env.regs_by_class[class as u8 as usize]
-                                    [(i + bundle.index()) % n_regs]
+                                    [(i + scan_offset) % n_regs]
                             }
                         };