Rollup merge of rust-lang#40807 - stjepang:optimize-insertion-sort, r=alexcrichton

frewsxcv · web-flow · commit 2bdbcb061806 · 2017-03-25T09:30:32.000-07:00
Optimize insertion sort

This change slightly restructures the main iteration loop so that LLVM can optimize it more effectively.

Benchmark:

```
name                                   before ns/iter   after ns/iter    diff ns/iter   diff %
slice::sort_unstable_small_ascending   39 (2051 MB/s)   38 (2105 MB/s)             -1   -2.56%
slice::sort_unstable_small_big_random  579 (2210 MB/s)  575 (2226 MB/s)            -4   -0.69%
slice::sort_unstable_small_descending  80 (1000 MB/s)   70 (1142 MB/s)            -10  -12.50%
slice::sort_unstable_small_random      396 (202 MB/s)   386                       -10   -2.53%
```

The benchmark is not a fluke: performance on `small_descending` is consistently better after this change. I'm not 100% sure why this makes things faster, but my guess is that, to the compiler, `v.len()+1` looks like it could in theory overflow.
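
As a rough illustration, the sketch below places the two loop shapes from this change side by side and checks that they sort identically. The `shift_tail` here is a simplified, safe stand-in written for this example (an assumption; it is not the implementation in `src/libcore/slice/sort.rs`), and the names `insertion_sort_before` and `insertion_sort_after` are hypothetical. It demonstrates equivalence only and says nothing about the generated code or its speed.

```rust
// Simplified stand-in for `shift_tail` (assumption for this sketch only):
// moves the last element of `v` left until the slice is ordered again.
fn shift_tail<T, F>(v: &mut [T], is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    let mut i = v.len();
    while i >= 2 && is_less(&v[i - 1], &v[i - 2]) {
        v.swap(i - 1, i - 2);
        i -= 1;
    }
}

// Loop shape before the change: the upper bound is `v.len() + 1`.
fn insertion_sort_before<T, F>(v: &mut [T], is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    for i in 2..v.len() + 1 {
        shift_tail(&mut v[..i], is_less);
    }
}

// Loop shape after the change: the upper bound is `v.len()`,
// and the `+ 1` moves into the slice index instead.
fn insertion_sort_after<T, F>(v: &mut [T], is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    for i in 1..v.len() {
        shift_tail(&mut v[..i + 1], is_less);
    }
}

fn main() {
    let input = [5, 2, 9, 1, 5, 6, 0];
    let mut a = input;
    let mut b = input;
    insertion_sort_before(&mut a, &mut |x: &i32, y: &i32| x < y);
    insertion_sort_after(&mut b, &mut |x: &i32, y: &i32| x < y);
    // Both variants visit the same prefixes `v[..2]`, `v[..3]`, ..., `v[..len]`.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Both loops call `shift_tail` on the same sequence of prefixes, so the only difference is where the `+ 1` is computed: in the range bound before the change, in the slice index after it.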
diff --git a/src/libcore/slice/sort.rs b/src/libcore/slice/sort.rs
@@ -152,8 +152,8 @@ fn partial_insertion_sort<T, F>(v: &mut [T], is_less: &mut F) -> bool
 fn insertion_sort<T, F>(v: &mut [T], is_less: &mut F)
     where F: FnMut(&T, &T) -> bool
 {
-    for i in 2..v.len()+1 {
-        shift_tail(&mut v[..i], is_less);
+    for i in 1..v.len() {
+        shift_tail(&mut v[..i+1], is_less);
     }
 }
 
