Commit 2ef770d

[LV] Change loops' interleave count computation
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#26), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial in two cases:

1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice
2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run

where TC = trip count, VW = vectorization width, IC = interleaving count.

We change the interleave count computation based on this information, but we leave it the same when the flag InterleaveSmallLoopScalarReduction is set to true, since it handles a special case (https://reviews.llvm.org/D81416).
1 parent bdcbd52 commit 2ef770d
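
The two cases listed in the commit message can be sketched as a small predicate. This is only an illustration: `interleavingLooksBeneficial` is a hypothetical helper, not an LLVM function, and it encodes the conditions exactly as the message words them (strict `>` in case 1).

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM). Encodes the two cases from the
// commit message: tc = trip count, vw = vectorization width,
// ic = interleave count.
bool interleavingLooksBeneficial(std::uint64_t tc, std::uint64_t vw,
                                 std::uint64_t ic) {
  // Elements consumed by one iteration of the interleaved vector loop.
  const std::uint64_t step = vw * ic;
  if (step == 0)
    return false; // Degenerate input; assumed not to occur in practice.

  const bool runsAtLeastTwice = tc > 2 * step; // Case 1 in the message.
  const bool noEpilogue = tc % step == 0;      // Case 2 in the message.
  return runsAtLeastTwice || noEpilogue;
}
```

For example, with TC = 129 and VW = 64, the predicate rejects IC = 2 (one vector iteration plus an epilogue) but accepts IC = 1.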

2 files changed: +17 -8 lines changed


llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 15 additions & 6 deletions
@@ -5745,8 +5745,12 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   }
 
   // If trip count is known or estimated compile time constant, limit the
-  // interleave count to be less than the trip count divided by VF, provided it
-  // is at least 1.
+  // interleave count to be less than the trip count divided by VF * 2,
+  // provided VF is at least 1 and the trip count is not an exact multiple of
+  // VF, such that the vector loop runs at least twice to make interleaving seem
+  // profitable when there is an epilogue loop present. When
+  // InterleaveSmallLoopScalarReduction is true or trip count is an exact
+  // multiple of VF, we allow interleaving even when the vector loop runs once.
   //
   // For scalable vectors we can't know if interleaving is beneficial. It may
   // not be beneficial for small loops if none of the lanes in the second vector
@@ -5755,10 +5759,15 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   // the InterleaveCount as if vscale is '1', although if some information about
   // the vector is known (e.g. min vector size), we can make a better decision.
   if (BestKnownTC) {
-    MaxInterleaveCount =
-        std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
-    // Make sure MaxInterleaveCount is greater than 0.
-    MaxInterleaveCount = std::max(1u, MaxInterleaveCount);
+    if (InterleaveSmallLoopScalarReduction ||
+        (*BestKnownTC % VF.getKnownMinValue() == 0))
+      MaxInterleaveCount =
+          std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
+    else
+      MaxInterleaveCount = std::min(*BestKnownTC / (VF.getKnownMinValue() * 2),
+                                    MaxInterleaveCount);
+    // Make sure MaxInterleaveCount is greater than 0 & a power of 2.
+    MaxInterleaveCount = llvm::bit_floor(std::max(1u, MaxInterleaveCount));
   }
 
   assert(MaxInterleaveCount > 0 &&
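
To try the new clamp outside of LLVM, the changed block can be approximated by the standalone sketch below. The names `bitFloor` and `clampInterleaveCount` are hypothetical. `bitFloor` is a hand-written stand-in for `llvm::bit_floor`, the trip count and VF are passed as plain integers rather than `BestKnownTC` and `ElementCount`, and `vf` is assumed to be non-zero.

```cpp
#include <algorithm>

// Stand-in for llvm::bit_floor: largest power of two <= x, for x > 0.
unsigned bitFloor(unsigned x) {
  unsigned p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Hypothetical mirror of the patched block. tc is the known or estimated
// trip count, vf the known minimum vectorization factor (assumed > 0),
// maxIC the target's upper bound on the interleave count, and the flag
// corresponds to InterleaveSmallLoopScalarReduction.
unsigned clampInterleaveCount(unsigned tc, unsigned vf, unsigned maxIC,
                              bool interleaveSmallLoopScalarReduction) {
  // Divide by VF when the flag is set or there is no remainder;
  // otherwise divide by VF * 2 so the vector loop runs at least twice.
  const bool allowSingleRun =
      interleaveSmallLoopScalarReduction || tc % vf == 0;
  const unsigned divisor = allowSingleRun ? vf : vf * 2;
  maxIC = std::min(tc / divisor, maxIC);
  // Keep the result greater than 0 and a power of 2.
  return bitFloor(std::max(1u, maxIC));
}
```

Assuming a target bound of 8, a trip count of 129 with VF 64 now clamps to 1 (it clamped to 2 before the change), while a trip count of 128 still clamps to 2 because there is no remainder.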

llvm/test/Transforms/LoopVectorize/AArch64/interleave_count.ll

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ for.end:
 
 ; For a loop with known trip count of 129, when we force VF 64, it should use
 ; IC 1, since there may be a remainder loop that needs to run after the vector loop.
-; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 2)
+; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 1)
 define void @loop_with_tc_129(ptr %p, ptr %q) {
 entry:
   br label %for.body
@@ -80,7 +80,7 @@ for.end:
 ; For a loop with unknown trip count but a profile showing an approx TC estimate of 129,
 ; when we force VF 64, it should use IC 1, since chances are high that the remainder loop
 ; will need to run
-; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 2)
+; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 1)
 define void @loop_with_profile_tc_129(ptr %p, ptr %q, i64 %n) {
 entry:
   br label %for.body
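
The expected remarks change because of how a trip count of 129 divides at VF 64. The sketch below shows that arithmetic under a simplified model that ignores runtime checks and tail folding. `LoopSplit` and `splitTripCount` are hypothetical names used only for illustration, and `vf * ic` is assumed to be non-zero.

```cpp
// Hypothetical model of how a trip count divides between the interleaved
// vector loop and the scalar remainder (epilogue) loop.
struct LoopSplit {
  unsigned long vectorIterations;   // Iterations of the interleaved vector body.
  unsigned long epilogueIterations; // Scalar iterations left over.
};

LoopSplit splitTripCount(unsigned long tc, unsigned long vf,
                         unsigned long ic) {
  // Each vector-loop iteration covers vf * ic scalar iterations.
  const unsigned long step = vf * ic;
  return {tc / step, tc % step};
}
```

With IC 2 the interleaved body executes once and leaves one scalar iteration; with IC 1 the vector body executes twice and leaves the same single scalar iteration, which is the configuration the updated CHECK lines expect.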
