Incorrect iteration count for loop with unsigned ICMP condition #2375

Closed
llvmbot opened this issue Feb 10, 2008 · 13 comments
Labels
bugzilla Issues migrated from bugzilla miscompilation

@llvmbot (Member) commented Feb 10, 2008

Bugzilla Link 2003
Resolution FIXED
Resolved on Feb 20, 2008 03:13
Version trunk
OS All
Reporter LLVM Bugzilla Contributor
CC @nlewycky

Extended Description

For the following LLVM code:

define void @foo(i32 %n) {
entry:
  br label %header
header:
  %i = phi i32 [ 100, %entry ], [ %i.inc, %next ]
  %cond = icmp ult i32 %i, %n
  br i1 %cond, label %next, label %return
next:
  %i.inc = add i32 %i, 1
  br label %header
return:
  ret void
}

which contains a loop of this form:

unsigned n = ...;
for (unsigned i = 100; i < n; ++i)
;

scalar evolution determines the loop iteration count as (-100 + %n).
This isn't correct, because for %n < 100 we get a negative number of iterations.

One way to fix it is to add a 'umax' SCEV similar to the 'smax' one. Using it, the answer for the example would be: (100 umax %n) - 100.

Any other ideas?
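The difference between the current formula and the proposed one can be checked with a standalone C++ sketch (hypothetical helper names; plain 32-bit unsigned arithmetic standing in for the i32 loop above):

```cpp
#include <algorithm>
#include <cstdint>

// Trip count of `for (uint32_t i = 100; i < n; ++i)`, by direct simulation.
uint32_t trip_count_simulated(uint32_t n) {
  uint32_t count = 0;
  for (uint32_t i = 100; i < n; ++i)
    ++count;
  return count;
}

// The proposed SCEV answer: (100 umax n) - 100.
uint32_t trip_count_umax(uint32_t n) {
  return std::max<uint32_t>(100u, n) - 100u;
}

// The current (incorrect) SCEV answer, (-100 + n), in wrapping arithmetic.
uint32_t trip_count_current(uint32_t n) {
  return n - 100u;  // wraps around to a huge value for n < 100
}
```

For n = 0 the simulated loop runs zero times and the umax formula agrees, while the current formula wraps to 2^32 - 100.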

@nlewycky (Contributor)

If %n = 0, then we get %n - 100 = -100 = +156 (unsigned). And this is the correct loop trip count; if you start at 100, it'll take you 156 (256-100) iterations to get back around to zero.

Reopen if you find a counter-example.

@llvmbot (Member Author) commented Feb 12, 2008

If %n = 0, then we get %n - 100 = -100 = +156 (unsigned). And this is the
correct loop trip count; if you start at 100, it'll take you 156 (256-100)
iterations to get back around to zero.

Please note that the comparison operator is ULT, not NE; what you describe would hold in the NE case. Suppose %n = 0. Then the first execution of:
%cond = icmp ult i32 %i, %n
yields %cond = false and the loop exits immediately, so the trip count should be 0 in this case.

@nlewycky (Contributor)

Right again. :) Confirmed, it's a bug (experimentally this time).

I don't see any way to fix it without introducing a UMAX, which I was really hoping to avoid...

@llvmbot (Member Author) commented Feb 16, 2008

create umax, just like smax
This is the obvious fix, just adds UMax and changes the trip count code to use it.

There should be some additional optimizations that we can do with umax that we couldn't with smax, but I haven't implemented any.

@llvmbot (Member Author) commented Feb 16, 2008

The attached patch miscompiles McCat/18-imp. Investigating...

@llvmbot (Member Author) commented Feb 16, 2008

proposed fix
The bug was in converting select/icmp to smax or umax.

While fixing it, I noticed that we were being suboptimal and not detecting patterns that could've been converted into smax. So I fixed that and added the equivalent umax case.

Passes nightly test.

@llvmbot (Member Author) commented Feb 17, 2008

Nick,

Thanks a lot for tackling this. I have a few comments about the patch.

Index: include/llvm/Analysis/ScalarEvolutionExpressions.h

   scConstant, scTruncate, scZeroExtend, scSignExtend, scAddExpr, scMulExpr,
-  scUDivExpr, scAddRecExpr, scSMaxExpr, scUnknown, scCouldNotCompute
+  scUDivExpr, scAddRecExpr, scUMaxExpr, scSMaxExpr, scUnknown,
+  scCouldNotCompute

Cosmetic proposition: scUMaxExpr could be after scSMaxExpr to reflect the order we handle these expressions everywhere in the code.

Index: lib/Analysis/ScalarEvolutionExpander.cpp

+Value *SCEVExpander::visitUMaxExpr(SCEVUMaxExpr *S) {
+  Value *LHS = expand(S->getOperand(0));
+  for (unsigned i = 1; i < S->getNumOperands(); ++i) {
+    Value *RHS = expand(S->getOperand(i));
+    Value *ICmp = new ICmpInst(ICmpInst::ICMP_UGT, LHS, RHS, "tmp", InsertPt);
+    LHS = new SelectInst(ICmp, LHS, RHS, "umax", InsertPt);
+  }
+  return LHS;
+}

The code for visitUMaxExpr is almost the same as for visitSMaxExpr. Have you considered factoring out the common part? That said, the current approach is fine with me, as these functions are very short.

Index: lib/Analysis/ScalarEvolution.cpp

+SCEVHandle ScalarEvolution::getUMaxExpr(std::vector<SCEVHandle> Ops) {
+  assert(!Ops.empty() && "Cannot get empty umax!");
+  if (Ops.size() == 1) return Ops[0];
+
+  // Sort by complexity, this groups all similar expression types together.
+  GroupByComplexity(Ops);
+
+  // If there are any constants, fold them together.
+  unsigned Idx = 0;
+  if (SCEVConstant *LHSC = dyn_cast<SCEVConstant>(Ops[0])) {
+    ++Idx;
+    assert(Idx < Ops.size());
+    while (SCEVConstant *RHSC = dyn_cast<SCEVConstant>(Ops[Idx])) {
+      // We found two constants, fold them together!
+      Constant *Fold = ConstantInt::get(
+                              APIntOps::umax(LHSC->getValue()->getValue(),
+                                             RHSC->getValue()->getValue()));
+      if (ConstantInt *CI = dyn_cast<ConstantInt>(Fold)) {
+        Ops[0] = getConstant(CI);
+        Ops.erase(Ops.begin()+1);  // Erase the folded element
+        if (Ops.size() == 1) return Ops[0];
+        LHSC = cast<SCEVConstant>(Ops[0]);
+      } else {
+        // If we couldn't fold the expression, move to the next constant.  Note
+        // that this is impossible to happen in practice because we always
+        // constant fold constant ints to constant ints.
+        ++Idx;
+      }
+    }

I don't think the else branch here is really needed. ConstantInt::get() returns a ConstantInt object, so there is no possibility it will ever be executed. This probably also applies to the other get*Expr() methods, but we can remove all these branches after this bug is fixed.
+
+    // If we are left with a constant -inf, strip it off.
+    if (cast<SCEVConstant>(Ops[0])->getValue()->isMinValue(true)) {
+      Ops.erase(Ops.begin());
+      --Idx;
+    }
+  }

isMinValue(false) should be used there. Also, the -inf should be substituted with 0 in the comment.
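For context on the isMinValue(false) point: the unsigned minimum is 0, and 0 is the identity element of umax, so a leading constant-0 operand can simply be stripped. A standalone illustration (not the patch code):

```cpp
#include <algorithm>
#include <cstdint>

// umax(x, 0) == x for every unsigned x, so a constant-0 operand is redundant.
// (For smax the identity is the minimum *signed* value, which is why the
// comment copied from the smax code said "-inf"; for umax that constant is 0.)
uint32_t umax2(uint32_t a, uint32_t b) { return std::max(a, b); }
```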

+  if (Ops.size() == 1) return Ops[0];
+
+  // Find the first UMax
+  while (Idx < Ops.size() && Ops[Idx]->getSCEVType() < scUMaxExpr)
+    ++Idx;
+
+  // Check to see if one of the operands is an UMax. If so, expand its operands
+  // onto our operand list, and recurse to simplify.
+  if (Idx < Ops.size()) {
+    bool DeletedUMax = false;
+    while (SCEVUMaxExpr *UMax = dyn_cast<SCEVUMaxExpr>(Ops[Idx])) {
+      Ops.insert(Ops.end(), UMax->op_begin(), UMax->op_end());
+      Ops.erase(Ops.begin()+Idx);
+      DeletedUMax = true;
+    }
+    if (DeletedUMax)
+      return getUMaxExpr(Ops);
+  }
+
+  // Okay, check to see if the same value occurs in the operand list twice.  If
+  // so, delete one.  Since we sorted the list, these values are required to
+  // be adjacent.
+  for (unsigned i = 0, e = Ops.size()-1; i != e; ++i)
+    if (Ops[i] == Ops[i+1]) {       // X umax Y umax Y  -->  X umax Y
+      Ops.erase(Ops.begin()+i, Ops.begin()+i+1);
+      --i; --e;
+    }
+
+  if (Ops.size() == 1) return Ops[0];
+  assert(!Ops.empty() && "Reduced umax down to nothing!");
+
+  // Okay, it looks like we really DO need an add expr.  Check to see if we

add -> umax

@@ -1628,6 +1725,8 @@
     case Instruction::UDiv:
       return SE.getUDivExpr(getSCEV(I->getOperand(0)),
                             getSCEV(I->getOperand(1)));
+      break;

Is there any reason for the break to exist after this case and not after the others?

@@ -1699,6 +1799,28 @@
     case ICmpInst::ICMP_SGE:
       if (LHS == I->getOperand(1) && RHS == I->getOperand(2))
         return SE.getSMaxExpr(getSCEV(LHS), getSCEV(RHS));
+      else if (LHS == I->getOperand(2) && RHS == I->getOperand(1))
+        // -smax(-x, -y) == smin(x, y).
+        return SE.getNegativeSCEV(SE.getSMaxExpr(
+                                      SE.getNegativeSCEV(getSCEV(LHS)),
+                                      SE.getNegativeSCEV(getSCEV(RHS))));
+      break;
+    case ICmpInst::ICMP_ULT:
+    case ICmpInst::ICMP_ULE:
+      std::swap(LHS, RHS);
+      // fall through
+    case ICmpInst::ICMP_UGT:
+    case ICmpInst::ICMP_UGE:
+      if (LHS == I->getOperand(1) && RHS == I->getOperand(2))
+        return SE.getUMaxExpr(getSCEV(LHS), getSCEV(RHS));
+      else if (LHS == I->getOperand(2) && RHS == I->getOperand(1)) {
+        // ~umax(~x, ~y) == umin(x, y)
+        // allones - x == ~x
+        SCEVHandle AllOnes = SE.getIntegerSCEV(-1ULL, I->getType());
    

Note that the first arg of getIntegerSCEV is an int, not a long long. I think the safe way would be to use ConstantInt::getAllOnesValue() and then ScalarEvolution::getConstant().
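The ~umax(~x, ~y) == umin(x, y) identity this hunk relies on can be sanity-checked on plain unsigned integers (a standalone sketch with a hypothetical name):

```cpp
#include <algorithm>
#include <cstdint>

// umin built from umax and bitwise-not: ~umax(~x, ~y) == umin(x, y).
// In two's complement, ~x == allones - x, which is what the patch exploits.
uint32_t umin_via_umax(uint32_t x, uint32_t y) {
  return ~std::max(~x, ~y);
}
```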

@@ -2541,7 +2665,8 @@
     // Then, we get the value of the LHS in the first iteration in which the
     // above condition doesn't hold.  This equals to max(m,n).
     // FIXME (#2003): we should have an "umax" operator as well.
-    SCEVHandle End = isSigned ? SE.getSMaxExpr(RHS,Start) : (SCEVHandle)RHS;
+    SCEVHandle End = isSigned ? SE.getSMaxExpr(RHS, Start)
+                              : SE.getUMaxExpr(RHS, Start);

The FIXME can be removed now.

And the last question. Have you considered using a common superclass for SMax and UMax? I'm not sure if it changed much - just asking.

Wojtek

@nlewycky (Contributor)

patch with cleanup
Wojtek, thank you for your excellent code review!

Index: include/llvm/Analysis/ScalarEvolutionExpressions.h

   scConstant, scTruncate, scZeroExtend, scSignExtend, scAddExpr, scMulExpr,
-  scUDivExpr, scAddRecExpr, scSMaxExpr, scUnknown, scCouldNotCompute
+  scUDivExpr, scAddRecExpr, scUMaxExpr, scSMaxExpr, scUnknown,
+  scCouldNotCompute

Cosmetic proposition: scUMaxExpr could be after scSMaxExpr to reflect the order
we handle these expressions everywhere in the code.

The order in this enum isn't cosmetic, it's the order that GroupByComplexity will use when grouping them. Basically it means that we'd rather have a umax than an smax, given the choice.

That said, it wouldn't lead to any different codegen today because there's no optimizations that peek through umax or smax to, for example, pull an add instruction through the umax. But it'd be easier to do that for a umax, so I put it first.

Index: lib/Analysis/ScalarEvolutionExpander.cpp

+Value *SCEVExpander::visitUMaxExpr(SCEVUMaxExpr *S) {
+  Value *LHS = expand(S->getOperand(0));
+  for (unsigned i = 1; i < S->getNumOperands(); ++i) {
+    Value *RHS = expand(S->getOperand(i));
+    Value *ICmp = new ICmpInst(ICmpInst::ICMP_UGT, LHS, RHS, "tmp", InsertPt);
+    LHS = new SelectInst(ICmp, LHS, RHS, "umax", InsertPt);
+  }
+  return LHS;
+}

The code for visitUMaxExpr is almost the same as for visitSMaxExpr. Have you
considered factoring the common part out? However, the current approach is okay
for me as there functions are very short.

I don't think it's worth it to refactor them at this point. It'd basically be a mess of if statements. I am guilty as charged of cut'n'pasting smax code though. :)

Index: lib/Analysis/ScalarEvolution.cpp

+SCEVHandle ScalarEvolution::getUMaxExpr(std::vector<SCEVHandle> Ops) {
+  assert(!Ops.empty() && "Cannot get empty umax!");
+  if (Ops.size() == 1) return Ops[0];
+
+  // Sort by complexity, this groups all similar expression types together.
+  GroupByComplexity(Ops);
+
+  // If there are any constants, fold them together.
+  unsigned Idx = 0;
+  if (SCEVConstant *LHSC = dyn_cast<SCEVConstant>(Ops[0])) {
+    ++Idx;
+    assert(Idx < Ops.size());
+    while (SCEVConstant *RHSC = dyn_cast<SCEVConstant>(Ops[Idx])) {
+      // We found two constants, fold them together!
+      Constant *Fold = ConstantInt::get(
+                              APIntOps::umax(LHSC->getValue()->getValue(),
+                                             RHSC->getValue()->getValue()));
+      if (ConstantInt *CI = dyn_cast<ConstantInt>(Fold)) {
+        Ops[0] = getConstant(CI);
+        Ops.erase(Ops.begin()+1);  // Erase the folded element
+        if (Ops.size() == 1) return Ops[0];
+        LHSC = cast<SCEVConstant>(Ops[0]);
+      } else {
+        // If we couldn't fold the expression, move to the next constant.  Note
+        // that this is impossible to happen in practice because we always
+        // constant fold constant ints to constant ints.
+        ++Idx;
+      }
+    }

I don't think the else branch here is really needed. ConstantInt::get() returns
a ConstantInt object so there is no possibility it will be ever executed. This
probably also applies to other get*Expr() methods, but we may remove all these
branches after this bug is fixed.

You are so totally correct. I've gone ahead and removed the equivalent branches from all get*Expr methods.

+
+    // If we are left with a constant -inf, strip it off.
+    if (cast<SCEVConstant>(Ops[0])->getValue()->isMinValue(true)) {
+      Ops.erase(Ops.begin());
+      --Idx;
+    }
+  }

isMinValue(false) should be used there. Also, the -inf should be substituted
with 0 in the comment.

Good catch. Done.

+  // Okay, it looks like we really DO need an add expr.  Check to see if we

add -> umax

Done. Same thing for the smax function, too.

@@ -1628,6 +1725,8 @@
     case Instruction::UDiv:
       return SE.getUDivExpr(getSCEV(I->getOperand(0)),
                             getSCEV(I->getOperand(1)));
+      break;

Is there any reason for the break to exist after this case and not after the
others?

Nope, deleted.

@@ -1699,6 +1799,28 @@
     case ICmpInst::ICMP_SGE:
       if (LHS == I->getOperand(1) && RHS == I->getOperand(2))
         return SE.getSMaxExpr(getSCEV(LHS), getSCEV(RHS));
+      else if (LHS == I->getOperand(2) && RHS == I->getOperand(1))
+        // -smax(-x, -y) == smin(x, y).
+        return SE.getNegativeSCEV(SE.getSMaxExpr(
+                                      SE.getNegativeSCEV(getSCEV(LHS)),
+                                      SE.getNegativeSCEV(getSCEV(RHS))));
+      break;
+    case ICmpInst::ICMP_ULT:
+    case ICmpInst::ICMP_ULE:
+      std::swap(LHS, RHS);
+      // fall through
+    case ICmpInst::ICMP_UGT:
+    case ICmpInst::ICMP_UGE:
+      if (LHS == I->getOperand(1) && RHS == I->getOperand(2))
+        return SE.getUMaxExpr(getSCEV(LHS), getSCEV(RHS));
+      else if (LHS == I->getOperand(2) && RHS == I->getOperand(1)) {
+        // ~umax(~x, ~y) == umin(x, y)
+        // allones - x == ~x
+        SCEVHandle AllOnes = SE.getIntegerSCEV(-1ULL, I->getType());

Note that first arg of getIntegerSCEV is int, not long long. I think the safe
way would be to use ConstantInt::getAllOnesValue() and then
ScalarEvolution::getConstant().

I've refactored this out into a getNotSCEV method much like getNegativeSCEV. Using -1ULL is quite amusing (uh, negative unsigned?) but plain -1 should be fine and is already used elsewhere in the code.

I also wired up "xor %x, -1" to getNotSCEV (which is in turn -1 - %x). We'll see what impact that has.
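The rewrite behind getNotSCEV (~x as -1 - x) holds in wrapping unsigned arithmetic; a standalone check with a hypothetical name:

```cpp
#include <cstdint>

// getNotSCEV's rewrite, modeled on uint32_t: allones - x == ~x,
// i.e. (-1) - x in wrapping unsigned arithmetic.
uint32_t not_via_sub(uint32_t x) {
  return UINT32_MAX - x;
}
```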

@@ -2541,7 +2665,8 @@
     // Then, we get the value of the LHS in the first iteration in which the
     // above condition doesn't hold.  This equals to max(m,n).
     // FIXME (#2003): we should have an "umax" operator as well.
-    SCEVHandle End = isSigned ? SE.getSMaxExpr(RHS,Start) : (SCEVHandle)RHS;
+    SCEVHandle End = isSigned ? SE.getSMaxExpr(RHS, Start)
+                              : SE.getUMaxExpr(RHS, Start);

The FIXME can be removed now.

Done.

And the last question. Have you considered using a common superclass for SMax
and UMax? I'm not sure if it changed much - just asking.

Yes -- UMax is pretty much a direct copy of SMax.

The other option was making a single SCEVMax with a signedness flag. It doesn't much matter, but I opted for this approach because this way we have commutative SCEVs. If we combined them, umax and smax couldn't commute with one another. Also, I think we could optimize umax better than smax, so it helps to have them split apart.

@llvmbot (Member Author) commented Feb 19, 2008

Uh ... I posted a slightly old version of the patch. This hunk:

@@ -1653,6 +1731,8 @@
       if (CI->getValue().isSignBit())
         return SE.getAddExpr(getSCEV(I->getOperand(0)),
                              getSCEV(I->getOperand(1)));
+      else if (CI->getValue().isAllOnes())
+        return SE.getNotSCEV(getSCEV(I->getOperand(0)));
     }
     break;

Should be "CI->isAllOnesValue()". Sorry.

@llvmbot (Member Author) commented Feb 19, 2008

Nice corrections! Just one issue; please read below.

The order in this enum isn't cosmetic, it's the order that GroupByComplexity
will use when grouping them. Basically it means that we'd rather have a umax
than an smax, given the choice.

Okay.

I don't think it's worth it to refactor them at this point. It'd basically be a
mess of if statements. I am guilty as charged of cut'n'pasting smax code
though. :)

Seems fine to me.

I've refactored this out into a getNotSCEV method much like getNegativeSCEV.
Using -1ULL is quite amusing (uh, negative unsigned?) but plain -1 should be
fine and is already used elsewhere in the code.

Okay, but the body of getNotSCEV probably needs some fixing. Please note that the Val argument of getIntegerSCEV() is an int. getIntegerSCEV() calls ConstantInt::get() passing Val as the second argument which should be of uint64_t type. This way, -1 will be promoted to 2^32-1. This is correct only if Ty has no more than 32 bits. I think getIntegerSCEV(-1, Type::Int64Ty) will return 2^32-1, while in getNotSCEV() we would like to have 2^64-1.

BTW, I've also seen ~0ULL in the code. IMHO, it's the cleanest.

I also wired up "xor %x, -1" to getNotSCEV (which is in turn -1 - %x). We'll
see what impact that has.

Nice catch.

Yes -- UMax is pretty much a direct copy of SMax.

The other option was making a single SCEVMax with a signedness flag. It doesn't
much matter, but I opted for this approach because this way we have commutative
SCEVs. If we combined them, umax and smax couldn't commute with one another.
Also, I think we could optimize umax better than smax, so it helps to have them
split apart.

Ok, I was just asking. Sometimes less code also means less readable code... ;)

@nlewycky (Contributor)

Okay, but the body of getNotSCEV probably needs some fixing. Please note that
the Val argument of getIntegerSCEV() is an int. getIntegerSCEV() calls
ConstantInt::get() passing Val as the second argument which should be of
uint64_t type. This way, -1 will be promoted to 2^32-1. This is correct only
if Ty has no more than 32 bits. I think getIntegerSCEV(-1, Type::Int64Ty) will
return 2^32-1, while in getNotSCEV() we would like to have 2^64-1.

I think you're correct. What's odd is that getNegativeSCEV does this too. Perhaps it's also broken?

Since the replacement code is obviously correct, I'm going to make both functions use that.

@llvmbot (Member Author) commented Feb 20, 2008

Okay, but the body of getNotSCEV probably needs some fixing. Please note that
the Val argument of getIntegerSCEV() is an int. getIntegerSCEV() calls
ConstantInt::get() passing Val as the second argument which should be of
uint64_t type. This way, -1 will be promoted to 2^32-1. This is correct only
if Ty has no more than 32 bits. I think getIntegerSCEV(-1, Type::Int64Ty) will
return 2^32-1, while in getNotSCEV() we would like to have 2^64-1.

I think you're correct. What's odd is that getNegativeSCEV does this too.
Perhaps it's also broken?

No, I'm not correct. It seems I've forgotten integer promotion rules. I've checked that (unsigned long long)-1 = 2^64-1. Sorry for misleading you. Feel free to revert these changes as getIntegerSCEV(-1, ...) should work.
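The retraction is easy to confirm: converting int(-1) to a 64-bit unsigned type sign-extends first, so the result is 2^64-1 on every width, not 2^32-1 (a standalone check, not LLVM code):

```cpp
#include <cstdint>

// Converting int(-1) to uint64_t sign-extends and then reinterprets,
// giving 2^64-1 regardless of the source width. So getIntegerSCEV(-1, Ty)
// produces the all-ones value even for 64-bit types.
uint64_t minus_one_as_u64() {
  int v = -1;
  return static_cast<uint64_t>(v);
}
```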

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.