Incorrect iteration count for loop with unsigned ICMP condition #2375
Comments
If %n = 0, then we get %n - 100 = -100, which is +156 as an unsigned value (taking 8-bit arithmetic for illustration). And this is the correct loop trip count; if you start at 100, it'll take you 156 (256-100) iterations to get back around to zero. Reopen if you find a counter-example. |
Please, note that the comparison operator is ULT and not NE. What you are saying is true in the second case. Suppose %n = 0. Then the first execution of: |
Right again. :) Confirmed, it's a bug (experimentally this time). I don't see any way to fix it without introducing a UMAX, which I was really hoping to avoid... |
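To make the ULT versus NE distinction above concrete, here is a minimal sketch that counts iterations by brute force in 8-bit unsigned arithmetic (the same width the 156 figure assumes). The function names are hypothetical and exist only for this illustration; none of this is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Iterations of: for (i = start; i != n; ++i), with 8-bit wraparound.
unsigned trip_count_ne(std::uint8_t start, std::uint8_t n) {
    unsigned count = 0;
    for (std::uint8_t i = start; i != n; ++i) ++count;
    return count;
}

// Iterations of: for (i = start; i < n; ++i), unsigned comparison (ULT).
unsigned trip_count_ult(std::uint8_t start, std::uint8_t n) {
    unsigned count = 0;
    for (std::uint8_t i = start; i < n; ++i) ++count;
    return count;
}

// The closed form (n - start) reduced modulo 2^8.
unsigned trip_count_formula(std::uint8_t start, std::uint8_t n) {
    return static_cast<std::uint8_t>(n - start);
}

// Hypothetical self-check: the formula matches NE but not ULT when n < start.
void self_check() {
    assert(trip_count_formula(100, 0) == trip_count_ne(100, 0));
    assert(trip_count_formula(100, 0) != trip_count_ult(100, 0));
}
```

With start = 100 and n = 0, the NE loop wraps around and runs 156 times, which agrees with the closed form, while the ULT loop exits immediately and runs zero times.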
create umax, just like smax
There should be some additional optimizations that we can do with umax that we couldn't with smax, but I haven't implemented any. |
The attached patch miscompiles McCat/18-imp. Investigating... |
proposed fix
While fixing it, I noticed that we were being suboptimal and not detecting patterns that could've been converted into smax. So I fixed that and added the equivalent umax case. Passes nightly test. |
Nick, Thanks a lot for tackling this. I have a few comments about the patch.
Index: include/llvm/Analysis/ScalarEvolutionExpressions.h
Cosmetic proposition: scUMaxExpr could be after scSMaxExpr to reflect the order we handle these expressions everywhere in the code.
Index: lib/Analysis/ScalarEvolutionExpander.cpp
+Value *SCEVExpander::visitUMaxExpr(SCEVUMaxExpr *S) {
The code for visitUMaxExpr is almost the same as for visitSMaxExpr. Have you considered factoring the common part out? However, the current approach is okay for me as these functions are very short.
Index: lib/Analysis/ScalarEvolution.cpp
+SCEVHandle ScalarEvolution::getUMaxExpr(std::vector Ops) {
I don't think the else branch here is really needed. ConstantInt::get() returns a ConstantInt object, so there is no possibility it will ever be executed. This probably also applies to other get*Expr() methods, but we may remove all these branches after this bug is fixed.
isMinValue(false) should be used there. Also, the -inf should be substituted with 0 in the comment.
add -> umax
@@ -1628,6 +1725,8 @@
Is there any reason for the break to exist after this case and not after the others?
@@ -1699,6 +1799,28 @@
Note that the first arg of getIntegerSCEV is int, not long long. I think the safe way would be to use ConstantInt::getAllOnesValue() and then ScalarEvolution::getConstant().
@@ -2541,7 +2665,8 @@
The FIXME can be removed now.
And the last question. Have you considered using a common superclass for SMax and UMax? I'm not sure it would change much - just asking.
Wojtek |
patch with cleanup
The order in this enum isn't cosmetic, it's the order that GroupByComplexity will use when grouping them. Basically it means that we'd rather have a umax than an smax, given the choice. That said, it wouldn't lead to any different codegen today because there are no optimizations that peek through umax or smax to, for example, pull an add instruction through the umax. But it'd be easier to do that for a umax, so I put it first.
I don't think it's worth it to refactor them at this point. It'd basically be a mess of if statements. I am guilty as charged of cut'n'pasting smax code though. :)
You are so totally correct. I've gone ahead and removed the equivalent branches from all get*Expr methods.
Good catch. Done.
Done. Same thing for the smax function, too.
Nope, deleted.
I've refactored this out into a getNotSCEV method much like getNegativeSCEV. Using -1ULL is quite amusing (uh, negative unsigned?) but plain -1 should be fine and is already used elsewhere in the code. I also wired up "xor %x, -1" to getNotSCEV (which is in turn -1 - %x). We'll see what impact that has.
@@ -2541,7 +2665,8 @@
Done.
Yes -- UMax is pretty much a direct copy of SMax. The other option was making a single SCEVMax with a signedness flag. It doesn't much matter, but I opted for this approach because this way we have commutative SCEVs. If we combined them, umax and smax couldn't commute with one another. Also, I think we could optimize umax better than smax, so it helps to have them split apart. |
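The getNotSCEV rewrite mentioned in this reply relies on the identity that "xor %x, -1" equals -1 - %x in two's-complement arithmetic. A minimal sketch of that identity on 32-bit unsigned values follows; the function names are hypothetical and this is not the patch's code.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise NOT expressed as xor with the all-ones value (the "xor %x, -1" form).
std::uint32_t not_via_xor(std::uint32_t x) {
    return x ^ 0xFFFFFFFFu;
}

// The same value expressed as all-ones minus x (the "-1 - %x" form).
std::uint32_t not_via_sub(std::uint32_t x) {
    return 0xFFFFFFFFu - x;
}

// Hypothetical self-check over a few sample values, including both extremes.
void self_check() {
    const std::uint32_t samples[] = {0u, 1u, 100u, 0x80000000u, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(not_via_xor(x) == not_via_sub(x));
    }
}
```

The subtraction never borrows because every bit of the all-ones value is at least as large as the corresponding bit of x, which is why the two forms agree bit for bit.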
Uh ... I posted a slightly old version of the patch. This hunk:
Should be "CI->isAllOnesValue()". Sorry. |
Nice corrections! Just one issue. Please, read below.
Okay.
Seems fine to me.
Okay, but the body of getNotSCEV probably needs some fixing. Please note that the Val argument of getIntegerSCEV() is an int. getIntegerSCEV() calls ConstantInt::get() passing Val as the second argument which should be of uint64_t type. This way, -1 will be promoted to 2^32-1. This is correct only if Ty has no more than 32 bits. I think getIntegerSCEV(-1, Type::Int64Ty) will return 2^32-1, while in getNotSCEV() we would like to have 2^64-1. BTW, I've also seen ~0ULL in the code. IMHO, it's the cleanest.
Nice catch.
Ok. I was just asking. Sometimes "less" code means also "less" readable code...;) |
I think you're correct. What's odd is that getNegativeSCEV does this too. Perhaps it's also broken? Since the replacement code is obviously correct, I'm going to make both functions use that. |
No, I'm not correct. It seems I've forgotten integer promotion rules. I've checked that (unsigned long long)-1 = 2^64-1. Sorry for misleading you. Feel free to revert these changes as getIntegerSCEV(-1, ...) should work. |
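The conversion rule behind this retraction can be checked with a short sketch. It assumes only what the thread describes, namely an int value passed where a uint64_t is expected; the helper names are hypothetical stand-ins, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// An int converted directly to a 64-bit unsigned value. The conversion is
// modulo 2^64, so -1 becomes 2^64 - 1.
std::uint64_t as_u64_from_int(int val) {
    return static_cast<std::uint64_t>(val);
}

// For contrast: going through a 32-bit unsigned type first. This is the
// only path on which -1 would end up as 2^32 - 1.
std::uint64_t as_u64_via_u32(int val) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(val));
}

// Hypothetical self-check of both conversions.
void self_check() {
    assert(as_u64_from_int(-1) == UINT64_MAX);
    assert(as_u64_via_u32(-1) == 0xFFFFFFFFull);
}
```

The feared 2^32-1 result would only appear if the value passed through an unsigned 32-bit intermediate, which a direct int to uint64_t conversion does not do.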
Extended Description
For the following LLVM code:
define void @foo(i32 %n) {
entry:
  br label %header
header:
  %i = phi i32 [ 100, %entry ], [ %i.inc, %next ]
  %cond = icmp ult i32 %i, %n
  br i1 %cond, label %next, label %return
next:
  %i.inc = add i32 %i, 1
  br label %header
return:
  ret void
}
which contains a loop of this form:
unsigned n = ...;
for (unsigned i = 100; i < n; ++i)
  ;
scalar evolution determines the loop iteration count as: (-100 + %n).
This isn't correct, because for %n < 100 we'll get a negative number of iterations.
One way to fix it is to add a 'umax' SCEV similar to the 'smax' one. Using it, the answer for the example would be: (100 umax %n) - 100.
Any other ideas?
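As a sanity check on the proposed expression, the sketch below compares a brute-force iteration count of the loop against both (-100 + %n) and (100 umax %n) - 100, using 32-bit unsigned arithmetic to match i32. The function names are hypothetical, and std::max on unsigned values stands in for umax.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Brute-force count for: for (unsigned i = 100; i < n; ++i) ;
std::uint32_t trip_count_loop(std::uint32_t n) {
    std::uint32_t count = 0;
    for (std::uint32_t i = 100; i < n; ++i) ++count;
    return count;
}

// The count reported in the issue: (-100 + n), wrapping modulo 2^32.
std::uint32_t trip_count_naive(std::uint32_t n) {
    return n - 100u;
}

// The proposed fix: (100 umax n) - 100.
std::uint32_t trip_count_umax(std::uint32_t n) {
    return std::max<std::uint32_t>(100u, n) - 100u;
}

// Hypothetical self-check: umax agrees with the loop on both sides of 100.
void self_check() {
    const std::uint32_t samples[] = {0u, 50u, 99u, 100u, 101u, 1000u};
    for (std::uint32_t n : samples) {
        assert(trip_count_umax(n) == trip_count_loop(n));
    }
}
```

For n = 0 the naive expression wraps to 4294967196 while the loop body never runs; the umax form returns 0, and both forms agree with the loop once n is at least 100.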