Skip to content

Missing fold (X == 0) | (trunc nuw X to i1) -> true #134093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dtcxzyw opened this issue Apr 2, 2025 · 11 comments
Open

Missing fold (X == 0) | (trunc nuw X to i1) -> true #134093

dtcxzyw opened this issue Apr 2, 2025 · 11 comments

Comments

@dtcxzyw
Copy link
Member

dtcxzyw commented Apr 2, 2025

Alive2: https://alive2.llvm.org/ce/z/RN4Sch

----------------------------------------
define i1 @src(i8 %x) {
#0:
  %cmp = icmp eq i8 %x, 0
  %trunc = trunc nuw i8 %x to i1
  %or = or i1 %cmp, %trunc
  ret i1 %or
}
=>
define i1 @tgt(i8 %x) {
#0:
  ret i1 1
}
Transformation seems to be correct!

It is a variant of #132678. cc @scottmcm
See also dtcxzyw/llvm-opt-benchmark#2234 (comment).

I don't know how to generalize this :(

@scottmcm
Copy link

scottmcm commented Apr 2, 2025

Is there any way to easily trace that back to the input rust? It reminds me of the pattern I'm fixing in rust-lang/rust#139098 (comment) but I don't know if it actually is -- it has the same shape, but in that issue LLVM doesn't actually have enough information to be able to fix it itself.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Apr 2, 2025

Is there any way to easily trace that back to the input rust? It reminds me of the pattern I'm fixing in rust-lang/rust#139098 (comment) but I don't know if it actually is -- it has the same shape, but in that issue LLVM doesn't actually have enough information to be able to fix it itself.

Original pattern (gpui::executor::Task<T>::detach): https://github.com/zed-industries/zed/blob/0f58d4f533fa18a26ec3d0396cff1ad0362c94f5/crates/gpui/src/executor.rs#L75-L81

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Apr 2, 2025

Is there any way to easily trace that back to the input rust?

llvm-opt-benchmark has a fuzzy pattern matcher to check if a pattern exists in real-world applications.

@scottmcm
Copy link

scottmcm commented Apr 4, 2025

Looking at this again, it makes me think of the normalization question I mentioned back in https://discourse.llvm.org/t/rfc-add-nowrap-flags-to-trunc/77453/3?u=scottmcm.

trunc nuw i8 %x to i1 is the more-poisonous version of icmp ne i8 %x, 0, so it might be that there are just a bunch of folds that need to handle the trunc nuw phrasing now that its not just translated (using range information) to the icmp.

Attempting to think of a way to generalize it: the return is poison if %x isn't just zero or one, so maybe you could push that into the signature as i8 range(0, 2) %x? Dunno if there's a good existing pattern for noticing that, though, since it depends on the or -- if there was a select or something that could swallow the poison it wouldn't work.

Hmm, maybe there's a way that known bits could use the «only the low bit can be set» from the trunc nuw when looking at the icmp in the context of that or?

(Oh, for an e-graph to be able to avoid these normalization questions...)

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Apr 4, 2025

the return is poison if %x isn't just zero or one, so maybe you could push that into the signature as i8 range(0, 2) %x?

You remind me that %x is a load. Unfortunately, the !range information on the load is dropped during optimization.

; Function Attrs: nonlazybind uwtable
define hidden void @"_ZN4gpui8executor13Task$LT$T$GT$6detach17h2e5fa6869c03c0deE.llvm.1438012894402087875"(ptr noalias noundef align 8 captures(none) dereferenceable(16) %0) unnamed_addr #4 personality ptr @rust_eh_personality {
  %2 = alloca [24 x i8], align 8
  %3 = load i8, ptr %0, align 8, !range !339, !noundef !7
  %trunc = trunc nuw i8 %3 to i1
  br i1 %trunc, label %6, label %"_ZN4core3ptr85drop_in_place$LT$gpui..executor..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17h93418fc4d0f14431E.llvm.1438012894402087875.exit"

4:                                                ; preds = %6
  %.pr = load i8, ptr %0, align 8  ; <<<<<<missing !range info>>>>>>
  %trunc1 = trunc nuw i8 %.pr to i1
  %5 = icmp eq i8 %.pr, 0
  %or.cond = or i1 %5, %trunc1
  br i1 %or.cond, label %"_ZN4core3ptr85drop_in_place$LT$gpui..executor..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17h93418fc4d0f14431E.llvm.1438012894402087875.exit", label %13

6:                                                ; preds = %1
  %7 = getelementptr inbounds nuw i8, ptr %0, i64 8
  %8 = load ptr, ptr %7, align 8, !nonnull !7, !noundef !7
  invoke void @"_ZN10async_task4task17Task$LT$T$C$M$GT$6detach17h8be9b165847ceef7E"(ptr noundef nonnull %8)
          to label %4 unwind label %9

9:                                                ; preds = %6
  %10 = landingpad { ptr, i32 }
          cleanup
  %11 = load i8, ptr %0, align 8, !range !339, !noundef !7
  %12 = trunc nuw i8 %11 to i1
  br i1 %12, label %22, label %23

"_ZN4core3ptr85drop_in_place$LT$gpui..executor..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17h93418fc4d0f14431E.llvm.1438012894402087875.exit": ; preds = %1, %"_ZN4core3ptr87drop_in_place$LT$async_task..task..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17ha3b49059ebe7cf4bE.exit.i", %4
  ret void

13:                                               ; preds = %4
  %14 = getelementptr inbounds nuw i8, ptr %0, i64 8
  tail call void @"_ZN10async_task4task17Task$LT$T$C$M$GT$12set_canceled17h61f900e0a8516280E"(ptr noalias noundef nonnull align 8 dereferenceable(8) %14)
  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %2), !noalias !493
  call void @"_ZN10async_task4task17Task$LT$T$C$M$GT$12set_detached17h55bd55a546bb74e4E"(ptr noalias noundef nonnull sret([24 x i8]) align 8 captures(none) dereferenceable(24) %2, ptr noalias noundef nonnull align 8 dereferenceable(8) %14)
  %15 = load i64, ptr %2, align 8, !range !18, !alias.scope !500, !noalias !493, !noundef !7
  %16 = icmp eq i64 %15, 0
  br i1 %16, label %"_ZN4core3ptr87drop_in_place$LT$async_task..task..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17ha3b49059ebe7cf4bE.exit.i", label %17

17:                                               ; preds = %13
  %18 = getelementptr inbounds nuw i8, ptr %2, i64 8
  %19 = load ptr, ptr %18, align 8, !alias.scope !503, !noalias !493, !noundef !7
  %20 = icmp eq ptr %19, null
  br i1 %20, label %"_ZN4core3ptr87drop_in_place$LT$async_task..task..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17ha3b49059ebe7cf4bE.exit.i", label %21

21:                                               ; preds = %17
  call void @"_ZN4core3ptr91drop_in_place$LT$alloc..boxed..Box$LT$dyn$u20$core..any..Any$u2b$core..marker..Send$GT$$GT$17h3e45a709bf9f3b0fE.llvm.15385039395483765890"(ptr noalias noundef nonnull align 8 dereferenceable(16) %18)
  br label %"_ZN4core3ptr87drop_in_place$LT$async_task..task..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17ha3b49059ebe7cf4bE.exit.i"

"_ZN4core3ptr87drop_in_place$LT$async_task..task..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17ha3b49059ebe7cf4bE.exit.i": ; preds = %21, %17, %13
  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %2), !noalias !493
  br label %"_ZN4core3ptr85drop_in_place$LT$gpui..executor..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17h93418fc4d0f14431E.llvm.1438012894402087875.exit"

22:                                               ; preds = %23, %9
  resume { ptr, i32 } %10

23:                                               ; preds = %9
  invoke void @"_ZN4core3ptr85drop_in_place$LT$gpui..executor..Task$LT$core..option..Option$LT$$LP$$RP$$GT$$GT$$GT$17h93418fc4d0f14431E.llvm.1438012894402087875"(ptr noalias noundef nonnull align 8 dereferenceable(16) %0) #34
          to label %22 unwind label %24

24:                                               ; preds = %23
  %25 = landingpad { ptr, i32 }
          filter [0 x ptr] zeroinitializer
  tail call void @_ZN4core9panicking16panic_in_cleanup17hfa05ef7d5107e16aE() #35
  unreachable
}

@scottmcm
Copy link

scottmcm commented Apr 4, 2025

Unfortunately, the !range information on the load is dropped during optimization.

Just checking: You confirmed that it's lost, but was there originally? Because if rustc isn't emitting it in this case, for some reason, I could take a look at fixing that in rustc so that LLVM can just merge all these loads.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Apr 4, 2025

Unfortunately, the !range information on the load is dropped during optimization.

Just checking: You confirmed that it's lost, but was there originally? Because if rustc isn't emitting it in this case, for some reason, I could take a look at fixing that in rustc so that LLVM can just merge all these loads.

Confirmed. It is a JumpThreading issue.
See #134403

@andjo403
Copy link
Contributor

andjo403 commented Apr 5, 2025

feels like adding support for trunc nuw x to i1 in all folds will be a huge work.
eg. this can be solved by updating this

if (!match(Op0, m_ICmp(Pred, m_Value(A), m_Value(B))) ||
!ICmpInst::isEquality(Pred))
return nullptr;
to

  if (Op0->getType()->isIntOrIntVectorTy(1) &&
      (match(Op0, m_NUWTrunc(m_Value(A))))) {
    Pred = ICmpInst::ICMP_NE;
    B = ConstantInt::get(A->getType(), 0);
  } else if (!match(Op0, m_ICmp(Pred, m_Value(A), m_Value(B))) ||
             !ICmpInst::isEquality(Pred))
    return nullptr;

do we want to make the fold trunc nuw x to i1 -> icmp ne x, 0 instead?
the down side is that we lose some range information as we do not know that the range for x is 0,2 in the icmp instruction.

seems like it solves the problem in the example https://alive2.llvm.org/ce/z/c4KE9B

@scottmcm
Copy link

scottmcm commented Apr 5, 2025

the down side is that we lose some range information

That loss of information would, as I understand it, have a major negative impact on Rust.

There's a bunch of places where we end up passing around bools as i8, and depend on the trunc nuw back to i1 being free. The simplest example is in unions, but also comes up in incredibly-pervasive things like the Option<_> and Result<_, _> enums, since something like Option<u32> stores two i32s in the LLVM, using trunc nuw when consuming the first one that represents whether it's Some(_) or None.

It's only relatively recently that improvements in LLVM finally fixed rust-lang/rust#101210 and I really wouldn't want want's currently free -- thanks to the trunc nuw+zext correctly optimizing away -- to get an icmp again, like it used to, as would likely happen if LLVM went back to normalizing the trunc to an icmp.

Similarly, as of rust-lang/rust#137500 rustc uses trunc nuw for branches on 0/1, and we'd have to stop doing that if instcombine replaced it with icmp ne, because the other values being UB is incredibly important.

@andjo403
Copy link
Contributor

andjo403 commented Apr 6, 2025

Only from my experience for adding handling of trunc x to i1 the same way as icmp ne (and x, 1), 0 it is a lot of work and to me it looks like we want to handle trunc nuw x to i1 as icmp ne x, 0 in most cases so will result in almost any fold that contains a icmp also needs to handle the trunc nuw x to i1 instruction.
So do we need a more generic solution?
Maybe a new flagg on icmp that the operands is in range(0,2) so it is possible to have a fold like trunc nuw x to i1 -> icmp bool ne x, 0 then only folds that care about the range of x can do special handling.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Apr 6, 2025

Only from my experience for adding handling of trunc x to i1 the same way as icmp ne (and x, 1), 0 it is a lot of work and to me it looks like we want to handle trunc nuw x to i1 as icmp ne x, 0 in most cases so will result in almost any fold that contains a icmp also needs to handle the trunc nuw x to i1 instruction.

As we discussed before, the effort for migrating icmp eq/ne (x & 1), 0 into trunc x to i1 is limited (at least for common real-world cases).
Converting icmp/trunc into decomposed icmps (Pred + LHS + RHS) or BitTests (Pred + LHS + LHSMask + RHSC) is always helpful. It may be reused in the future if we want to optimize some patterns involving min/max.

So do we need a more generic solution?
Maybe a new flagg on icmp that the operands is in range(0,2) so it is possible to have a fold like trunc nuw x to i1 -> icmp bool ne x, 0 then only folds that care about the range of x can do special handling.

TBH I hope we can encode at-use range information in the LLVM IR.
Adding a new poison-generating flag requires more work on fixing flag-propagation bugs :(

Here is my solution for this case:

  1. Preserve !range metadata in JumpThreading
  2. Convert icmp eq i8 %x (with known bits = 0...01), 0 into xor (trunc nuw i8 %x to i1), true (It may cause many regressions).
  3. InstSimplify will fold !(trunc nuw i8 %x to i1) | (trunc nuw i8 %x to i1) into true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants