-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Tell LLVM about impossible niche tags #139098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rustbot has assigned @compiler-errors. Use |
Some changes occurred in compiler/rustc_codegen_ssa |
c325ff6
to
98e9cb2
Compare
tests/codegen/enum/enum-match.rs
Outdated
// CHECK-LABEL: define{{.+}}i1 @match4_is_c(i8{{.+}}%e) | ||
// CHECK-NEXT: start | ||
// CHECK-NEXT: %[[REL_VAR:.+]] = add nsw i8 %e, -2 | ||
// CHECK-NEXT: %[[NOT_NICHE:.+]] = icmp ugt i8 %[[REL_VAR]], 4 | ||
// CHECK-NEXT: %[[NOT_IMPOSSIBLE:.+]] = icmp ne i8 %[[REL_VAR]], 2 | ||
// CHECK-NEXT: call void @llvm.assume(i1 %[[NOT_IMPOSSIBLE]]) | ||
// CHECK-NEXT: ret i1 %[[NOT_NICHE]] | ||
#[no_mangle] | ||
pub fn match4_is_c(e: MiddleNiche) -> bool { | ||
// Before #139098, this couldn't optimize out the `select` because it looked | ||
// like it was possible for a `2` to be produced on both sides. | ||
|
||
std::intrinsics::discriminant_value(&e) == 2 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare on nightly, https://rust.godbolt.org/z/4bdKKjeG5
define noundef zeroext i1 @match4_is_c(i8 noundef range(i8 0, 7) %e) unnamed_addr {
start:
%0 = add nsw i8 %e, -2
%1 = icmp ugt i8 %0, 4
%_01 = icmp eq i8 %0, 2
%_0 = or i1 %1, %_01
ret i1 %_0
}
This comment has been minimized.
This comment has been minimized.
r? compiler |
r? compiler |
r? codegen |
☔ The latest upstream changes (presumably #139275) made this pull request unmergeable. Please resolve the merge conflicts. |
faaaeb4
to
51e67e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r? WaffleLapkin
I have a bunch of unimportant questions, but the PR looks good to me; r=me
// CHECK-NEXT: %2 = and i8 %0, 1 | ||
// CHECK-NEXT: %{{.+}} = select i1 %1, i8 13, i8 %2 | ||
// CHECK-NEXT: %[[IS_B:.+]] = icmp eq i8 %0, 2 | ||
// CHECK-NEXT: %[[TRUNC:.+]] = and i8 %0, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question/unrelated to this PR: why do we need this truncation? it is used in not-B case, so %0
is necesserily 0..=1
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're 100% right, we don't need it.
I went to check what's up with it, and it looks like it's an opt-level thing, actually. If you use opt-level=2
it goes away: https://rust.godbolt.org/z/8Pfc99M77.
My guess would be that it's somehow related to llvm/llvm-project#132678 -- really, I wanted to change the x != 2
check to a x <= 1
check in this PR, as a better fit for looking for the natural untagged value rather than the arbitrary choice of 2
for Enum0::B
, but everything like that I tried ended up regressing something :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, it's CorrelatedValuePropagationPass
that figures this out (it's the thing that looks at ranges and comparisons and figures out which values are possible) and that doesn't run in opt-level=1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. But where is it even coming from in the first place? Do i1
loads always truncate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's coming from this https://rust.godbolt.org/z/Knr5jdEY5 InstCombine:
- %b = trunc nuw i8 %0 to i1
- %3 = zext i1 %b to i8
+ %2 = and i8 %0, 1
Which would be correct if it weren't for the nuw
, since truncating down to one bit then zero-extending back to the original width is the same as & 1
.
Filed llvm/llvm-project#134908 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oof, the regressions are :(
but I'm more so asking why are we even emitting trunc
in the first place.
Original llir for the some branch looks like this:
bb3: ; preds = %start
%4 = load i8, ptr %e, align 1
%b = trunc nuw i8 %4 to i1
%5 = zext i1 %b to i8
store i8 %5, ptr %b.dbg.spill, align 1
%6 = icmp ule i1 %b, true
call void @llvm.assume(i1 %6)
%7 = zext i1 %b to i8
store i8 %7, ptr %_0, align 1
br label %bb4
Is this just "we use i1
as the type for 'locals' and i8
for memory"? But then the question is why can't we use I found the answer ig:i1
for memory too?
When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's right. We don't have typed memory, so we can't be sure that something written as i1
will only be read as i1
.
Might be an interesting experiment to see whether the whole i1 vs i8 difference is actually important (outside ABI requirements) now that we have range
attributes.
// You have to do something pretty obnoxious to get a variant index that doesn't | ||
// fit in the tag size, but it's possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the hell does E0732 exist D:
This should have really been just
enum HugeVariantIndex {
Possible257 = 257,
Bool258(bool),
Possible259,
}
(nothing to do in this PR, but I'm really weirded out by the fact that that's an error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current Niche
encoding in rustc_abi
can't actually represent that -- the way it's specified it only accepts things with discriminant equal the variant index -- so even if it was legal we'd still have to do the other thing.
But I think the reason is that there's no stable way to actually read that discriminant, so it's of questionable value to actually bother having it legal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fun!
#[no_mangle] | ||
pub fn option_nonzero_match(x: Option<std::num::NonZero<u16>>) -> u16 { | ||
// CHECK: %[[IS_NONE:.+]] = icmp eq i16 %x, 0 | ||
// CHECK: %[[OPT_DISCR:.+]] = select i1 %[[IS_NONE]], i64 0, i64 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to not emit a select if it's equivalent to zext? Or would that be pointless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's really worth it vs just letting LLVM do it for us.
(And this one would need to flip the icmp eq
to icmp ne
in order to use a zext
, which I think is actually always the case for a niche-encoded Option
. So maybe it'd be worth flipping the way we emit things so that we could then use the zext
, but it's not something I'd bother doing in this PR. Really the most silly part is that we need to widen to isize
only to trunc
down to bool
anyway, but those come from different parts of the MIR so it's hard to fix without adding new MIR things that probably aren't worth it, so I'm inclined to just stick with the "meh, LLVM will handle it" approach.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I originally thought about widen/trunc too, but figured that this is just multiple parts of our code assuming the type.
@bors r=WaffleLapkin |
…=WaffleLapkin Tell LLVM about impossible niche tags I was trying to find a better way of emitting discriminant calculations, but sadly had no luck. So here's a fairly small PR with the bits that did seem worth bothering: 1. As the [`TagEncoding::Niche` docs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.TagEncoding.html#variant.Niche) describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds an `llvm.assume` in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the `select` have disjoint possible values.) 2. I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is `i8` but yet there's still a variant index and discriminant of `258` which doesn't fit in that tag type because the enum is really weird.
…errors Rollup of 19 pull requests Successful merges: - rust-lang#138676 (Implement overflow for infinite implied lifetime bounds) - rust-lang#139024 (Make error message for missing fields with `..` and without `..` more consistent) - rust-lang#139098 (Tell LLVM about impossible niche tags) - rust-lang#139124 (compiler: report error when trait object type param reference self) - rust-lang#139321 (Update to new rinja version (askama)) - rust-lang#139346 (Don't construct preds w escaping bound vars in `diagnostic_hir_wf_check`) - rust-lang#139386 (make it possible to use stage0 libtest on compiletest) - rust-lang#139421 (Fix trait upcasting to dyn type with no principal when there are projections) - rust-lang#139468 (Don't call `Span::with_parent` on the good path in `has_stashed_diagnostic`) - rust-lang#139476 (rm `RegionInferenceContext::var_infos`) - rust-lang#139481 (Add job summary links to post-merge report) - rust-lang#139485 (compiletest: Stricter parsing for diagnostic kinds) - rust-lang#139490 (Update some comment/docs related to "extern intrinsic" removal) - rust-lang#139491 (Update books) - rust-lang#139496 (Revert r-a changes of rust-lang#139455) - rust-lang#139500 (document panic behavior of Vec::resize and Vec::resize_with) - rust-lang#139501 (Fix stack overflow in exhaustiveness due to recursive HIR opaque hidden types) - rust-lang#139504 (add missing word in doc comment) - rust-lang#139507 (compiletest: Trim whitespace from environment variable names) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 10 pull requests Successful merges: - rust-lang#138676 (Implement overflow for infinite implied lifetime bounds) - rust-lang#139024 (Make error message for missing fields with `..` and without `..` more consistent) - rust-lang#139098 (Tell LLVM about impossible niche tags) - rust-lang#139124 (compiler: report error when trait object type param reference self) - rust-lang#139321 (Update to new rinja version (askama)) - rust-lang#139346 (Don't construct preds w escaping bound vars in `diagnostic_hir_wf_check`) - rust-lang#139386 (make it possible to use stage0 libtest on compiletest) - rust-lang#139421 (Fix trait upcasting to dyn type with no principal when there are projections) - rust-lang#139464 (Allow for reparsing failure when reparsing a pasted metavar.) - rust-lang#139490 (Update some comment/docs related to "extern intrinsic" removal) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#139098 - scottmcm:assert-impossible-tags, r=WaffleLapkin Tell LLVM about impossible niche tags I was trying to find a better way of emitting discriminant calculations, but sadly had no luck. So here's a fairly small PR with the bits that did seem worth bothering: 1. As the [`TagEncoding::Niche` docs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.TagEncoding.html#variant.Niche) describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds an `llvm.assume` in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the `select` have disjoint possible values.) 2. I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is `i8` but yet there's still a variant index and discriminant of `258` which doesn't fit in that tag type because the enum is really weird.
|
||
pub enum Never {} | ||
|
||
pub enum HugeVariantIndex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, looks like something like this test should have been added after #110128 -- a previous version of the discriminant calculation got this wrong.
Allow matching on 3+ variant niche-encoded enums to optimize better While the two-variant case is most common (and already special-cased), it's pretty unusual to actually need the *fully-general* niche-decoding algorithm (that handles things like 200+ variants wrapping the encoding space and such). Layout puts the niche-encoded variants on one end of the natural values, so because enums don't have that many variants, it's quite common that there's no wrapping because the handful of variants just end up after the end of the `bool` or `char` or `newtype_index!` or whatever. This PR thus looks for those cases: situations where the tag's range doesn't actually wrap, and thus we can check for niche-vs-untag in one simple `icmp` without needing to adjust the tag value, and by picking between zero- and sign-extension based on *which* kind of non-wrapping it is, also help LLVM better understand by not forcing it to think about wrapping arithmetic either. It also emits the operations in a more optimization-friendly order. While the MIR Rvalue calculates a discriminant, so that's what we emit, code normally doesn't actually care about the actual discriminant for these niche-encoded enums. Rather, the discriminant is just getting passed to an equality check (for something like `matches!(foo, TerminatorKind::Goto { .. }`) or a `SwitchInt` (when it's being matched on). So while the old code would emit, roughly ```rust if is_niche { tag + ADJUSTMENT } else { UNTAGGED_DISCR } ``` this PR changes it instead to ```rust (if is_niche { tag } else { UNTAGGED_ADJ_DISCR }) + ADJUSTMENT ``` which on its own might seem odd, but it's actually easier to optimize because what we're actually doing is ```rust complicated_stuff() + ADJUSTMENT == 4 ``` or ```rust match complicated_stuff() + ADJUSTMENT { 0 =>…, 1 => …, 2 => …, _ => unreachable } ``` or in the generated `PartialEq` for enums with fieldless variants, ```rust complicated_stuff(a) + ADJUSTMENT == complicated_stuff(b) + ADJUSTMENT ``` and thus that's easy for the optimizer to eliminate the additions: ```rust complicated_stuff() == 2 ``` ```rust match complicated_stuff() { 7 => …, 8 => …, 9 => …, _ => unreachable } ``` ```rust complicated_stuff(a) == complicated_stuff(b) ``` For good measure I went and made sure that cranelift can do this optimization too 🙂 bytecodealliance/wasmtime#10489 r? WaffleLapkin Follow-up to rust-lang#139098 -- EDIT later: I happened to notice rust-lang#110197 (comment) -- it looks like there used to be some optimizations in this code, but they got removed for being wrong. I've added lots of tests here; let's hope I can avoid that fate 😬 (Certainly it would be possible to save some complexity by restricting this to the easy case, where it's unsigned-nowrap, the niches are after the natural payload, and all the variant indexes are small.)
I was trying to find a better way of emitting discriminant calculations, but sadly had no luck.
So here's a fairly small PR with the bits that did seem worth bothering:
As the
TagEncoding::Niche
docs describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds anllvm.assume
in non-debug mode to tell LLVM about that. (That way it can tell that the sides of theselect
have disjoint possible values.)I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is
i8
but yet there's still a variant index and discriminant of258
which doesn't fit in that tag type because the enum is really weird.