Tell LLVM about impossible niche tags #139098

scottmcm · 2025-03-29T06:55:29Z

I was trying to find a better way of emitting discriminant calculations, but sadly had no luck.

So here's a fairly small PR with the bits that did seem worth bothering:

As the TagEncoding::Niche docs describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds an llvm.assume in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the select have disjoint possible values.)
I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is i8 but yet there's still a variant index and discriminant of 258 which doesn't fit in that tag type because the enum is really weird.

rustbot · 2025-03-29T06:55:34Z

rustbot has assigned @compiler-errors.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2025-03-29T06:55:35Z

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

scottmcm · 2025-03-29T07:14:44Z

tests/codegen/enum/enum-match.rs

+// CHECK-LABEL: define{{.+}}i1 @match4_is_c(i8{{.+}}%e)
+// CHECK-NEXT: start
+// CHECK-NEXT: %[[REL_VAR:.+]] = add nsw i8 %e, -2
+// CHECK-NEXT: %[[NOT_NICHE:.+]] = icmp ugt i8 %[[REL_VAR]], 4
+// CHECK-NEXT: %[[NOT_IMPOSSIBLE:.+]] = icmp ne i8 %[[REL_VAR]], 2
+// CHECK-NEXT: call void @llvm.assume(i1 %[[NOT_IMPOSSIBLE]])
+// CHECK-NEXT: ret i1 %[[NOT_NICHE]]
+#[no_mangle]
+pub fn match4_is_c(e: MiddleNiche) -> bool {
+    // Before #139098, this couldn't optimize out the `select` because it looked
+    // like it was possible for a `2` to be produced on both sides.
+
+    std::intrinsics::discriminant_value(&e) == 2
+}


Compare on nightly, https://rust.godbolt.org/z/4bdKKjeG5

define noundef zeroext i1 @match4_is_c(i8 noundef range(i8 0, 7) %e) unnamed_addr { start: %0 = add nsw i8 %e, -2 %1 = icmp ugt i8 %0, 4 %_01 = icmp eq i8 %0, 2 %_0 = or i1 %1, %_01 ret i1 %_0 }

compiler-errors · 2025-04-01T21:59:34Z

r? compiler

fee1-dead · 2025-04-04T11:35:53Z

r? compiler

Nadrieril · 2025-04-04T19:32:49Z

r? codegen

bors · 2025-04-06T02:11:20Z

☔ The latest upstream changes (presumably #139275) made this pull request unmergeable. Please resolve the merge conflicts.

WaffleLapkin

r? WaffleLapkin
I have a bunch of unimportant questions, but the PR looks good to me; r=me

WaffleLapkin · 2025-04-07T21:50:21Z

tests/codegen/enum/enum-match.rs

-// CHECK-NEXT: %2 = and i8 %0, 1
-// CHECK-NEXT: %{{.+}} = select i1 %1, i8 13, i8 %2
+// CHECK-NEXT: %[[IS_B:.+]] = icmp eq i8 %0, 2
+// CHECK-NEXT: %[[TRUNC:.+]] = and i8 %0, 1


question/unrelated to this PR: why do we need this truncation? it is used in not-B case, so %0 is necesserily 0..=1...

You're 100% right, we don't need it.

I went to check what's up with it, and it looks like it's an opt-level thing, actually. If you use opt-level=2 it goes away: https://rust.godbolt.org/z/8Pfc99M77.

My guess would be that it's somehow related to llvm/llvm-project#132678 -- really, I wanted to change the x != 2 check to a x <= 1 check in this PR, as a better fit for looking for the natural untagged value rather than the arbitrary choice of 2 for Enum0::B, but everything like that I tried ended up regressing something :(

Specifically, it's CorrelatedValuePropagationPass that figures this out (it's the thing that looks at ranges and comparisons and figures out which values are possible) and that doesn't run in opt-level=1.

Interesting. But where is it even coming from in the first place? Do i1 loads always truncate?

It's coming from this https://rust.godbolt.org/z/Knr5jdEY5 InstCombine:

- %b = trunc nuw i8 %0 to i1 - %3 = zext i1 %b to i8 + %2 = and i8 %0, 1

Which would be correct if it weren't for the nuw, since truncating down to one bit then zero-extending back to the original width is the same as & 1.

Filed llvm/llvm-project#134908 🙂

oof, the regressions are :(

but I'm more so asking why are we even emitting trunc in the first place.

Original llir for the some branch looks like this:

bb3: ; preds = %start %4 = load i8, ptr %e, align 1 %b = trunc nuw i8 %4 to i1 %5 = zext i1 %b to i8 store i8 %5, ptr %b.dbg.spill, align 1 %6 = icmp ule i1 %b, true call void @llvm.assume(i1 %6) %7 = zext i1 %b to i8 store i8 %7, ptr %_0, align 1 br label %bb4

Is this just "we use i1 as the type for 'locals' and i8 for memory"? ~~But then the question is why can't we use i1 for memory too?~~ I found the answer ig:

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

Yup, that's right. We don't have typed memory, so we can't be sure that something written as i1 will only be read as i1.

Might be an interesting experiment to see whether the whole i1 vs i8 difference is actually important (outside ABI requirements) now that we have range attributes.

tests/codegen/enum/enum-match.rs

WaffleLapkin · 2025-04-07T22:18:31Z

tests/codegen/enum/enum-match.rs

+// You have to do something pretty obnoxious to get a variant index that doesn't
+// fit in the tag size, but it's possible


Why the hell does E0732 exist D:

This should have really been just

enum HugeVariantIndex { Possible257 = 257, Bool258(bool), Possible259, }

(nothing to do in this PR, but I'm really weirded out by the fact that that's an error)

The current Niche encoding in rustc_abi can't actually represent that -- the way it's specified it only accepts things with discriminant equal the variant index -- so even if it was legal we'd still have to do the other thing.

But I think the reason is that there's no stable way to actually read that discriminant, so it's of questionable value to actually bother having it legal.

compiler/rustc_codegen_ssa/src/mir/operand.rs

WaffleLapkin · 2025-04-07T22:33:26Z

tests/codegen/enum/enum-two-variants-match.rs

+#[no_mangle]
+pub fn option_nonzero_match(x: Option<std::num::NonZero<u16>>) -> u16 {
+    // CHECK: %[[IS_NONE:.+]] = icmp eq i16 %x, 0
+    // CHECK: %[[OPT_DISCR:.+]] = select i1 %[[IS_NONE]], i64 0, i64 1


Would it make sense to not emit a select if it's equivalent to zext? Or would that be pointless?

I don't know if it's really worth it vs just letting LLVM do it for us.

(And this one would need to flip the icmp eq to icmp ne in order to use a zext, which I think is actually always the case for a niche-encoded Option. So maybe it'd be worth flipping the way we emit things so that we could then use the zext, but it's not something I'd bother doing in this PR. Really the most silly part is that we need to widen to isize only to trunc down to bool anyway, but those come from different parts of the MIR so it's hard to fix without adding new MIR things that probably aren't worth it, so I'm inclined to just stick with the "meh, LLVM will handle it" approach.)

Yeah, I originally thought about widen/trunc too, but figured that this is just multiple parts of our code assuming the type.

tests/codegen/enum/enum-two-variants-match.rs

scottmcm · 2025-04-08T01:54:12Z

@bors r=WaffleLapkin

bors · 2025-04-08T01:54:14Z

📌 Commit 502f7f9 has been approved by WaffleLapkin

It is now in the queue for this repository.

…=WaffleLapkin Tell LLVM about impossible niche tags I was trying to find a better way of emitting discriminant calculations, but sadly had no luck. So here's a fairly small PR with the bits that did seem worth bothering: 1. As the [`TagEncoding::Niche` docs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.TagEncoding.html#variant.Niche) describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds an `llvm.assume` in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the `select` have disjoint possible values.) 2. I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is `i8` but yet there's still a variant index and discriminant of `258` which doesn't fit in that tag type because the enum is really weird.

…errors Rollup of 19 pull requests Successful merges: - rust-lang#138676 (Implement overflow for infinite implied lifetime bounds) - rust-lang#139024 (Make error message for missing fields with `..` and without `..` more consistent) - rust-lang#139098 (Tell LLVM about impossible niche tags) - rust-lang#139124 (compiler: report error when trait object type param reference self) - rust-lang#139321 (Update to new rinja version (askama)) - rust-lang#139346 (Don't construct preds w escaping bound vars in `diagnostic_hir_wf_check`) - rust-lang#139386 (make it possible to use stage0 libtest on compiletest) - rust-lang#139421 (Fix trait upcasting to dyn type with no principal when there are projections) - rust-lang#139468 (Don't call `Span::with_parent` on the good path in `has_stashed_diagnostic`) - rust-lang#139476 (rm `RegionInferenceContext::var_infos`) - rust-lang#139481 (Add job summary links to post-merge report) - rust-lang#139485 (compiletest: Stricter parsing for diagnostic kinds) - rust-lang#139490 (Update some comment/docs related to "extern intrinsic" removal) - rust-lang#139491 (Update books) - rust-lang#139496 (Revert r-a changes of rust-lang#139455) - rust-lang#139500 (document panic behavior of Vec::resize and Vec::resize_with) - rust-lang#139501 (Fix stack overflow in exhaustiveness due to recursive HIR opaque hidden types) - rust-lang#139504 (add missing word in doc comment) - rust-lang#139507 (compiletest: Trim whitespace from environment variable names) r? `@ghost` `@rustbot` modify labels: rollup

Rollup of 10 pull requests Successful merges: - rust-lang#138676 (Implement overflow for infinite implied lifetime bounds) - rust-lang#139024 (Make error message for missing fields with `..` and without `..` more consistent) - rust-lang#139098 (Tell LLVM about impossible niche tags) - rust-lang#139124 (compiler: report error when trait object type param reference self) - rust-lang#139321 (Update to new rinja version (askama)) - rust-lang#139346 (Don't construct preds w escaping bound vars in `diagnostic_hir_wf_check`) - rust-lang#139386 (make it possible to use stage0 libtest on compiletest) - rust-lang#139421 (Fix trait upcasting to dyn type with no principal when there are projections) - rust-lang#139464 (Allow for reparsing failure when reparsing a pasted metavar.) - rust-lang#139490 (Update some comment/docs related to "extern intrinsic" removal) r? `@ghost` `@rustbot` modify labels: rollup

Rollup merge of rust-lang#139098 - scottmcm:assert-impossible-tags, r=WaffleLapkin Tell LLVM about impossible niche tags I was trying to find a better way of emitting discriminant calculations, but sadly had no luck. So here's a fairly small PR with the bits that did seem worth bothering: 1. As the [`TagEncoding::Niche` docs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.TagEncoding.html#variant.Niche) describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds an `llvm.assume` in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the `select` have disjoint possible values.) 2. I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is `i8` but yet there's still a variant index and discriminant of `258` which doesn't fit in that tag type because the enum is really weird.

scottmcm · 2025-04-13T05:27:02Z

tests/codegen/enum/enum-match.rs

+
+pub enum Never {}
+
+pub enum HugeVariantIndex {


Oh, looks like something like this test should have been added after #110128 -- a previous version of the discriminant calculation got this wrong.

Allow matching on 3+ variant niche-encoded enums to optimize better While the two-variant case is most common (and already special-cased), it's pretty unusual to actually need the *fully-general* niche-decoding algorithm (that handles things like 200+ variants wrapping the encoding space and such). Layout puts the niche-encoded variants on one end of the natural values, so because enums don't have that many variants, it's quite common that there's no wrapping because the handful of variants just end up after the end of the `bool` or `char` or `newtype_index!` or whatever. This PR thus looks for those cases: situations where the tag's range doesn't actually wrap, and thus we can check for niche-vs-untag in one simple `icmp` without needing to adjust the tag value, and by picking between zero- and sign-extension based on *which* kind of non-wrapping it is, also help LLVM better understand by not forcing it to think about wrapping arithmetic either. It also emits the operations in a more optimization-friendly order. While the MIR Rvalue calculates a discriminant, so that's what we emit, code normally doesn't actually care about the actual discriminant for these niche-encoded enums. Rather, the discriminant is just getting passed to an equality check (for something like `matches!(foo, TerminatorKind::Goto { .. }`) or a `SwitchInt` (when it's being matched on). So while the old code would emit, roughly ```rust if is_niche { tag + ADJUSTMENT } else { UNTAGGED_DISCR } ``` this PR changes it instead to ```rust (if is_niche { tag } else { UNTAGGED_ADJ_DISCR }) + ADJUSTMENT ``` which on its own might seem odd, but it's actually easier to optimize because what we're actually doing is ```rust complicated_stuff() + ADJUSTMENT == 4 ``` or ```rust match complicated_stuff() + ADJUSTMENT { 0 =>…, 1 => …, 2 => …, _ => unreachable } ``` or in the generated `PartialEq` for enums with fieldless variants, ```rust complicated_stuff(a) + ADJUSTMENT == complicated_stuff(b) + ADJUSTMENT ``` and thus that's easy for the optimizer to eliminate the additions: ```rust complicated_stuff() == 2 ``` ```rust match complicated_stuff() { 7 => …, 8 => …, 9 => …, _ => unreachable } ``` ```rust complicated_stuff(a) == complicated_stuff(b) ``` For good measure I went and made sure that cranelift can do this optimization too 🙂 bytecodealliance/wasmtime#10489 r? WaffleLapkin Follow-up to rust-lang#139098 -- EDIT later: I happened to notice rust-lang#110197 (comment) -- it looks like there used to be some optimizations in this code, but they got removed for being wrong. I've added lots of tests here; let's hope I can avoid that fate 😬 (Certainly it would be possible to save some complexity by restricting this to the easy case, where it's unsigned-nowrap, the niches are after the natural payload, and all the variant indexes are small.)

rustbot assigned compiler-errors Mar 29, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 29, 2025

scottmcm force-pushed the assert-impossible-tags branch from c325ff6 to 98e9cb2 Compare March 29, 2025 07:12

scottmcm commented Mar 29, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot assigned fee1-dead and unassigned compiler-errors Apr 1, 2025

scottmcm mentioned this pull request Apr 2, 2025

Missing fold (X == 0) | (trunc nuw X to i1) -> true llvm/llvm-project#134093

Open

rustbot assigned Nadrieril and unassigned fee1-dead Apr 4, 2025

rustbot assigned workingjubilee and unassigned Nadrieril Apr 4, 2025

scottmcm added 2 commits April 5, 2025 19:54

Tell LLVM about impossible niche tags

1f06a6a

LLVM18 compatibility fixes in the tests

51e67e2

scottmcm force-pushed the assert-impossible-tags branch from faaaeb4 to 51e67e2 Compare April 6, 2025 03:01

WaffleLapkin approved these changes Apr 7, 2025

View reviewed changes

rustbot assigned WaffleLapkin and unassigned workingjubilee Apr 7, 2025

Address PR feedback

502f7f9

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 8, 2025

Zalathar mentioned this pull request Apr 8, 2025

Rollup of 19 pull requests #139508

Closed

Zalathar mentioned this pull request Apr 8, 2025

Rollup of 10 pull requests #139525

Merged

bors merged commit 7ffa56c into rust-lang:master Apr 8, 2025
6 checks passed

rustbot added this to the 1.88.0 milestone Apr 8, 2025

scottmcm mentioned this pull request Apr 8, 2025

InstCombine introduces unnecessary and 1 when folding trunc nuw+zext llvm/llvm-project#134908

Open

scottmcm deleted the assert-impossible-tags branch April 8, 2025 19:02

scottmcm mentioned this pull request Apr 13, 2025

Allow matching on 3+ variant niche-encoded enums to optimize better #139729

Open

scottmcm commented Apr 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tell LLVM about impossible niche tags #139098

Tell LLVM about impossible niche tags #139098

scottmcm commented Mar 29, 2025 •

edited

Loading

rustbot commented Mar 29, 2025

rustbot commented Mar 29, 2025

scottmcm Mar 29, 2025

This comment has been minimized.

compiler-errors commented Apr 1, 2025

fee1-dead commented Apr 4, 2025

Nadrieril commented Apr 4, 2025

bors commented Apr 6, 2025

WaffleLapkin left a comment

WaffleLapkin Apr 7, 2025

scottmcm Apr 7, 2025

scottmcm Apr 7, 2025

WaffleLapkin Apr 8, 2025

scottmcm Apr 8, 2025

WaffleLapkin Apr 9, 2025

scottmcm Apr 9, 2025

WaffleLapkin Apr 7, 2025

scottmcm Apr 7, 2025

WaffleLapkin Apr 8, 2025

WaffleLapkin Apr 7, 2025

scottmcm Apr 7, 2025

WaffleLapkin Apr 8, 2025

scottmcm commented Apr 8, 2025

bors commented Apr 8, 2025

scottmcm Apr 13, 2025

		// You have to do something pretty obnoxious to get a variant index that doesn't
		// fit in the tag size, but it's possible

Tell LLVM about impossible niche tags #139098

Tell LLVM about impossible niche tags #139098

Conversation

scottmcm commented Mar 29, 2025 • edited Loading

rustbot commented Mar 29, 2025

rustbot commented Mar 29, 2025

Choose a reason for hiding this comment

This comment has been minimized.

compiler-errors commented Apr 1, 2025

fee1-dead commented Apr 4, 2025

Nadrieril commented Apr 4, 2025

bors commented Apr 6, 2025

WaffleLapkin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scottmcm commented Apr 8, 2025

bors commented Apr 8, 2025

Choose a reason for hiding this comment

scottmcm commented Mar 29, 2025 •

edited

Loading