
[DAGCombiner] Eliminate fp casts if we have the right fast math flags #131345


Merged

john-brawn-arm merged 3 commits into llvm:main from fp16_dagcombine on Apr 28, 2025

Conversation

john-brawn-arm
Collaborator

When floating-point operations are legalized to operations of a higher precision (e.g. f16 fadd being legalized to f32 fadd) then we get narrowing then widening operations between each operation. With the appropriate fast math flags (nnan ninf contract) we can eliminate these casts.
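
As a minimal IR sketch of the pattern (it mirrors the nnan_ninf_contract_fadd_sequence test added to llvm/test/CodeGen/AArch64/fp16_fast_math.ll in this patch), a chain of f16 operations whose flags permit the combine looks like this:

define half @fadd_sequence(half %x, half %y, half %z) {
entry:
  ; On targets without native fp16 arithmetic each fadd is legalized to an
  ; f32 fadd; without this combine an fcvt-to-half/fcvt-to-float round-trip
  ; is left between the two additions.
  %add1 = fadd nnan ninf contract half %x, %y
  %add2 = fadd nnan ninf contract half %add1, %z
  ret half %add2
}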

@llvmbot
Member

llvmbot commented Mar 14, 2025

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-arm

Author: John Brawn (john-brawn-arm)

Changes

When floating-point operations are legalized to operations of a higher precision (e.g. f16 fadd being legalized to f32 fadd) then we get narrowing then widening operations between each operation. With the appropriate fast math flags (nnan ninf contract) we can eliminate these casts.


Patch is 63.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131345.diff

15 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+45)
  • (modified) llvm/test/CodeGen/AArch64/f16-instructions.ll (+11-5)
  • (modified) llvm/test/CodeGen/AArch64/fmla.ll (+2-5)
  • (modified) llvm/test/CodeGen/AArch64/fp16_fast_math.ll (+109)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-reduce.ll (+61-111)
  • (modified) llvm/test/CodeGen/AArch64/vecreduce-fadd.ll (+78-138)
  • (modified) llvm/test/CodeGen/AArch64/vecreduce-fmul.ll (+54-94)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.exp.ll (-6)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.exp10.ll (-6)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.exp2.ll (-6)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log.ll (+4-24)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log10.ll (+4-24)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log2.ll (-20)
  • (modified) llvm/test/CodeGen/ARM/fp16_fast_math.ll (+143-6)
  • (modified) llvm/test/CodeGen/Thumb2/bf16-instructions.ll (+8-17)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c35838601cc9c..c16e2f0bc865b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18455,7 +18455,45 @@ SDValue DAGCombiner::visitFP_ROUND(SDNode *N) {
   return SDValue();
 }
 
+// Eliminate a floating-point widening of a narrowed value if the fast math
+// flags allow it.
+static SDValue eliminateFPCastPair(SDNode *N) {
+  SDValue N0 = N->getOperand(0);
+  EVT VT = N->getValueType(0);
+
+  unsigned NarrowingOp;
+  switch (N->getOpcode()) {
+  case ISD::FP16_TO_FP:
+    NarrowingOp = ISD::FP_TO_FP16;
+    break;
+  case ISD::BF16_TO_FP:
+    NarrowingOp = ISD::FP_TO_BF16;
+    break;
+  case ISD::FP_EXTEND:
+    NarrowingOp = ISD::FP_ROUND;
+    break;
+  default:
+    llvm_unreachable("Expected widening FP cast");
+  }
+
+  if (N0.getOpcode() == NarrowingOp && N0.getOperand(0).getValueType() == VT) {
+    const SDNodeFlags SrcFlags = N0->getFlags();
+    const SDNodeFlags DstFlags = N->getFlags();
+    // Narrowing can introduce inf and change the encoding of a nan, so the
+    // destination must have the nnan and ninf flags to indicate that we don't
+    // need to care about that. We are also removing a rounding step, and that
+    // requires both the source and destination to allow contraction.
+    if (DstFlags.hasNoNaNs() && DstFlags.hasNoInfs() &&
+        SrcFlags.hasAllowContract() && DstFlags.hasAllowContract()) {
+      return N0.getOperand(0);
+    }
+  }
+
+  return SDValue();
+}
+
 SDValue DAGCombiner::visitFP_EXTEND(SDNode *N) {
+  SelectionDAG::FlagInserter FlagsInserter(DAG, N);
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
   SDLoc DL(N);
@@ -18507,6 +18545,9 @@ SDValue DAGCombiner::visitFP_EXTEND(SDNode *N) {
   if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
     return NewVSel;
 
+  if (SDValue CastEliminated = eliminateFPCastPair(N))
+    return CastEliminated;
+
   return SDValue();
 }
 
@@ -27209,6 +27250,7 @@ SDValue DAGCombiner::visitFP_TO_FP16(SDNode *N) {
 }
 
 SDValue DAGCombiner::visitFP16_TO_FP(SDNode *N) {
+  SelectionDAG::FlagInserter FlagsInserter(DAG, N);
   auto Op = N->getOpcode();
   assert((Op == ISD::FP16_TO_FP || Op == ISD::BF16_TO_FP) &&
          "opcode should be FP16_TO_FP or BF16_TO_FP.");
@@ -27223,6 +27265,9 @@ SDValue DAGCombiner::visitFP16_TO_FP(SDNode *N) {
     }
   }
 
+  if (SDValue CastEliminated = eliminateFPCastPair(N))
+    return CastEliminated;
+
   // Sometimes constants manage to survive very late in the pipeline, e.g.,
   // because they are wrapped inside the <1 x f16> type. Try one last time to
   // get rid of them.
diff --git a/llvm/test/CodeGen/AArch64/f16-instructions.ll b/llvm/test/CodeGen/AArch64/f16-instructions.ll
index 5460a376931a5..adc536da26f26 100644
--- a/llvm/test/CodeGen/AArch64/f16-instructions.ll
+++ b/llvm/test/CodeGen/AArch64/f16-instructions.ll
@@ -84,11 +84,8 @@ define half @test_fmadd(half %a, half %b, half %c) #0 {
 ; CHECK-CVT-SD:       // %bb.0:
 ; CHECK-CVT-SD-NEXT:    fcvt s1, h1
 ; CHECK-CVT-SD-NEXT:    fcvt s0, h0
-; CHECK-CVT-SD-NEXT:    fmul s0, s0, s1
-; CHECK-CVT-SD-NEXT:    fcvt s1, h2
-; CHECK-CVT-SD-NEXT:    fcvt h0, s0
-; CHECK-CVT-SD-NEXT:    fcvt s0, h0
-; CHECK-CVT-SD-NEXT:    fadd s0, s0, s1
+; CHECK-CVT-SD-NEXT:    fcvt s2, h2
+; CHECK-CVT-SD-NEXT:    fmadd s0, s0, s1, s2
 ; CHECK-CVT-SD-NEXT:    fcvt h0, s0
 ; CHECK-CVT-SD-NEXT:    ret
 ;
@@ -1248,6 +1245,15 @@ define half @test_atan(half %a) #0 {
 }
 
 define half @test_atan2(half %a, half %b) #0 {
+; CHECK-LABEL: test_atan2:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:    fcvt s0, h0
+; CHECK-NEXT:    fcvt s1, h1
+; CHECK-NEXT:    bl atan2f
+; CHECK-NEXT:    fcvt h0, s0
+; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
   %r = call half @llvm.atan2.f16(half %a, half %b)
   ret half %r
 }
diff --git a/llvm/test/CodeGen/AArch64/fmla.ll b/llvm/test/CodeGen/AArch64/fmla.ll
index 7bcaae5a77eac..a37aabb0b5384 100644
--- a/llvm/test/CodeGen/AArch64/fmla.ll
+++ b/llvm/test/CodeGen/AArch64/fmla.ll
@@ -1114,11 +1114,8 @@ define half @fmul_f16(half %a, half %b, half %c) {
 ; CHECK-SD-NOFP16:       // %bb.0: // %entry
 ; CHECK-SD-NOFP16-NEXT:    fcvt s1, h1
 ; CHECK-SD-NOFP16-NEXT:    fcvt s0, h0
-; CHECK-SD-NOFP16-NEXT:    fmul s0, s0, s1
-; CHECK-SD-NOFP16-NEXT:    fcvt s1, h2
-; CHECK-SD-NOFP16-NEXT:    fcvt h0, s0
-; CHECK-SD-NOFP16-NEXT:    fcvt s0, h0
-; CHECK-SD-NOFP16-NEXT:    fadd s0, s0, s1
+; CHECK-SD-NOFP16-NEXT:    fcvt s2, h2
+; CHECK-SD-NOFP16-NEXT:    fmadd s0, s0, s1, s2
 ; CHECK-SD-NOFP16-NEXT:    fcvt h0, s0
 ; CHECK-SD-NOFP16-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/AArch64/fp16_fast_math.ll b/llvm/test/CodeGen/AArch64/fp16_fast_math.ll
index b7d2de708a110..7d9654d1ff8c0 100644
--- a/llvm/test/CodeGen/AArch64/fp16_fast_math.ll
+++ b/llvm/test/CodeGen/AArch64/fp16_fast_math.ll
@@ -88,3 +88,112 @@ entry:
   %add = fadd ninf half %x, %y
   ret half %add
 }
+
+; Check that when we have the right fast math flags the converts in between the
+; two fadds are removed.
+
+define half @normal_fadd_sequence(half %x, half %y, half %z) {
+  ; CHECK-CVT-LABEL: name: normal_fadd_sequence
+  ; CHECK-CVT: bb.0.entry:
+  ; CHECK-CVT-NEXT:   liveins: $h0, $h1, $h2
+  ; CHECK-CVT-NEXT: {{  $}}
+  ; CHECK-CVT-NEXT:   [[COPY:%[0-9]+]]:fpr16 = COPY $h2
+  ; CHECK-CVT-NEXT:   [[COPY1:%[0-9]+]]:fpr16 = COPY $h1
+  ; CHECK-CVT-NEXT:   [[COPY2:%[0-9]+]]:fpr16 = COPY $h0
+  ; CHECK-CVT-NEXT:   [[FCVTSHr:%[0-9]+]]:fpr32 = nofpexcept FCVTSHr [[COPY1]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr1:%[0-9]+]]:fpr32 = nofpexcept FCVTSHr [[COPY2]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FADDSrr:%[0-9]+]]:fpr32 = nofpexcept FADDSrr killed [[FCVTSHr1]], killed [[FCVTSHr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTHSr:%[0-9]+]]:fpr16 = nofpexcept FCVTHSr killed [[FADDSrr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr2:%[0-9]+]]:fpr32 = nofpexcept FCVTSHr killed [[FCVTHSr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr3:%[0-9]+]]:fpr32 = nofpexcept FCVTSHr [[COPY]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FADDSrr1:%[0-9]+]]:fpr32 = nofpexcept FADDSrr killed [[FCVTSHr2]], killed [[FCVTSHr3]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTHSr1:%[0-9]+]]:fpr16 = nofpexcept FCVTHSr killed [[FADDSrr1]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   $h0 = COPY [[FCVTHSr1]]
+  ; CHECK-CVT-NEXT:   RET_ReallyLR implicit $h0
+  ;
+  ; CHECK-FP16-LABEL: name: normal_fadd_sequence
+  ; CHECK-FP16: bb.0.entry:
+  ; CHECK-FP16-NEXT:   liveins: $h0, $h1, $h2
+  ; CHECK-FP16-NEXT: {{  $}}
+  ; CHECK-FP16-NEXT:   [[COPY:%[0-9]+]]:fpr16 = COPY $h2
+  ; CHECK-FP16-NEXT:   [[COPY1:%[0-9]+]]:fpr16 = COPY $h1
+  ; CHECK-FP16-NEXT:   [[COPY2:%[0-9]+]]:fpr16 = COPY $h0
+  ; CHECK-FP16-NEXT:   [[FADDHrr:%[0-9]+]]:fpr16 = nofpexcept FADDHrr [[COPY2]], [[COPY1]], implicit $fpcr
+  ; CHECK-FP16-NEXT:   [[FADDHrr1:%[0-9]+]]:fpr16 = nofpexcept FADDHrr killed [[FADDHrr]], [[COPY]], implicit $fpcr
+  ; CHECK-FP16-NEXT:   $h0 = COPY [[FADDHrr1]]
+  ; CHECK-FP16-NEXT:   RET_ReallyLR implicit $h0
+entry:
+  %add1 = fadd half %x, %y
+  %add2 = fadd half %add1, %z
+  ret half %add2
+}
+
+define half @nnan_ninf_contract_fadd_sequence(half %x, half %y, half %z) {
+  ; CHECK-CVT-LABEL: name: nnan_ninf_contract_fadd_sequence
+  ; CHECK-CVT: bb.0.entry:
+  ; CHECK-CVT-NEXT:   liveins: $h0, $h1, $h2
+  ; CHECK-CVT-NEXT: {{  $}}
+  ; CHECK-CVT-NEXT:   [[COPY:%[0-9]+]]:fpr16 = COPY $h2
+  ; CHECK-CVT-NEXT:   [[COPY1:%[0-9]+]]:fpr16 = COPY $h1
+  ; CHECK-CVT-NEXT:   [[COPY2:%[0-9]+]]:fpr16 = COPY $h0
+  ; CHECK-CVT-NEXT:   [[FCVTSHr:%[0-9]+]]:fpr32 = nnan ninf contract nofpexcept FCVTSHr [[COPY1]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr1:%[0-9]+]]:fpr32 = nnan ninf contract nofpexcept FCVTSHr [[COPY2]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FADDSrr:%[0-9]+]]:fpr32 = nnan ninf contract nofpexcept FADDSrr killed [[FCVTSHr1]], killed [[FCVTSHr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr2:%[0-9]+]]:fpr32 = nnan ninf contract nofpexcept FCVTSHr [[COPY]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FADDSrr1:%[0-9]+]]:fpr32 = nnan ninf contract nofpexcept FADDSrr killed [[FADDSrr]], killed [[FCVTSHr2]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTHSr:%[0-9]+]]:fpr16 = nnan ninf contract nofpexcept FCVTHSr killed [[FADDSrr1]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   $h0 = COPY [[FCVTHSr]]
+  ; CHECK-CVT-NEXT:   RET_ReallyLR implicit $h0
+  ;
+  ; CHECK-FP16-LABEL: name: nnan_ninf_contract_fadd_sequence
+  ; CHECK-FP16: bb.0.entry:
+  ; CHECK-FP16-NEXT:   liveins: $h0, $h1, $h2
+  ; CHECK-FP16-NEXT: {{  $}}
+  ; CHECK-FP16-NEXT:   [[COPY:%[0-9]+]]:fpr16 = COPY $h2
+  ; CHECK-FP16-NEXT:   [[COPY1:%[0-9]+]]:fpr16 = COPY $h1
+  ; CHECK-FP16-NEXT:   [[COPY2:%[0-9]+]]:fpr16 = COPY $h0
+  ; CHECK-FP16-NEXT:   [[FADDHrr:%[0-9]+]]:fpr16 = nnan ninf contract nofpexcept FADDHrr [[COPY2]], [[COPY1]], implicit $fpcr
+  ; CHECK-FP16-NEXT:   [[FADDHrr1:%[0-9]+]]:fpr16 = nnan ninf contract nofpexcept FADDHrr killed [[FADDHrr]], [[COPY]], implicit $fpcr
+  ; CHECK-FP16-NEXT:   $h0 = COPY [[FADDHrr1]]
+  ; CHECK-FP16-NEXT:   RET_ReallyLR implicit $h0
+entry:
+  %add1 = fadd nnan ninf contract half %x, %y
+  %add2 = fadd nnan ninf contract half %add1, %z
+  ret half %add2
+}
+
+define half @ninf_fadd_sequence(half %x, half %y, half %z) {
+  ; CHECK-CVT-LABEL: name: ninf_fadd_sequence
+  ; CHECK-CVT: bb.0.entry:
+  ; CHECK-CVT-NEXT:   liveins: $h0, $h1, $h2
+  ; CHECK-CVT-NEXT: {{  $}}
+  ; CHECK-CVT-NEXT:   [[COPY:%[0-9]+]]:fpr16 = COPY $h2
+  ; CHECK-CVT-NEXT:   [[COPY1:%[0-9]+]]:fpr16 = COPY $h1
+  ; CHECK-CVT-NEXT:   [[COPY2:%[0-9]+]]:fpr16 = COPY $h0
+  ; CHECK-CVT-NEXT:   [[FCVTSHr:%[0-9]+]]:fpr32 = ninf nofpexcept FCVTSHr [[COPY1]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr1:%[0-9]+]]:fpr32 = ninf nofpexcept FCVTSHr [[COPY2]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FADDSrr:%[0-9]+]]:fpr32 = ninf nofpexcept FADDSrr killed [[FCVTSHr1]], killed [[FCVTSHr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTHSr:%[0-9]+]]:fpr16 = ninf nofpexcept FCVTHSr killed [[FADDSrr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr2:%[0-9]+]]:fpr32 = ninf nofpexcept FCVTSHr killed [[FCVTHSr]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTSHr3:%[0-9]+]]:fpr32 = ninf nofpexcept FCVTSHr [[COPY]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FADDSrr1:%[0-9]+]]:fpr32 = ninf nofpexcept FADDSrr killed [[FCVTSHr2]], killed [[FCVTSHr3]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   [[FCVTHSr1:%[0-9]+]]:fpr16 = ninf nofpexcept FCVTHSr killed [[FADDSrr1]], implicit $fpcr
+  ; CHECK-CVT-NEXT:   $h0 = COPY [[FCVTHSr1]]
+  ; CHECK-CVT-NEXT:   RET_ReallyLR implicit $h0
+  ;
+  ; CHECK-FP16-LABEL: name: ninf_fadd_sequence
+  ; CHECK-FP16: bb.0.entry:
+  ; CHECK-FP16-NEXT:   liveins: $h0, $h1, $h2
+  ; CHECK-FP16-NEXT: {{  $}}
+  ; CHECK-FP16-NEXT:   [[COPY:%[0-9]+]]:fpr16 = COPY $h2
+  ; CHECK-FP16-NEXT:   [[COPY1:%[0-9]+]]:fpr16 = COPY $h1
+  ; CHECK-FP16-NEXT:   [[COPY2:%[0-9]+]]:fpr16 = COPY $h0
+  ; CHECK-FP16-NEXT:   [[FADDHrr:%[0-9]+]]:fpr16 = ninf nofpexcept FADDHrr [[COPY2]], [[COPY1]], implicit $fpcr
+  ; CHECK-FP16-NEXT:   [[FADDHrr1:%[0-9]+]]:fpr16 = ninf nofpexcept FADDHrr killed [[FADDHrr]], [[COPY]], implicit $fpcr
+  ; CHECK-FP16-NEXT:   $h0 = COPY [[FADDHrr1]]
+  ; CHECK-FP16-NEXT:   RET_ReallyLR implicit $h0
+entry:
+  %add1 = fadd ninf half %x, %y
+  %add2 = fadd ninf half %add1, %z
+  ret half %add2
+}
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-reduce.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-reduce.ll
index 4eaaee7ce5055..95ca0a68a7212 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-reduce.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-reduce.ll
@@ -443,23 +443,17 @@ define half @faddv_v4f16(half %start, <4 x half> %a) {
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
 ; NONEON-NOSVE-NEXT:    str d1, [sp, #8]
 ; NONEON-NOSVE-NEXT:    fcvt s0, h0
-; NONEON-NOSVE-NEXT:    ldr h1, [sp, #8]
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #10]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #12]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
+; NONEON-NOSVE-NEXT:    ldr h1, [sp, #10]
 ; NONEON-NOSVE-NEXT:    ldr h2, [sp, #14]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
+; NONEON-NOSVE-NEXT:    ldr h3, [sp, #12]
+; NONEON-NOSVE-NEXT:    ldr h4, [sp, #8]
 ; NONEON-NOSVE-NEXT:    fcvt s1, h1
+; NONEON-NOSVE-NEXT:    fcvt s3, h3
+; NONEON-NOSVE-NEXT:    fcvt s2, h2
+; NONEON-NOSVE-NEXT:    fcvt s4, h4
+; NONEON-NOSVE-NEXT:    fadd s2, s3, s2
+; NONEON-NOSVE-NEXT:    fadd s1, s4, s1
 ; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
 ; NONEON-NOSVE-NEXT:    fadd s0, s0, s1
 ; NONEON-NOSVE-NEXT:    fcvt h0, s0
 ; NONEON-NOSVE-NEXT:    add sp, sp, #16
@@ -481,44 +475,30 @@ define half @faddv_v8f16(half %start, <8 x half> %a) {
 ; NONEON-NOSVE:       // %bb.0:
 ; NONEON-NOSVE-NEXT:    str q1, [sp, #-16]!
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr h1, [sp]
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #2]
-; NONEON-NOSVE-NEXT:    fcvt s0, h0
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #4]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
+; NONEON-NOSVE-NEXT:    ldr h1, [sp, #2]
 ; NONEON-NOSVE-NEXT:    ldr h2, [sp, #6]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #8]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #10]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
+; NONEON-NOSVE-NEXT:    fcvt s0, h0
+; NONEON-NOSVE-NEXT:    ldr h3, [sp, #4]
+; NONEON-NOSVE-NEXT:    ldr h4, [sp]
+; NONEON-NOSVE-NEXT:    ldr h5, [sp, #10]
+; NONEON-NOSVE-NEXT:    ldr h6, [sp, #8]
 ; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #12]
 ; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
+; NONEON-NOSVE-NEXT:    fcvt s4, h4
+; NONEON-NOSVE-NEXT:    fcvt s3, h3
+; NONEON-NOSVE-NEXT:    fcvt s5, h5
+; NONEON-NOSVE-NEXT:    fcvt s6, h6
+; NONEON-NOSVE-NEXT:    ldr h7, [sp, #12]
+; NONEON-NOSVE-NEXT:    fadd s1, s4, s1
+; NONEON-NOSVE-NEXT:    fadd s2, s3, s2
+; NONEON-NOSVE-NEXT:    fcvt s3, h7
+; NONEON-NOSVE-NEXT:    fadd s4, s6, s5
+; NONEON-NOSVE-NEXT:    ldr h5, [sp, #14]
 ; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    ldr h2, [sp, #14]
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
+; NONEON-NOSVE-NEXT:    fadd s2, s4, s3
+; NONEON-NOSVE-NEXT:    fcvt s3, h5
 ; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
+; NONEON-NOSVE-NEXT:    fadd s0, s0, s3
 ; NONEON-NOSVE-NEXT:    fadd s0, s0, s1
 ; NONEON-NOSVE-NEXT:    fcvt h0, s0
 ; NONEON-NOSVE-NEXT:    add sp, sp, #16
@@ -547,79 +527,49 @@ define half @faddv_v16f16(half %start, ptr %a) {
 ; NONEON-NOSVE-NEXT:    fcvt s0, h0
 ; NONEON-NOSVE-NEXT:    ldr h3, [sp, #16]
 ; NONEON-NOSVE-NEXT:    ldr h4, [sp]
+; NONEON-NOSVE-NEXT:    ldr h5, [sp, #20]
+; NONEON-NOSVE-NEXT:    ldr h6, [sp, #4]
 ; NONEON-NOSVE-NEXT:    fcvt s1, h1
 ; NONEON-NOSVE-NEXT:    fcvt s2, h2
+; NONEON-NOSVE-NEXT:    ldr h7, [sp, #22]
+; NONEON-NOSVE-NEXT:    ldr h16, [sp, #6]
 ; NONEON-NOSVE-NEXT:    fcvt s3, h3
+; NONEON-NOSVE-NEXT:    ldr h17, [sp, #24]
+; NONEON-NOSVE-NEXT:    ldr h18, [sp, #8]
 ; NONEON-NOSVE-NEXT:    fcvt s4, h4
+; NONEON-NOSVE-NEXT:    ldr h19, [sp, #26]
+; NONEON-NOSVE-NEXT:    ldr h20, [sp, #10]
+; NONEON-NOSVE-NEXT:    fcvt s5, h5
+; NONEON-NOSVE-NEXT:    fcvt s6, h6
+; NONEON-NOSVE-NEXT:    fcvt s7, h7
+; NONEON-NOSVE-NEXT:    fcvt s16, h16
+; NONEON-NOSVE-NEXT:    fcvt s17, h17
+; NONEON-NOSVE-NEXT:    fcvt s18, h18
+; NONEON-NOSVE-NEXT:    fcvt s19, h19
+; NONEON-NOSVE-NEXT:    fcvt s20, h20
+; NONEON-NOSVE-NEXT:    ldr h21, [sp, #28]
+; NONEON-NOSVE-NEXT:    ldr h22, [sp, #12]
 ; NONEON-NOSVE-NEXT:    fadd s1, s2, s1
 ; NONEON-NOSVE-NEXT:    fadd s2, s4, s3
-; NONEON-NOSVE-NEXT:    ldr h3, [sp, #20]
-; NONEON-NOSVE-NEXT:    ldr h4, [sp, #4]
-; NONEON-NOSVE-NEXT:    fcvt s3, h3
-; NONEON-NOSVE-NEXT:    fcvt s4, h4
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt h2, s2
-; NONEON-NOSVE-NEXT:    fadd s3, s4, s3
-; NONEON-NOSVE-NEXT:    ldr h4, [sp, #6]
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt s4, h4
+; NONEON-NOSVE-NEXT:    fadd s3, s6, s5
+; NONEON-NOSVE-NEXT:    fadd s4, s16, s7
+; NONEON-NOSVE-NEXT:    fcvt s5, h21
+; NONEON-NOSVE-NEXT:    fcvt s6, h22
+; NONEON-NOSVE-NEXT:    fadd s7, s18, s17
+; NONEON-NOSVE-NEXT:    ldr h17, [sp, #30]
+; NONEON-NOSVE-NEXT:    fadd s16, s20, s19
+; NONEON-NOSVE-NEXT:    ldr h18, [sp, #14]
 ; NONEON-NOSVE-NEXT:    fadd s1, s2, s1
-; NONEON-NOSVE-NEXT:    fcvt h2, s3
-; NONEON-NOSVE-NEXT:    ldr h3, [sp, #22]
-; NONEON-NOSVE-NEXT:    fcvt s3, h3
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fadd s3, s4, s3
-; NONEON-NOSVE-NEXT:    ldr h4, [sp, #8]
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fcvt s4, h4
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fcvt h2, s3
-; NONEON-NOSVE-NEXT:    ldr h3, [sp, #24]
-; NONEON-NOSVE-NEXT:    fcvt s3, h3
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fadd s3, s4, s3
-; NONEON-NOSVE-NEXT:    ldr h4, [sp, #10]
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fcvt s4, h4
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fcvt h2, s3
-; NONEON-NOSVE-NEXT:    ldr h3, [sp, #26]
-; NONEON-NOSVE-NEXT:    fcvt s3, h3
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fadd s3, s4, s3
-; NONEON-NOSVE-NEXT:    ldr h4, [sp, #12]
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fcvt s4, h4
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fcvt h2, s3
-; NONEON-NOSVE-NEXT:    ldr h3, [sp, #28]
-; NONEON-NOSVE-NEXT:    fcvt s3, h3
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fadd s2, s4, s3
-; NONEON-NOSVE-NEXT:    ldr h3, [sp, #30]
-; NONEON-NOSVE-NEXT:    ldr h4, [sp, #14]
-; NONEON-NOSVE-NEXT:    fcvt s3, h3
-; NONEON-NOSVE-NEXT:    fcvt s4, h4
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt h2, s2
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
+; NONEON-NOSVE-NEXT:    fadd s2, s3, s4
+; NONEON-NOSVE-NEXT:    fcvt s4, h17
+; NONEON-NOSVE-NEXT:    fadd s5, s6, s5
+; NONEON-NOSVE-NEXT:    fcvt s6, h18
+; NONEON-NOSVE-NEXT:    fadd s3, s7, s16
 ; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fadd s2, s4, s3
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt h2, s2
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
-; NONEON-NOSVE-NEXT:    fcvt s2, h2
+; NONEON-NOSVE-NEXT:    fadd s2, s3, s5
+; NONEON-NOSVE-NEXT:    fadd s3, s6, s4
 ; NONEON-NOSVE-NEXT:    fadd s1, s1, s2
-; NONEON-NOSVE-NEXT:    fcvt h1, s1
-; NONEON-NOSVE-NEXT:    fcvt s1, h1
+; NONEON-NOSVE-NEXT:    fadd s0, s0, s3
 ; NONEON-NOSVE-NEXT:    fadd s0, s0, s1
 ; NONEON-NOSVE-NEXT:    fcvt h0, s0
 ; NONEON-NOSVE-NEXT:    add sp, sp, #32
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-fadd.ll b/llvm/test/CodeGen/AArch64/vecreduce-fadd.ll
index 03db1d0d433d3..11ce20f109623 100644
--- a/llvm/te...
[truncated]

@llvmbot
Member

llvmbot commented Mar 14, 2025

@llvm/pr-subscribers-backend-amdgpu


@john-brawn-arm
Collaborator Author

Ping

Contributor

@hvdijk left a comment

Unclear documentation suggests that fast implies only nnan, ninf, and nsz, as if those were the only fast math flags, when the remaining fast math flags are documented further below. (Edited comment, I initially got this wrong.) It shouldn't prevent this going in either way, but it would be useful to have clearer documentation.

Comment on lines +18464 to +18477
  unsigned NarrowingOp;
  switch (N->getOpcode()) {
  case ISD::FP16_TO_FP:
    NarrowingOp = ISD::FP_TO_FP16;
    break;
  case ISD::BF16_TO_FP:
    NarrowingOp = ISD::FP_TO_BF16;
    break;
  case ISD::FP_EXTEND:
    NarrowingOp = ISD::FP_ROUND;
    break;
  default:
    llvm_unreachable("Expected widening FP cast");
  }
Contributor

This should not be a switch/assign/break. Surely there's a utility function for this somewhere already?

Collaborator Author

There doesn't seem to be one as far as I can tell. I could move this out into a separate function like e.g. ISD::getInverseMinMaxOpcode, but that would just be moving the switch to a separate location.

Contributor

You could change each call site to pass in NarrowingOp as an argument.

Contributor

That then means visitFP16_TO_FP needs that same switch since, despite the name, it is not only called for ISD::FP16_TO_FP. That doesn't get rid of the switch, it just moves it.

Comment on lines +18466 to +18471
  case ISD::FP16_TO_FP:
    NarrowingOp = ISD::FP_TO_FP16;
    break;
  case ISD::BF16_TO_FP:
    NarrowingOp = ISD::FP_TO_BF16;
    break;
Contributor

Can you leave these cases for a separate patch?

Collaborator Author

Is there a specific reason for doing it that way?

Contributor

Because these cases are a pain, and it's not obvious to me that this is even tested in the patch.

Collaborator Author

FP16_TO_FP is tested in llvm/test/CodeGen/ARM/fp16_fast_math.ll: ARM uses FP16_TO_FP for fp16-to-float conversion as it doesn't have registers for fp16 types, while AArch64 uses FP_EXTEND as it does have such registers. We don't have any BF16 tests though, so I've now added these.
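
For illustration, a bf16 analogue of the fadd-sequence tests would look like the sketch below (a sketch only; the bf16 tests actually added are in the truncated part of the diff):

define bfloat @bf16_fadd_sequence(bfloat %x, bfloat %y, bfloat %z) {
entry:
  ; On a target without bf16 registers this presumably legalizes via
  ; BF16_TO_FP/FP_TO_BF16, the pair handled by the BF16 case in
  ; eliminateFPCastPair.
  %add1 = fadd nnan ninf contract bfloat %x, %y
  %add2 = fadd nnan ninf contract bfloat %add1, %z
  ret bfloat %add2
}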

@john-brawn-arm
Collaborator Author

Ping.

@john-brawn-arm john-brawn-arm merged commit dd87127 into llvm:main Apr 28, 2025
11 checks passed
@john-brawn-arm john-brawn-arm deleted the fp16_dagcombine branch April 28, 2025 10:26
IanWood1 pushed 3 commits to IanWood1/llvm-project that referenced this pull request May 6, 2025
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025