[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum #80697

arcbbb · 2024-02-05T16:02:35Z

The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs if any element of the vector is a NaN.
Following #79402, the patch adds the cost for NaN check (vmfne + vcpop)

llvmbot · 2024-02-05T16:03:06Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-risc-v

Author: Shih-Po Hung (arcbbb)

Changes

The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs. and if any element of the vector is a NaN, the result is NaN.

RVV handles this by continuously dividing the vector until only one remains.
This patch estimates the cost in each division, where LMUL may vary.

Full diff: https://github.com/llvm/llvm-project/pull/80697.diff

3 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+33)
(modified) llvm/test/Analysis/CostModel/RISCV/reduce-fmaximum.ll (+26-26)
(modified) llvm/test/Analysis/CostModel/RISCV/reduce-fminimum.ll (+26-26)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index d1db47a6061e4e..fe934a72ebcb04 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -969,6 +969,39 @@ RISCVTTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
       return getArithmeticReductionCost(Instruction::And, Ty, FMF, CostKind);
   }
 
+  if (IID == Intrinsic::maximum || IID == Intrinsic::minimum) {
+    if (LT.second.isScalableVector())
+      return InstructionCost::getInvalid();
+    // Following TargetLowering::expandVecReduce
+    // Example sequences to reduce v8f32 into v4f32
+    //   vsetivli zero, 4, e32, m2, ta, ma
+    //   vslidedown.vi v12, v10, 4
+    //   vsetivli zero, 4, e32, m1, ta, ma
+    //   vmfeq.vv v0, v12, v12
+    //   vmfeq.vv v8, v10, v10
+    //   vmerge.vvm v9, v12, v10, v0
+    //   vmv.v.v v0, v8
+    //   vmerge.vvm v8, v10, v12, v0
+    //   vfmin.vv v9, v9, v8
+    MVT SubTy = LT.second;
+    unsigned ReduceOp =
+        IID == Intrinsic::maximum ? RISCV::VFMAX_VV : RISCV::VFMIN_VV;
+    unsigned Opcodes[] = {RISCV::VSLIDEDOWN_VI,
+                          RISCV::VMFEQ_VV,
+                          RISCV::VMFEQ_VV,
+                          RISCV::VMERGE_VVM,
+                          RISCV::VMV1R_V,
+                          RISCV::VMERGE_VVM,
+                          ReduceOp};
+    InstructionCost SplitCost = 0;
+    while (SubTy.getVectorNumElements() > 1) {
+      SubTy = SubTy.getHalfNumVectorElementsVT();
+      SplitCost += getRISCVInstructionCost(Opcodes, SubTy, CostKind);
+    }
+    return LT.first * SplitCost +
+           getRISCVInstructionCost({RISCV::VFMV_F_S}, LT.second, CostKind);
+  }
+
   // IR Reduction is composed by two vmv and one rvv reduction instruction.
   InstructionCost BaseCost = 2;
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-fmaximum.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-fmaximum.ll
index 1618c3833a9722..ea7c2c38a82a73 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-fmaximum.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-fmaximum.ll
@@ -6,23 +6,23 @@
 
 define float @reduce_fmaximum_f32(float %arg) {
 ; CHECK-LABEL: 'reduce_fmaximum_f32'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call float @llvm.vector.reduce.fmaximum.v2f32(<2 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V8 = call float @llvm.vector.reduce.fmaximum.v8f32(<8 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V16 = call float @llvm.vector.reduce.fmaximum.v16f32(<16 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V32 = call float @llvm.vector.reduce.fmaximum.v32f32(<32 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V64 = call float @llvm.vector.reduce.fmaximum.v64f32(<64 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V128 = call float @llvm.vector.reduce.fmaximum.v128f32(<128 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call float @llvm.vector.reduce.fmaximum.v2f32(<2 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call float @llvm.vector.reduce.fmaximum.v8f32(<8 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %V16 = call float @llvm.vector.reduce.fmaximum.v16f32(<16 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %V32 = call float @llvm.vector.reduce.fmaximum.v32f32(<32 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %V64 = call float @llvm.vector.reduce.fmaximum.v64f32(<64 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 141 for instruction: %V128 = call float @llvm.vector.reduce.fmaximum.v128f32(<128 x float> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret float undef
 ;
 ; SIZE-LABEL: 'reduce_fmaximum_f32'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call float @llvm.vector.reduce.fmaximum.v2f32(<2 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4 = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8 = call float @llvm.vector.reduce.fmaximum.v8f32(<8 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16 = call float @llvm.vector.reduce.fmaximum.v16f32(<16 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V32 = call float @llvm.vector.reduce.fmaximum.v32f32(<32 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V64 = call float @llvm.vector.reduce.fmaximum.v64f32(<64 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V128 = call float @llvm.vector.reduce.fmaximum.v128f32(<128 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call float @llvm.vector.reduce.fmaximum.v2f32(<2 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call float @llvm.vector.reduce.fmaximum.v8f32(<8 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %V16 = call float @llvm.vector.reduce.fmaximum.v16f32(<16 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %V32 = call float @llvm.vector.reduce.fmaximum.v32f32(<32 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %V64 = call float @llvm.vector.reduce.fmaximum.v64f32(<64 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 85 for instruction: %V128 = call float @llvm.vector.reduce.fmaximum.v128f32(<128 x float> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret float undef
 ;
 %V2   = call float @llvm.vector.reduce.fmaximum.v2f32(<2 x float> undef)
@@ -44,21 +44,21 @@ declare float @llvm.vector.reduce.fmaximum.v128f32(<128 x float>)
 
 define double @reduce_fmaximum_f64(double %arg) {
 ; CHECK-LABEL: 'reduce_fmaximum_f64'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call double @llvm.vector.reduce.fmaximum.v2f64(<2 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call double @llvm.vector.reduce.fmaximum.v4f64(<4 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V8 = call double @llvm.vector.reduce.fmaximum.v8f64(<8 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V16 = call double @llvm.vector.reduce.fmaximum.v16f64(<16 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V32 = call double @llvm.vector.reduce.fmaximum.v32f64(<32 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %V64 = call double @llvm.vector.reduce.fmaximum.v64f64(<64 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call double @llvm.vector.reduce.fmaximum.v2f64(<2 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call double @llvm.vector.reduce.fmaximum.v4f64(<4 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call double @llvm.vector.reduce.fmaximum.v8f64(<8 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %V16 = call double @llvm.vector.reduce.fmaximum.v16f64(<16 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V32 = call double @llvm.vector.reduce.fmaximum.v32f64(<32 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 127 for instruction: %V64 = call double @llvm.vector.reduce.fmaximum.v64f64(<64 x double> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret double undef
 ;
 ; SIZE-LABEL: 'reduce_fmaximum_f64'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call double @llvm.vector.reduce.fmaximum.v2f64(<2 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4 = call double @llvm.vector.reduce.fmaximum.v4f64(<4 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8 = call double @llvm.vector.reduce.fmaximum.v8f64(<8 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16 = call double @llvm.vector.reduce.fmaximum.v16f64(<16 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V32 = call double @llvm.vector.reduce.fmaximum.v32f64(<32 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V64 = call double @llvm.vector.reduce.fmaximum.v64f64(<64 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call double @llvm.vector.reduce.fmaximum.v2f64(<2 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call double @llvm.vector.reduce.fmaximum.v4f64(<4 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call double @llvm.vector.reduce.fmaximum.v8f64(<8 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %V16 = call double @llvm.vector.reduce.fmaximum.v16f64(<16 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %V32 = call double @llvm.vector.reduce.fmaximum.v32f64(<32 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %V64 = call double @llvm.vector.reduce.fmaximum.v64f64(<64 x double> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret double undef
 ;
 %V2   = call double @llvm.vector.reduce.fmaximum.v2f64(<2 x double> undef)
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-fminimum.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-fminimum.ll
index 35b18645b1f2de..d74906c77cf9e9 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-fminimum.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-fminimum.ll
@@ -6,23 +6,23 @@
 
 define float @reduce_fmaximum_f32(float %arg) {
 ; CHECK-LABEL: 'reduce_fmaximum_f32'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call float @llvm.vector.reduce.fminimum.v2f32(<2 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call float @llvm.vector.reduce.fminimum.v4f32(<4 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V8 = call float @llvm.vector.reduce.fminimum.v8f32(<8 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V16 = call float @llvm.vector.reduce.fminimum.v16f32(<16 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V32 = call float @llvm.vector.reduce.fminimum.v32f32(<32 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V64 = call float @llvm.vector.reduce.fminimum.v64f32(<64 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V128 = call float @llvm.vector.reduce.fminimum.v128f32(<128 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call float @llvm.vector.reduce.fminimum.v2f32(<2 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call float @llvm.vector.reduce.fminimum.v4f32(<4 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call float @llvm.vector.reduce.fminimum.v8f32(<8 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %V16 = call float @llvm.vector.reduce.fminimum.v16f32(<16 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %V32 = call float @llvm.vector.reduce.fminimum.v32f32(<32 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %V64 = call float @llvm.vector.reduce.fminimum.v64f32(<64 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 141 for instruction: %V128 = call float @llvm.vector.reduce.fminimum.v128f32(<128 x float> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret float undef
 ;
 ; SIZE-LABEL: 'reduce_fmaximum_f32'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call float @llvm.vector.reduce.fminimum.v2f32(<2 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4 = call float @llvm.vector.reduce.fminimum.v4f32(<4 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8 = call float @llvm.vector.reduce.fminimum.v8f32(<8 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16 = call float @llvm.vector.reduce.fminimum.v16f32(<16 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V32 = call float @llvm.vector.reduce.fminimum.v32f32(<32 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V64 = call float @llvm.vector.reduce.fminimum.v64f32(<64 x float> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V128 = call float @llvm.vector.reduce.fminimum.v128f32(<128 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call float @llvm.vector.reduce.fminimum.v2f32(<2 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call float @llvm.vector.reduce.fminimum.v4f32(<4 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call float @llvm.vector.reduce.fminimum.v8f32(<8 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %V16 = call float @llvm.vector.reduce.fminimum.v16f32(<16 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %V32 = call float @llvm.vector.reduce.fminimum.v32f32(<32 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %V64 = call float @llvm.vector.reduce.fminimum.v64f32(<64 x float> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 85 for instruction: %V128 = call float @llvm.vector.reduce.fminimum.v128f32(<128 x float> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret float undef
 ;
 %V2   = call float @llvm.vector.reduce.fminimum.v2f32(<2 x float> undef)
@@ -44,21 +44,21 @@ declare float @llvm.vector.reduce.fminimum.v128f32(<128 x float>)
 
 define double @reduce_fmaximum_f64(double %arg) {
 ; CHECK-LABEL: 'reduce_fmaximum_f64'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call double @llvm.vector.reduce.fminimum.v2f64(<2 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call double @llvm.vector.reduce.fminimum.v4f64(<4 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V8 = call double @llvm.vector.reduce.fminimum.v8f64(<8 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V16 = call double @llvm.vector.reduce.fminimum.v16f64(<16 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V32 = call double @llvm.vector.reduce.fminimum.v32f64(<32 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %V64 = call double @llvm.vector.reduce.fminimum.v64f64(<64 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call double @llvm.vector.reduce.fminimum.v2f64(<2 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call double @llvm.vector.reduce.fminimum.v4f64(<4 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call double @llvm.vector.reduce.fminimum.v8f64(<8 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %V16 = call double @llvm.vector.reduce.fminimum.v16f64(<16 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V32 = call double @llvm.vector.reduce.fminimum.v32f64(<32 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 127 for instruction: %V64 = call double @llvm.vector.reduce.fminimum.v64f64(<64 x double> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret double undef
 ;
 ; SIZE-LABEL: 'reduce_fmaximum_f64'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call double @llvm.vector.reduce.fminimum.v2f64(<2 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4 = call double @llvm.vector.reduce.fminimum.v4f64(<4 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8 = call double @llvm.vector.reduce.fminimum.v8f64(<8 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16 = call double @llvm.vector.reduce.fminimum.v16f64(<16 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V32 = call double @llvm.vector.reduce.fminimum.v32f64(<32 x double> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V64 = call double @llvm.vector.reduce.fminimum.v64f64(<64 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2 = call double @llvm.vector.reduce.fminimum.v2f64(<2 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %V4 = call double @llvm.vector.reduce.fminimum.v4f64(<4 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %V8 = call double @llvm.vector.reduce.fminimum.v8f64(<8 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %V16 = call double @llvm.vector.reduce.fminimum.v16f64(<16 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %V32 = call double @llvm.vector.reduce.fminimum.v32f64(<32 x double> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %V64 = call double @llvm.vector.reduce.fminimum.v64f64(<64 x double> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret double undef
 ;
 %V2   = call double @llvm.vector.reduce.fminimum.v2f64(<2 x double> undef)

arcbbb · 2024-02-06T06:57:49Z

#79402 will be re-committed after this.

lukel97 · 2024-02-08T09:32:11Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ -90,6 +90,7 @@ RISCVTTIImpl::getRISCVInstructionCost(ArrayRef<unsigned> OpCodes, MVT VT,
    case RISCV::VFMV_S_F:
    case RISCV::VMNAND_MM:
    case RISCV::VCPOP_M:
+    case RISCV::VMV1R_V:


I think vmv1r.v is a register rename on some microarchitectures and so is almost free. Should this be costed as 0? Or do we have other places in TTI where we're already costing for it?

Thank you for pointing it out! I agree that it should default to zero for those micro-architectures. Fixed.

lukel97 · 2024-02-08T09:38:53Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+                          RISCV::VMFEQ_VV,
+                          RISCV::VMFEQ_VV,
+                          RISCV::VMERGE_VVM,
+                          RISCV::VMV1R_V,


Should this be RISCV::VMV_V_V?

Given that mask registers are always contained in a single vector register, VMV1R_V ensure the cost is regardless of LMUL.

Isn't the correct fix for that to use the mask type instead of the vector type in a call to getRISCVInstructionCost?

I forgot to mention that when the vsetvli's LMUL isn't m1 (like mf2, m2, m4), it would generate VMV1R_V for VMERGE_VVM.
With this, do you still prefer to have a separate call to getRISCVInstructionCost?

This appears to be a scheduling issue. If the second vmfeq moved below the first vmerge there wouldn't be a mv at all right?

Ah you are right. So I can just remove it.

github-actions · 2024-02-27T01:48:12Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arcbbb · 2024-03-04T02:45:42Z

ping

arcbbb · 2024-03-12T01:13:43Z

ping

preames · 2024-03-13T21:35:17Z

Not by any means an FP expert, but isn't there a massively better lowering available here?

Written as a vector pseudo language:

masknan = foreach lane in input, isnan(lane)
masknegzero  = foreach lane in input, isnegzero(lane)
scalar = reduce.fmin(input)
if anyof(masknan)
  scalar = v[clz(masknan);
if scalar == 0.0 and anyof(masknegzero)
  scalar = 0.0

If so, I'd be tempted to fix that before worrying about the costing in TTI.

topperc · 2024-03-13T21:52:33Z

Not by any means an FP expert, but isn't there a massively better lowering available here?

Written as a vector pseudo language:
masknan = foreach lane in input, isnan(lane)
masknegzero  = foreach lane in input, isnegzero(lane)
scalar = reduce.fmin(input)
if anyof(masknan)
  scalar = v[clz(masknan);
if scalar == 0.0 and anyof(masknegzero)
  scalar = 0.0
If so, I'd be tempted to fix that before worrying about the costing in TTI.

negzero shouldn't be an issue. I think our instructions respect the order of -0.0 and 0.0.

topperc · 2024-03-14T00:53:28Z

Not by any means an FP expert, but isn't there a massively better lowering available here?

Written as a vector pseudo language:
masknan = foreach lane in input, isnan(lane)
masknegzero  = foreach lane in input, isnegzero(lane)
scalar = reduce.fmin(input)
if anyof(masknan)
  scalar = v[clz(masknan);
if scalar == 0.0 and anyof(masknegzero)
  scalar = 0.0
If so, I'd be tempted to fix that before worrying about the costing in TTI.

I put together a patch for this. And fixed an issue related to FMF preservation in the existing expansion. I'll post soon.

The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs. and if any element of the vector is a NaN. Following llvm#79402, the patch add the cost of NaN check (vmfne + vcpop)

arcbbb · 2024-03-15T08:16:37Z

It is updated based on #85165, and follow the style in #79402.

preames · 2024-03-15T19:20:56Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ -1001,6 +1001,53 @@ RISCVTTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
      return getArithmeticReductionCost(Instruction::And, Ty, FMF, CostKind);
  }

+  if (IID == Intrinsic::maximum || IID == Intrinsic::minimum) {
+    SmallVector<unsigned, 5> SplitOps;


Can you subset this patch to the case where LT.first == 1? The splitting logic still looks too conservative here, but let's separate that discussion off.

preames · 2024-03-15T19:21:38Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+        SplitOps = {RISCV::VMFEQ_VV, RISCV::VMERGE_VVM, RISCV::VMFEQ_VV,
+                    RISCV::VMERGE_VVM, RISCV::VFMAX_VV};
+        Opcodes = {RISCV::VMFNE_VV, RISCV::VCPOP_M, RISCV::VFREDMAX_VS,
+                   RISCV::VFMV_F_S};


You're missing the cost of the scalar select here.

Fixed with getCFInstrCost.

Worth noting that we assume branches are predicted so getCFInstrCost(Instruction::Br, CostKind) will return 0 for throughput

preames · 2024-03-15T19:22:25Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+        // Cost of Canonical Nan
+        // lui a0, 523264
+        // fmv.w.x fa0, a0
+        ExtraCost = 2;


This depends on element type - a double is more expensive than the float. Can we use generic immediate costing here? (Not sure if we can - if not, 2 is not an unreasonable ballpark estimate.)

Fixed with getCastInstrCost.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

lukel97 · 2024-03-21T07:48:15Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+        Type *DstTy = Ty->getScalarType();
+        const unsigned EltTyBits = DL.getTypeSizeInBits(DstTy);


Nit, I think Ty->getScalarSizeInBits() does the same thing

lukel97 · 2024-03-21T07:57:01Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+        SplitOps = {RISCV::VMFEQ_VV, RISCV::VMERGE_VVM, RISCV::VMFEQ_VV,
+                    RISCV::VMERGE_VVM, RISCV::VFMAX_VV};
+        Opcodes = {RISCV::VMFNE_VV, RISCV::VCPOP_M, RISCV::VFREDMAX_VS,
+                   RISCV::VFMV_F_S};


Worth noting that we assume branches are predicted so getCFInstrCost(Instruction::Br, CostKind) will return 0 for throughput

This is recommitted as the test and fix for llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697

This is recommitted as the test and fix for llvm.vector.reduce.fmaximum/fminimum are covered in llvm#80553 and llvm#80697

arcbbb requested review from preames, lukel97 and topperc February 5, 2024 16:02

llvmbot added backend:RISC-V llvm:analysis labels Feb 5, 2024

arcbbb force-pushed the cost-reduce-maximum branch from dbcb041 to 8071215 Compare February 5, 2024 16:04

lukel97 reviewed Feb 8, 2024

View reviewed changes

arcbbb force-pushed the cost-reduce-maximum branch from c821d81 to c875fa9 Compare March 15, 2024 08:11

[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum

9eb0d55

The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs. and if any element of the vector is a NaN. Following llvm#79402, the patch add the cost of NaN check (vmfne + vcpop)

arcbbb force-pushed the cost-reduce-maximum branch from c875fa9 to 9eb0d55 Compare March 15, 2024 08:12

preames requested changes Mar 15, 2024

View reviewed changes

lukel97 reviewed Mar 18, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Outdated Show resolved Hide resolved

Address comments

00bd3c2

lukel97 approved these changes Mar 21, 2024

View reviewed changes

Use getScalarSizeInBits

145c3bb

arcbbb merged commit 3cb0241 into llvm:main Mar 25, 2024

arcbbb deleted the cost-reduce-maximum branch March 25, 2024 09:17

arcbbb mentioned this pull request Mar 25, 2024

Recommit "[RISCV] Refine cost on Min/Max reduction (#79402)" #86480

Merged

arcbbb added a commit that referenced this pull request Apr 1, 2024

Recommit "[RISCV] Refine cost on Min/Max reduction (#79402)" (#86480)

c7954ca

This is recommitted as the test and fix for llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697

arcbbb added a commit to arcbbb/llvm-project that referenced this pull request Apr 1, 2024

Recommit "[RISCV] Refine cost on Min/Max reduction (llvm#79402)"

0700e64

This is recommitted as the test and fix for llvm.vector.reduce.fmaximum/fminimum are covered in llvm#80553 and llvm#80697

		Type *DstTy = Ty->getScalarType();
		const unsigned EltTyBits = DL.getTypeSizeInBits(DstTy);

[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum #80697

[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum #80697

Uh oh!

Conversation

arcbbb commented Feb 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Feb 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arcbbb commented Feb 6, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

topperc Feb 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arcbbb commented Mar 4, 2024

Uh oh!

arcbbb commented Mar 12, 2024

Uh oh!

preames commented Mar 13, 2024

Uh oh!

topperc commented Mar 13, 2024

Uh oh!

topperc commented Mar 14, 2024

Uh oh!

arcbbb commented Mar 15, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arcbbb commented Feb 5, 2024 •

edited

Loading

llvmbot commented Feb 5, 2024 •

edited

Loading

topperc Feb 26, 2024 •

edited

Loading

github-actions bot commented Feb 27, 2024 •

edited

Loading