[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first #76699
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

Currently vfmv.s.f intrinsics are directly selected to their pseudos via a tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their corresponding VL SDNode, then get selected from a pattern in RISCVInstrInfoVVLPatterns.td.

This patch brings vfmv.s.f in line with the other move instructions, and shows how the LMUL reducing combine for VFMV_S_F_VL and VMV_S_X_VL results in the discrepancy in the test added in #75544.

Split out from #71501, where we did this to preserve the behaviour of selecting vmv_s_x for VFMV_S_F_VL for small enough immediates.

Full diff: https://github.com/llvm/llvm-project/pull/76699.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 51580d15451ca2..978b465f89c069 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8492,6 +8492,9 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(RISCVISD::VSELECT_VL, DL, VT, SelectCond, SplattedVal,
Vec, VL);
}
+ case Intrinsic::riscv_vfmv_s_f:
+ return DAG.getNode(RISCVISD::VFMV_S_F_VL, DL, Op.getSimpleValueType(),
+ Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
// EGS * EEW >= 128 bits
case Intrinsic::riscv_vaesdf_vv:
case Intrinsic::riscv_vaesdf_vs:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index be4bc3b58766ef..af044f9b0a7228 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -7399,32 +7399,6 @@ foreach vti = AllIntegerVectors in {
// vmv.s.x is handled with a custom node in RISCVInstrInfoVVLPatterns.td
}
-//===----------------------------------------------------------------------===//
-// 16.2. Floating-Point Scalar Move Instructions
-//===----------------------------------------------------------------------===//
-
-foreach fvti = AllFloatVectors in {
- let Predicates = GetVTypePredicates<fvti>.Predicates in {
- def : Pat<(fvti.Vector (int_riscv_vfmv_s_f (fvti.Vector fvti.RegClass:$rs1),
- (fvti.Scalar fvti.ScalarRegClass:$rs2), VLOpFrag)),
- (!cast<Instruction>("PseudoVFMV_S_"#fvti.ScalarSuffix#"_" #
- fvti.LMul.MX)
- (fvti.Vector $rs1),
- (fvti.Scalar fvti.ScalarRegClass:$rs2),
- GPR:$vl, fvti.Log2SEW)>;
-
- def : Pat<(fvti.Vector (int_riscv_vfmv_s_f (fvti.Vector fvti.RegClass:$rs1),
- (fvti.Scalar (fpimm0)), VLOpFrag)),
- (!cast<Instruction>("PseudoVMV_S_X_" # fvti.LMul.MX)
- (fvti.Vector $rs1), (XLenVT X0), GPR:$vl, fvti.Log2SEW)>;
-
- def : Pat<(fvti.Vector (int_riscv_vfmv_s_f (fvti.Vector fvti.RegClass:$rs1),
- (fvti.Scalar (SelectFPImm (XLenVT GPR:$imm))), VLOpFrag)),
- (!cast<Instruction>("PseudoVMV_S_X_" # fvti.LMul.MX)
- (fvti.Vector $rs1), GPR:$imm, GPR:$vl, fvti.Log2SEW)>;
- }
-}
-
//===----------------------------------------------------------------------===//
// 16.3. Vector Slide Instructions
//===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfmv.s.f.ll b/llvm/test/CodeGen/RISCV/rvv/vfmv.s.f.ll
index 4fdc7f2c774f5b..01d2eac99d0c07 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfmv.s.f.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfmv.s.f.ll
@@ -48,7 +48,7 @@ declare <vscale x 8 x half> @llvm.riscv.vfmv.s.f.nxv8f16(<vscale x 8 x half>, ha
define <vscale x 8 x half> @intrinsic_vfmv.s.f_f_nxv8f16(<vscale x 8 x half> %0, half %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv8f16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e16, m2, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -61,7 +61,7 @@ declare <vscale x 16 x half> @llvm.riscv.vfmv.s.f.nxv16f16(<vscale x 16 x half>,
define <vscale x 16 x half> @intrinsic_vfmv.s.f_f_nxv16f16(<vscale x 16 x half> %0, half %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv16f16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e16, m4, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -74,7 +74,7 @@ declare <vscale x 32 x half> @llvm.riscv.vfmv.s.f.nxv32f16(<vscale x 32 x half>,
define <vscale x 32 x half> @intrinsic_vfmv.s.f_f_nxv32f16(<vscale x 32 x half> %0, half %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv32f16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e16, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -113,7 +113,7 @@ declare <vscale x 4 x float> @llvm.riscv.vfmv.s.f.nxv4f32(<vscale x 4 x float>,
define <vscale x 4 x float> @intrinsic_vfmv.s.f_f_nxv4f32(<vscale x 4 x float> %0, float %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv4f32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e32, m2, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -126,7 +126,7 @@ declare <vscale x 8 x float> @llvm.riscv.vfmv.s.f.nxv8f32(<vscale x 8 x float>,
define <vscale x 8 x float> @intrinsic_vfmv.s.f_f_nxv8f32(<vscale x 8 x float> %0, float %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv8f32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e32, m4, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -139,7 +139,7 @@ declare <vscale x 16 x float> @llvm.riscv.vfmv.s.f.nxv16f32(<vscale x 16 x float
define <vscale x 16 x float> @intrinsic_vfmv.s.f_f_nxv16f32(<vscale x 16 x float> %0, float %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv16f32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e32, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -165,7 +165,7 @@ declare <vscale x 2 x double> @llvm.riscv.vfmv.s.f.nxv2f64(<vscale x 2 x double>
define <vscale x 2 x double> @intrinsic_vfmv.s.f_f_nxv2f64(<vscale x 2 x double> %0, double %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv2f64:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e64, m2, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -178,7 +178,7 @@ declare <vscale x 4 x double> @llvm.riscv.vfmv.s.f.nxv4f64(<vscale x 4 x double>
define <vscale x 4 x double> @intrinsic_vfmv.s.f_f_nxv4f64(<vscale x 4 x double> %0, double %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv4f64:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e64, m4, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -191,7 +191,7 @@ declare <vscale x 8 x double> @llvm.riscv.vfmv.s.f.nxv8f64(<vscale x 8 x double>
define <vscale x 8 x double> @intrinsic_vfmv.s.f_f_nxv8f64(<vscale x 8 x double> %0, double %1, iXLen %2) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_nxv8f64:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e64, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
entry:
@@ -235,7 +235,7 @@ entry:
define <vscale x 8 x half> @intrinsic_vfmv.s.f_f_zero_nxv8f16(<vscale x 8 x half> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv8f16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e16, m2, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -246,7 +246,7 @@ entry:
define <vscale x 16 x half> @intrinsic_vfmv.s.f_f_zero_nxv16f16(<vscale x 16 x half> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv16f16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e16, m4, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -257,7 +257,7 @@ entry:
define <vscale x 32 x half> @intrinsic_vfmv.s.f_f_zero_nxv32f16(<vscale x 32 x half> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv32f16:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e16, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -290,7 +290,7 @@ entry:
define <vscale x 4 x float> @intrinsic_vfmv.s.f_f_zero_nxv4f32(<vscale x 4 x float> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv4f32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e32, m2, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -301,7 +301,7 @@ entry:
define <vscale x 8 x float> @intrinsic_vfmv.s.f_f_zero_nxv8f32(<vscale x 8 x float> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv8f32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e32, m4, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -312,7 +312,7 @@ entry:
define <vscale x 16 x float> @intrinsic_vfmv.s.f_f_zero_nxv16f32(<vscale x 16 x float> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv16f32:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e32, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -334,7 +334,7 @@ entry:
define <vscale x 2 x double> @intrinsic_vfmv.s.f_f_zero_nxv2f64(<vscale x 2 x double> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv2f64:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e64, m2, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -345,7 +345,7 @@ entry:
define <vscale x 4 x double> @intrinsic_vfmv.s.f_f_zero_nxv4f64(<vscale x 4 x double> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv4f64:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e64, m4, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
@@ -356,7 +356,7 @@ entry:
define <vscale x 8 x double> @intrinsic_vfmv.s.f_f_zero_nxv8f64(<vscale x 8 x double> %0, iXLen %1) nounwind {
; CHECK-LABEL: intrinsic_vfmv.s.f_f_zero_nxv8f64:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli zero, a0, e64, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, zero
; CHECK-NEXT: ret
entry:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
index e15c5a3323cbe2..5ad741dbac10e4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
@@ -630,8 +630,9 @@ define void @add_v16i64(ptr %x, ptr %y) vscale_range(2,2) {
define <vscale x 2 x float> @fp_reduction_vfmv_s_f(float %0, <vscale x 8 x float> %1, i64 %2) {
; CHECK-LABEL: fp_reduction_vfmv_s_f:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, ta, ma
; CHECK-NEXT: vfmv.s.f v12, fa0
+; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
; CHECK-NEXT: vfredusum.vs v8, v8, v12
; CHECK-NEXT: ret
%4 = tail call <vscale x 8 x float> @llvm.riscv.vfmv.s.f.nxv8f32.i64(<vscale x 8 x float> poison, float %0, i64 %2)
; CHECK-NEXT: vfmv.s.f v12, fa0
; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
This is a regression. Do you have a plan to fix this?
I think @preames's draft patch here should fix it. Happy to take it over if there are no objections.
Needs a rebase after #76801; marking as "request changes" to reflect that and to make it easier to scan the review list for actionable items.
Force-pushed from 050496a to d453f0e.
LGTM
Currently vfmv.s.f intrinsics are directly selected to their pseudos via a
tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move
instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their
corresponding VL SDNode, then get selected from a pattern in
RISCVInstrInfoVVLPatterns.td.
This patch brings vfmv.s.f in line with the other move instructions, and shows
how the LMUL reducing combine for VFMV_S_F_VL and VMV_S_X_VL results in the
discrepancy in the test added in #75544.
Split out from #71501, where we did this to preserve the behaviour of selecting
vmv_s_x for VFMV_S_F_VL for small enough immediates.
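
For illustration, here is a minimal standalone sketch of what the new lowering path produces (the function name is made up; the types and expected output mirror the nxv8f32 cases in the tests above). Because the intrinsic is now lowered to VFMV_S_F_VL first, the LMUL-reducing combine can shrink the vsetvli to m1 even though the result type is LMUL=4:

declare <vscale x 8 x float> @llvm.riscv.vfmv.s.f.nxv8f32.i64(<vscale x 8 x float>, float, i64)

define <vscale x 8 x float> @vfmv_s_f_example(float %s, i64 %vl) {
  ; Insert %s into element 0 of a poison LMUL=4 vector, with AVL %vl.
  %r = tail call <vscale x 8 x float> @llvm.riscv.vfmv.s.f.nxv8f32.i64(<vscale x 8 x float> poison, float %s, i64 %vl)
  ret <vscale x 8 x float> %r
}

; Expected rv64 codegen, along the lines of the updated CHECK lines above
; (tail-agnostic here because the passthru operand is poison):
;   vsetvli zero, a0, e32, m1, ta, ma
;   vfmv.s.f v8, fa0
;   ret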