[RISCV] Lower @llvm.experimental.vector.compress for zvfhmin/zvfbfmin #113770

Merged: 1 commit merged into llvm:main on Oct 28, 2024

Conversation

lukel97 (Contributor) commented on Oct 26, 2024

This is a follow-up to #113291 and handles f16/bf16 with zvfhmin and zvfbfmin.
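
For context, here is a minimal IR sketch (not taken from the patch; the function name and vector length are illustrative) of the kind of bf16 compress this change lets the backend lower directly. Lanes of %v whose mask bit is set are packed to the front of the result, and the remaining tail lanes are taken from the passthru operand.

declare <vscale x 4 x bfloat> @llvm.experimental.vector.compress.nxv4bf16(<vscale x 4 x bfloat>, <vscale x 4 x i1>, <vscale x 4 x bfloat>)

define <vscale x 4 x bfloat> @compress_bf16_sketch(<vscale x 4 x bfloat> %v, <vscale x 4 x i1> %mask, <vscale x 4 x bfloat> %passthru) {
  ; Selected lanes of %v are packed contiguously; the tail comes from %passthru.
  %r = call <vscale x 4 x bfloat> @llvm.experimental.vector.compress.nxv4bf16(<vscale x 4 x bfloat> %v, <vscale x 4 x i1> %mask, <vscale x 4 x bfloat> %passthru)
  ret <vscale x 4 x bfloat> %r
}

With +zvfbfmin (or +zvfhmin for the f16 case) this is now selected to a single vcompress.vm, as the CHECK lines added in the tests below show.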

llvmbot (Member) commented on Oct 26, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This is a follow-up to #113291 and handles f16/bf16 with zvfhmin and zvfbfmin.


Full diff: https://github.com/llvm/llvm-project/pull/113770.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-compress-fp.ll (+88-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-compress.ll (+132-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 69112d868bff82..af7a39b2580a37 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1081,7 +1081,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
       setOperationAction({ISD::INSERT_VECTOR_ELT, ISD::CONCAT_VECTORS,
                           ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR,
                           ISD::VECTOR_DEINTERLEAVE, ISD::VECTOR_INTERLEAVE,
-                          ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE},
+                          ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE,
+                          ISD::VECTOR_COMPRESS},
                          VT, Custom);
       MVT EltVT = VT.getVectorElementType();
       if (isTypeLegal(EltVT))
@@ -1333,7 +1334,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
         setOperationAction(ISD::UNDEF, VT, Custom);
 
         setOperationAction({ISD::CONCAT_VECTORS, ISD::VECTOR_REVERSE,
-                            ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR},
+                            ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR,
+                            ISD::VECTOR_COMPRESS},
                            VT, Custom);
 
         // FIXME: mload, mstore, mgather, mscatter, vp_load/store,
@@ -1440,8 +1442,6 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
              ISD::STRICT_FCEIL, ISD::STRICT_FFLOOR, ISD::STRICT_FROUND,
              ISD::STRICT_FROUNDEVEN, ISD::STRICT_FNEARBYINT},
             VT, Custom);
-
-        setOperationAction(ISD::VECTOR_COMPRESS, VT, Custom);
       }
 
       // Custom-legalize bitcasts from fixed-length vectors to scalar types.
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-compress-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-compress-fp.ll
index 8f1ff7ed4a11e2..3069d601107570 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-compress-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-compress-fp.ll
@@ -1,6 +1,92 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
-; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+f,+d,+zfh,+zvfh -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+f,+d,+zfh,+zvfh -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv32 -mattr=+m,+v,+f,+d,+zvfh,+zvfbfmin < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv64 -mattr=+m,+v,+f,+d,+zvfh,+zvfbfmin < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv32 -mattr=+m,+v,+f,+d,+zvfhmin,+zvfbfmin < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv64 -mattr=+m,+v,+f,+d,+zvfhmin,+zvfbfmin < %s | FileCheck %s
+
+define <1 x bfloat> @vector_compress_v1bf16(<1 x bfloat> %v, <1 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v1bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <1 x bfloat> @llvm.experimental.vector.compress.v1bf16(<1 x bfloat> %v, <1 x i1> %mask, <1 x bfloat> undef)
+  ret <1 x bfloat> %ret
+}
+
+define <1 x bfloat> @vector_compress_v1bf16_passthru(<1 x bfloat> %passthru, <1 x bfloat> %v, <1 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v1bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <1 x bfloat> @llvm.experimental.vector.compress.v1bf16(<1 x bfloat> %v, <1 x i1> %mask, <1 x bfloat> %passthru)
+  ret <1 x bfloat> %ret
+}
+
+define <2 x bfloat> @vector_compress_v2bf16(<2 x bfloat> %v, <2 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v2bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <2 x bfloat> @llvm.experimental.vector.compress.v2bf16(<2 x bfloat> %v, <2 x i1> %mask, <2 x bfloat> undef)
+  ret <2 x bfloat> %ret
+}
+
+define <2 x bfloat> @vector_compress_v2bf16_passthru(<2 x bfloat> %passthru, <2 x bfloat> %v, <2 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v2bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <2 x bfloat> @llvm.experimental.vector.compress.v2bf16(<2 x bfloat> %v, <2 x i1> %mask, <2 x bfloat> %passthru)
+  ret <2 x bfloat> %ret
+}
+
+define <4 x bfloat> @vector_compress_v4bf16(<4 x bfloat> %v, <4 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <4 x bfloat> @llvm.experimental.vector.compress.v4bf16(<4 x bfloat> %v, <4 x i1> %mask, <4 x bfloat> undef)
+  ret <4 x bfloat> %ret
+}
+
+define <4 x bfloat> @vector_compress_v4bf16_passthru(<4 x bfloat> %passthru, <4 x bfloat> %v, <4 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v4bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <4 x bfloat> @llvm.experimental.vector.compress.v4bf16(<4 x bfloat> %v, <4 x i1> %mask, <4 x bfloat> %passthru)
+  ret <4 x bfloat> %ret
+}
+
+define <8 x bfloat> @vector_compress_v8bf16(<8 x bfloat> %v, <8 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <8 x bfloat> @llvm.experimental.vector.compress.v8bf16(<8 x bfloat> %v, <8 x i1> %mask, <8 x bfloat> undef)
+  ret <8 x bfloat> %ret
+}
+
+define <8 x bfloat> @vector_compress_v8bf16_passthru(<8 x bfloat> %passthru, <8 x bfloat> %v, <8 x i1> %mask) {
+; CHECK-LABEL: vector_compress_v8bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <8 x bfloat> @llvm.experimental.vector.compress.v8bf16(<8 x bfloat> %v, <8 x i1> %mask, <8 x bfloat> %passthru)
+  ret <8 x bfloat> %ret
+}
 
 define <1 x half> @vector_compress_v1f16(<1 x half> %v, <1 x i1> %mask) {
 ; CHECK-LABEL: vector_compress_v1f16:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-compress.ll b/llvm/test/CodeGen/RISCV/rvv/vector-compress.ll
index 85d72ad2fe9cb4..7516a72a92bc8b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-compress.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-compress.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
-; RUN: llc -verify-machineinstrs -mtriple=riscv32 -mattr=+m,+v,+f,+d,+zfh,+zvfh %s -o - | FileCheck %s
-; RUN: llc -verify-machineinstrs -mtriple=riscv64 -mattr=+m,+v,+f,+d,+zfh,+zvfh %s -o - | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv32 -mattr=+m,+v,+f,+d,+zvfh,+zvfbfmin < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv64 -mattr=+m,+v,+f,+d,+zvfh,+zvfbfmin < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv32 -mattr=+m,+v,+f,+d,+zvfhmin,+zvfbfmin < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=riscv64 -mattr=+m,+v,+f,+d,+zvfhmin,+zvfbfmin < %s | FileCheck %s
 
 ; Vector compress for i8 type
 
@@ -472,6 +474,134 @@ define <vscale x 8 x i64> @vector_compress_nxv8i64_passthru(<vscale x 8 x i64> %
   ret <vscale x 8 x i64> %ret
 }
 
+; Vector compress for bf16 type
+
+define <vscale x 1 x bfloat> @vector_compress_nxv1bf16(<vscale x 1 x bfloat> %data, <vscale x 1 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv1bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 1 x bfloat> @llvm.experimental.vector.compress.nxv1bf16(<vscale x 1 x bfloat> %data, <vscale x 1 x i1> %mask, <vscale x 1 x bfloat> undef)
+  ret <vscale x 1 x bfloat> %ret
+}
+
+define <vscale x 1 x bfloat> @vector_compress_nxv1bf16_passthru(<vscale x 1 x bfloat> %passthru, <vscale x 1 x bfloat> %data, <vscale x 1 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv1bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 1 x bfloat> @llvm.experimental.vector.compress.nxv1bf16(<vscale x 1 x bfloat> %data, <vscale x 1 x i1> %mask, <vscale x 1 x bfloat> %passthru)
+  ret <vscale x 1 x bfloat> %ret
+}
+
+define <vscale x 2 x bfloat> @vector_compress_nxv2bf16(<vscale x 2 x bfloat> %data, <vscale x 2 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv2bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 2 x bfloat> @llvm.experimental.vector.compress.nxv2bf16(<vscale x 2 x bfloat> %data, <vscale x 2 x i1> %mask, <vscale x 2 x bfloat> undef)
+  ret <vscale x 2 x bfloat> %ret
+}
+
+define <vscale x 2 x bfloat> @vector_compress_nxv2bf16_passthru(<vscale x 2 x bfloat> %passthru, <vscale x 2 x bfloat> %data, <vscale x 2 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv2bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 2 x bfloat> @llvm.experimental.vector.compress.nxv2bf16(<vscale x 2 x bfloat> %data, <vscale x 2 x i1> %mask, <vscale x 2 x bfloat> %passthru)
+  ret <vscale x 2 x bfloat> %ret
+}
+
+define <vscale x 4 x bfloat> @vector_compress_nxv4bf16(<vscale x 4 x bfloat> %data, <vscale x 4 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 4 x bfloat> @llvm.experimental.vector.compress.nxv4bf16(<vscale x 4 x bfloat> %data, <vscale x 4 x i1> %mask, <vscale x 4 x bfloat> undef)
+  ret <vscale x 4 x bfloat> %ret
+}
+
+define <vscale x 4 x bfloat> @vector_compress_nxv4bf16_passthru(<vscale x 4 x bfloat> %passthru, <vscale x 4 x bfloat> %data, <vscale x 4 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv4bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v9, v0
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 4 x bfloat> @llvm.experimental.vector.compress.nxv4bf16(<vscale x 4 x bfloat> %data, <vscale x 4 x i1> %mask, <vscale x 4 x bfloat> %passthru)
+  ret <vscale x 4 x bfloat> %ret
+}
+
+define <vscale x 8 x bfloat> @vector_compress_nxv8bf16(<vscale x 8 x bfloat> %data, <vscale x 8 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 8 x bfloat> @llvm.experimental.vector.compress.nxv8bf16(<vscale x 8 x bfloat> %data, <vscale x 8 x i1> %mask, <vscale x 8 x bfloat> undef)
+  ret <vscale x 8 x bfloat> %ret
+}
+
+define <vscale x 8 x bfloat> @vector_compress_nxv8bf16_passthru(<vscale x 8 x bfloat> %passthru, <vscale x 8 x bfloat> %data, <vscale x 8 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv8bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v10, v0
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 8 x bfloat> @llvm.experimental.vector.compress.nxv8bf16(<vscale x 8 x bfloat> %data, <vscale x 8 x i1> %mask, <vscale x 8 x bfloat> %passthru)
+  ret <vscale x 8 x bfloat> %ret
+}
+
+define <vscale x 16 x bfloat> @vector_compress_nxv16bf16(<vscale x 16 x bfloat> %data, <vscale x 16 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv16bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 16 x bfloat> @llvm.experimental.vector.compress.nxv16bf16(<vscale x 16 x bfloat> %data, <vscale x 16 x i1> %mask, <vscale x 16 x bfloat> undef)
+  ret <vscale x 16 x bfloat> %ret
+}
+
+define <vscale x 16 x bfloat> @vector_compress_nxv16bf16_passthru(<vscale x 16 x bfloat> %passthru, <vscale x 16 x bfloat> %data, <vscale x 16 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv16bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v12, v0
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 16 x bfloat> @llvm.experimental.vector.compress.nxv16bf16(<vscale x 16 x bfloat> %data, <vscale x 16 x i1> %mask, <vscale x 16 x bfloat> %passthru)
+  ret <vscale x 16 x bfloat> %ret
+}
+
+define <vscale x 32 x bfloat> @vector_compress_nxv32bf16(<vscale x 32 x bfloat> %data, <vscale x 32 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv32bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 32 x bfloat> @llvm.experimental.vector.compress.nxv32bf16(<vscale x 32 x bfloat> %data, <vscale x 32 x i1> %mask, <vscale x 32 x bfloat> undef)
+  ret <vscale x 32 x bfloat> %ret
+}
+
+define <vscale x 32 x bfloat> @vector_compress_nxv32bf16_passthru(<vscale x 32 x bfloat> %passthru, <vscale x 32 x bfloat> %data, <vscale x 32 x i1> %mask) {
+; CHECK-LABEL: vector_compress_nxv32bf16_passthru:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, tu, ma
+; CHECK-NEXT:    vcompress.vm v8, v16, v0
+; CHECK-NEXT:    ret
+  %ret = call <vscale x 32 x bfloat> @llvm.experimental.vector.compress.nxv32bf16(<vscale x 32 x bfloat> %data, <vscale x 32 x i1> %mask, <vscale x 32 x bfloat> %passthru)
+  ret <vscale x 32 x bfloat> %ret
+}
+
 ; Vector compress for f16 type
 
 define <vscale x 1 x half> @vector_compress_nxv1f16(<vscale x 1 x half> %data, <vscale x 1 x i1> %mask) {

wangpc-pp (Contributor) left a comment

LGTM.

lukel97 merged commit 96f5c68 into llvm:main on Oct 28, 2024
8 of 10 checks passed
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…llvm#113770)

This is a follow-up to llvm#113291 and handles f16/bf16 with zvfhmin and zvfbfmin.