
[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. #110486


Merged
merged 8 commits into llvm:main from vplan-widen-intrinsic on Oct 8, 2024

Conversation

fhahn
Contributor

@fhahn fhahn commented Sep 30, 2024

This patch splits off intrinsic handling to a new
VPWidenIntrinsicRecipe. A VPWidenIntrinsicRecipe only needs access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type); it does not need access to an underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.
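
In short, condensed from the tryToWidenCall changes in the diff below: the intrinsic path builds the new recipe from the intrinsic ID, the argument VPValues and the scalar result type alone (the original call is kept only as the optional underlying value), while the library-call path stays in VPWidenCallRecipe and now carries just the chosen vectorized variant.

  // Intrinsic path: no dependence on a called Function or its signature.
  if (ShouldUseVectorIntrinsic)
    return new VPWidenIntrinsicRecipe(CI, ID,
                                      make_range(Ops.begin(), Ops.end() - 1),
                                      CI->getType(), CI->getDebugLoc());
  // (vector-function variant lookup elided)
  // Library-call path: only the chosen vectorized variant is needed.
  return new VPWidenCallRecipe(CI, Variant, make_range(Ops.begin(), Ops.end()),
                               CI->getDebugLoc());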

@llvmbot
Member

llvmbot commented Sep 30, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

This patch splits off intrinsic handling to a new
VPWidenIntrinsicRecipe. A VPWidenIntrinsicRecipe only needs access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type); it does not need access to an underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.


Patch is 28.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110486.diff

12 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20-13)
  • (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+3-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+54-14)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+100-58)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+4-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/widen-call-with-intrinsic-or-libfunc.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-dot-printing.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+1-1)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (+3-3)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 08e78cb49c69fc..46482ed9e98ae7 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4367,7 +4367,7 @@ void LoopVectorizationPlanner::emitInvalidCostRemarks(
                 [](const auto *R) { return Instruction::Store; })
             .Case<VPWidenLoadRecipe>(
                 [](const auto *R) { return Instruction::Load; })
-            .Case<VPWidenCallRecipe>(
+            .Case<VPWidenCallRecipe, VPWidenIntrinsicRecipe>(
                 [](const auto *R) { return Instruction::Call; })
             .Case<VPInstruction, VPWidenRecipe, VPReplicateRecipe,
                   VPWidenCastRecipe>(
@@ -4392,11 +4392,17 @@ void LoopVectorizationPlanner::emitInvalidCostRemarks(
       OS << "):";
       if (Opcode == Instruction::Call) {
         auto *WidenCall = dyn_cast<VPWidenCallRecipe>(R);
-        Function *CalledFn =
-            WidenCall ? WidenCall->getCalledScalarFunction()
-                      : cast<Function>(R->getOperand(R->getNumOperands() - 1)
-                                           ->getLiveInIRValue());
-        OS << " call to " << CalledFn->getName();
+        StringRef Name = "";
+        if (auto *Int = dyn_cast<VPWidenIntrinsicRecipe>(R))
+          Name = Int->getIntrinsicName();
+        else {
+          Function *CalledFn =
+              WidenCall ? WidenCall->getCalledScalarFunction()
+                        : cast<Function>(R->getOperand(R->getNumOperands() - 1)
+                                             ->getLiveInIRValue());
+          Name = CalledFn->getName();
+        }
+        OS << " call to " << Name;
       } else
         OS << " " << Instruction::getOpcodeName(Opcode);
       reportVectorizationInfo(OutString, "InvalidCost", ORE, OrigLoop, nullptr,
@@ -4447,6 +4453,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
       case VPDef::VPWidenCanonicalIVSC:
       case VPDef::VPWidenCastSC:
       case VPDef::VPWidenGEPSC:
+      case VPDef::VPWidenIntrinsicSC:
       case VPDef::VPWidenSC:
       case VPDef::VPWidenSelectSC:
       case VPDef::VPBlendSC:
@@ -8266,7 +8273,7 @@ VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
   return new VPBlendRecipe(Phi, OperandsWithMask);
 }
 
-VPWidenCallRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
+VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                                                    ArrayRef<VPValue *> Operands,
                                                    VFRange &Range) {
   bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
@@ -8297,8 +8304,9 @@ VPWidenCallRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                 },
                 Range);
   if (ShouldUseVectorIntrinsic)
-    return new VPWidenCallRecipe(CI, make_range(Ops.begin(), Ops.end()), ID,
-                                 CI->getDebugLoc());
+    return new VPWidenIntrinsicRecipe(CI, ID,
+                                      make_range(Ops.begin(), Ops.end() - 1),
+                                      CI->getType(), CI->getDebugLoc());
 
   Function *Variant = nullptr;
   std::optional<unsigned> MaskPos;
@@ -8350,9 +8358,8 @@ VPWidenCallRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
       Ops.insert(Ops.begin() + *MaskPos, Mask);
     }
 
-    return new VPWidenCallRecipe(CI, make_range(Ops.begin(), Ops.end()),
-                                 Intrinsic::not_intrinsic, CI->getDebugLoc(),
-                                 Variant);
+    return new VPWidenCallRecipe(
+        CI, Variant, make_range(Ops.begin(), Ops.end()), CI->getDebugLoc());
   }
 
   return nullptr;
@@ -9218,7 +9225,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
             RecurrenceDescriptor::isFMulAddIntrinsic(CurrentLinkI) &&
             "Expected instruction to be a call to the llvm.fmuladd intrinsic");
         assert(((MinVF.isScalar() && isa<VPReplicateRecipe>(CurrentLink)) ||
-                isa<VPWidenCallRecipe>(CurrentLink)) &&
+                isa<VPWidenIntrinsicRecipe>(CurrentLink)) &&
                CurrentLink->getOperand(2) == PreviousLink &&
                "expected a call where the previous link is the added operand");
 
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 02ef6f7b6fb982..5d4a3b555981ce 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -93,9 +93,9 @@ class VPRecipeBuilder {
   VPBlendRecipe *tryToBlend(PHINode *Phi, ArrayRef<VPValue *> Operands);
 
   /// Handle call instructions. If \p CI can be widened for \p Range.Start,
-  /// return a new VPWidenCallRecipe. Range.End may be decreased to ensure same
-  /// decision from \p Range.Start to \p Range.End.
-  VPWidenCallRecipe *tryToWidenCall(CallInst *CI, ArrayRef<VPValue *> Operands,
+  /// return a new VPWidenCallRecipe or VPWidenIntrinsicRecipe. Range.End may be
+  /// decreased to ensure same decision from \p Range.Start to \p Range.End.
+  VPSingleDefRecipe *tryToWidenCall(CallInst *CI, ArrayRef<VPValue *> Operands,
                                     VFRange &Range);
 
   /// Check if \p I has an opcode that can be widened and return a VPWidenRecipe
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c4567362eaffc7..b7b2d000eb77ea 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -886,6 +886,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPWidenCanonicalIVSC:
     case VPRecipeBase::VPWidenCastSC:
     case VPRecipeBase::VPWidenGEPSC:
+    case VPRecipeBase::VPWidenIntrinsicSC:
     case VPRecipeBase::VPWidenSC:
     case VPRecipeBase::VPWidenEVLSC:
     case VPRecipeBase::VPWidenSelectSC:
@@ -1608,24 +1609,63 @@ class VPScalarCastRecipe : public VPSingleDefRecipe {
   }
 };
 
-/// A recipe for widening Call instructions.
-class VPWidenCallRecipe : public VPSingleDefRecipe {
-  /// ID of the vector intrinsic to call when widening the call. If set the
-  /// Intrinsic::not_intrinsic, a library call will be used instead.
+/// A recipe for widening vector intrinsics.
+class VPWidenIntrinsicRecipe : public VPSingleDefRecipe {
+  /// ID of the vector intrinsic to widen.
   Intrinsic::ID VectorIntrinsicID;
-  /// If this recipe represents a library call, Variant stores a pointer to
-  /// the chosen function. There is a 1:1 mapping between a given VF and the
-  /// chosen vectorized variant, so there will be a different vplan for each
-  /// VF with a valid variant.
+
+  /// Scalar type of the result produced by the intrinsic.
+  Type *ResultTy;
+
+public:
+  template <typename IterT>
+  VPWidenIntrinsicRecipe(Value *UV, Intrinsic::ID VectorIntrinsicID,
+                         iterator_range<IterT> CallArguments, Type *Ty,
+                         DebugLoc DL = {})
+      : VPSingleDefRecipe(VPDef::VPWidenIntrinsicSC, CallArguments, UV, DL),
+        VectorIntrinsicID(VectorIntrinsicID), ResultTy(Ty) {}
+
+  ~VPWidenIntrinsicRecipe() override = default;
+
+  VPWidenIntrinsicRecipe *clone() override {
+    return new VPWidenIntrinsicRecipe(getUnderlyingValue(), VectorIntrinsicID,
+                                      operands(), ResultTy, getDebugLoc());
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntrinsicSC)
+
+  /// Produce a widened version of the vector intrinsic.
+  void execute(VPTransformState &State) override;
+
+  /// Return the cost of this vector intrinsic.
+  InstructionCost computeCost(ElementCount VF,
+                              VPCostContext &Ctx) const override;
+
+  Type *getResultTy() const { return ResultTy; }
+
+  /// Return to name of the intrinsic as string.
+  StringRef getIntrinsicName() const;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
+/// A recipe for widening Call instructions using library calls.
+class VPWidenCallRecipe : public VPSingleDefRecipe {
+  /// Variant stores a pointer to the chosen function. There is a 1:1 mapping
+  /// between a given VF and the chosen vectorized variant, so there will be a
+  /// different VPlan for each VF with a valid variant.
   Function *Variant;
 
 public:
   template <typename IterT>
-  VPWidenCallRecipe(Value *UV, iterator_range<IterT> CallArguments,
-                    Intrinsic::ID VectorIntrinsicID, DebugLoc DL = {},
-                    Function *Variant = nullptr)
+  VPWidenCallRecipe(Value *UV, Function *Variant,
+                    iterator_range<IterT> CallArguments, DebugLoc DL = {})
       : VPSingleDefRecipe(VPDef::VPWidenCallSC, CallArguments, UV, DL),
-        VectorIntrinsicID(VectorIntrinsicID), Variant(Variant) {
+        Variant(Variant) {
     assert(
         isa<Function>(getOperand(getNumOperands() - 1)->getLiveInIRValue()) &&
         "last operand must be the called function");
@@ -1634,8 +1674,8 @@ class VPWidenCallRecipe : public VPSingleDefRecipe {
   ~VPWidenCallRecipe() override = default;
 
   VPWidenCallRecipe *clone() override {
-    return new VPWidenCallRecipe(getUnderlyingValue(), operands(),
-                                 VectorIntrinsicID, getDebugLoc(), Variant);
+    return new VPWidenCallRecipe(getUnderlyingValue(), Variant, operands(),
+                                 getDebugLoc());
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenCallSC)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 277df0637372d8..24f34e43a6e84b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -268,6 +268,8 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
                 VPReplicateRecipe, VPWidenCallRecipe, VPWidenMemoryRecipe,
                 VPWidenSelectRecipe>(
               [this](const auto *R) { return inferScalarTypeForRecipe(R); })
+          .Case<VPWidenIntrinsicRecipe>(
+              [](const VPWidenIntrinsicRecipe *R) { return R->getResultTy(); })
           .Case<VPInterleaveRecipe>([V](const VPInterleaveRecipe *R) {
             // TODO: Use info from interleave group.
             return V->getUnderlyingValue()->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index f8b0a400a31d7d..faf6931e066556 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -79,6 +79,8 @@ bool VPRecipeBase::mayWriteToMemory() const {
     return !cast<VPWidenCallRecipe>(this)
                 ->getCalledScalarFunction()
                 ->onlyReadsMemory();
+  case VPWidenIntrinsicSC:
+    return false;
   case VPBranchOnMaskSC:
   case VPScalarIVStepsSC:
   case VPPredInstPHISC:
@@ -120,6 +122,8 @@ bool VPRecipeBase::mayReadFromMemory() const {
     return !cast<VPWidenCallRecipe>(this)
                 ->getCalledScalarFunction()
                 ->onlyWritesMemory();
+  case VPWidenIntrinsicSC:
+    return true;
   case VPBranchOnMaskSC:
   case VPPredInstPHISC:
   case VPScalarIVStepsSC:
@@ -161,6 +165,8 @@ bool VPRecipeBase::mayHaveSideEffects() const {
     Function *Fn = cast<VPWidenCallRecipe>(this)->getCalledScalarFunction();
     return mayWriteToMemory() || !Fn->doesNotThrow() || !Fn->willReturn();
   }
+  case VPWidenIntrinsicSC:
+    return false;
   case VPBlendSC:
   case VPReductionEVLSC:
   case VPReductionSC:
@@ -879,61 +885,31 @@ void VPIRInstruction::print(raw_ostream &O, const Twine &Indent,
 
 void VPWidenCallRecipe::execute(VPTransformState &State) {
   assert(State.VF.isVector() && "not widening");
-  Function *CalledScalarFn = getCalledScalarFunction();
-  assert(!isDbgInfoIntrinsic(CalledScalarFn->getIntrinsicID()) &&
-         "DbgInfoIntrinsic should have been dropped during VPlan construction");
   State.setDebugLocFrom(getDebugLoc());
 
-  bool UseIntrinsic = VectorIntrinsicID != Intrinsic::not_intrinsic;
-  FunctionType *VFTy = nullptr;
-  if (Variant)
-    VFTy = Variant->getFunctionType();
-  SmallVector<Type *, 2> TysForDecl;
+  FunctionType *VFTy = Variant->getFunctionType();
   // Add return type if intrinsic is overloaded on it.
-  if (UseIntrinsic &&
-      isVectorIntrinsicWithOverloadTypeAtArg(VectorIntrinsicID, -1))
-    TysForDecl.push_back(VectorType::get(
-        CalledScalarFn->getReturnType()->getScalarType(), State.VF));
   SmallVector<Value *, 4> Args;
   for (const auto &I : enumerate(arg_operands())) {
-    // Some intrinsics have a scalar argument - don't replace it with a
-    // vector.
     Value *Arg;
-    if (UseIntrinsic &&
-        isVectorIntrinsicWithScalarOpAtArg(VectorIntrinsicID, I.index()))
-      Arg = State.get(I.value(), VPLane(0));
     // Some vectorized function variants may also take a scalar argument,
     // e.g. linear parameters for pointers. This needs to be the scalar value
     // from the start of the respective part when interleaving.
-    else if (VFTy && !VFTy->getParamType(I.index())->isVectorTy())
+    if (!VFTy->getParamType(I.index())->isVectorTy())
       Arg = State.get(I.value(), VPLane(0));
     else
-      Arg = State.get(I.value());
-    if (UseIntrinsic &&
-        isVectorIntrinsicWithOverloadTypeAtArg(VectorIntrinsicID, I.index()))
-      TysForDecl.push_back(Arg->getType());
+      Arg = State.get(I.value(), onlyFirstLaneUsed(I.value()));
     Args.push_back(Arg);
   }
 
-  Function *VectorF;
-  if (UseIntrinsic) {
-    // Use vector version of the intrinsic.
-    Module *M = State.Builder.GetInsertBlock()->getModule();
-    VectorF = Intrinsic::getDeclaration(M, VectorIntrinsicID, TysForDecl);
-    assert(VectorF && "Can't retrieve vector intrinsic.");
-  } else {
-#ifndef NDEBUG
-    assert(Variant != nullptr && "Can't create vector function.");
-#endif
-    VectorF = Variant;
-  }
+  assert(Variant != nullptr && "Can't create vector function.");
 
-  auto *CI = cast_or_null<CallInst>(getUnderlyingInstr());
+  auto *CI = cast_or_null<CallInst>(getUnderlyingValue());
   SmallVector<OperandBundleDef, 1> OpBundles;
   if (CI)
     CI->getOperandBundlesAsDefs(OpBundles);
 
-  CallInst *V = State.Builder.CreateCall(VectorF, Args, OpBundles);
+  CallInst *V = State.Builder.CreateCall(Variant, Args, OpBundles);
 
   if (isa<FPMathOperator>(V))
     V->copyFastMathFlags(CI);
@@ -946,12 +922,84 @@ void VPWidenCallRecipe::execute(VPTransformState &State) {
 InstructionCost VPWidenCallRecipe::computeCost(ElementCount VF,
                                                VPCostContext &Ctx) const {
   TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-  if (Variant) {
-    return Ctx.TTI.getCallInstrCost(nullptr, Variant->getReturnType(),
-                                    Variant->getFunctionType()->params(),
-                                    CostKind);
+  return Ctx.TTI.getCallInstrCost(nullptr, Variant->getReturnType(),
+                                  Variant->getFunctionType()->params(),
+                                  CostKind);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenCallRecipe::print(raw_ostream &O, const Twine &Indent,
+                              VPSlotTracker &SlotTracker) const {
+  O << Indent << "WIDEN-CALL ";
+
+  Function *CalledFn = getCalledScalarFunction();
+  if (CalledFn->getReturnType()->isVoidTy())
+    O << "void ";
+  else {
+    printAsOperand(O, SlotTracker);
+    O << " = ";
+  }
+
+  O << "call @" << CalledFn->getName() << "(";
+  interleaveComma(arg_operands(), O, [&O, &SlotTracker](VPValue *Op) {
+    Op->printAsOperand(O, SlotTracker);
+  });
+  O << ")";
+
+  O << " (using library function";
+  if (Variant->hasName())
+    O << ": " << Variant->getName();
+  O << ")";
+}
+#endif
+
+void VPWidenIntrinsicRecipe::execute(VPTransformState &State) {
+  assert(State.VF.isVector() && "not widening");
+  State.setDebugLocFrom(getDebugLoc());
+
+  SmallVector<Type *, 2> TysForDecl;
+  // Add return type if intrinsic is overloaded on it.
+  if (isVectorIntrinsicWithOverloadTypeAtArg(VectorIntrinsicID, -1))
+    TysForDecl.push_back(VectorType::get(getResultTy(), State.VF));
+  SmallVector<Value *, 4> Args;
+  for (const auto &I : enumerate(operands())) {
+    // Some intrinsics have a scalar argument - don't replace it with a
+    // vector.
+    Value *Arg;
+    if (isVectorIntrinsicWithScalarOpAtArg(VectorIntrinsicID, I.index()))
+      Arg = State.get(I.value(), VPLane(0));
+    else
+      Arg = State.get(I.value(), onlyFirstLaneUsed(I.value()));
+    if (isVectorIntrinsicWithOverloadTypeAtArg(VectorIntrinsicID, I.index()))
+      TysForDecl.push_back(Arg->getType());
+    Args.push_back(Arg);
   }
 
+  // Use vector version of the intrinsic.
+  Module *M = State.Builder.GetInsertBlock()->getModule();
+  Function *VectorF =
+      Intrinsic::getDeclaration(M, VectorIntrinsicID, TysForDecl);
+  assert(VectorF && "Can't retrieve vector intrinsic.");
+
+  auto *CI = cast_or_null<CallInst>(getUnderlyingValue());
+  SmallVector<OperandBundleDef, 1> OpBundles;
+  if (CI)
+    CI->getOperandBundlesAsDefs(OpBundles);
+
+  CallInst *V = State.Builder.CreateCall(VectorF, Args, OpBundles);
+
+  if (isa<FPMathOperator>(V))
+    V->copyFastMathFlags(CI);
+
+  if (!V->getType()->isVoidTy())
+    State.set(this, V);
+  State.addMetadata(V, CI);
+}
+
+InstructionCost VPWidenIntrinsicRecipe::computeCost(ElementCount VF,
+                                                    VPCostContext &Ctx) const {
+  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+
   FastMathFlags FMF;
   // TODO: Manage flags via VPRecipeWithIRFlags.
   if (auto *FPMO = dyn_cast_or_null<FPMathOperator>(getUnderlyingValue()))
@@ -990,33 +1038,27 @@ InstructionCost VPWidenCallRecipe::computeCost(ElementCount VF,
   return Ctx.TTI.getIntrinsicInstrCost(CostAttrs, CostKind);
 }
 
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-void VPWidenCallRecipe::print(raw_ostream &O, const Twine &Indent,
-                              VPSlotTracker &SlotTracker) const {
-  O << Indent << "WIDEN-CALL ";
+StringRef VPWidenIntrinsicRecipe::getIntrinsicName() const {
+  return Intrinsic::getBaseName(VectorIntrinsicID);
+}
 
-  Function *CalledFn = getCalledScalarFunction();
-  if (CalledFn->getReturnType()->isVoidTy())
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntrinsicRecipe::print(raw_ostream &O, const Twine &Indent,
+                                   VPSlotTracker &SlotTracker) const {
+  O << Indent << "WIDEN-INTRINSIC ";
+  if (ResultTy->isVoidTy()) {
     O << "void ";
-  else {
+  } else {
     printAsOperand(O, SlotTracker);
     O << " = ";
   }
 
-  O << "call @" << CalledFn->getName() << "(";
-  interleaveComma(arg_operands(), O, [&O, &SlotTracker](VPValue *Op) {
+  O << getIntrinsicName() << "(";
+
+  interleaveComma(operands(), O, [&O, &SlotTracker](VPValue *Op) {
     Op->printAsOperand(O, SlotTracker);
   });
   O << ")";
-
-  if (VectorIntrinsicID)
-    O << " (using vector intrinsic)";
-  else {
-    O << " (using library function";
-    if (Variant->hasName())
-      O << ": " << Variant->getName();
-    O << ")";
-  }
 }
 #endif
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index a878613c4ba483..8ae4291de9e9a7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -82,9 +82,10 @@ void VPlanTransforms::VPInstructionsToVPRecipes(
         } else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
           NewRecipe = new VPWidenGEPRecipe(GEP, Ingredient.operands());
         } else if (CallInst *CI = dyn_cast<CallInst>(Inst)) {
-          NewRecipe = new VPWidenCallRecipe(
-              CI, Ingredient.operands(), getVectorIntrinsicIDForCall(CI, &TLI),
-              CI->getDebugLoc());
+          NewRecipe = new VPWidenIntrinsicRecipe(
+              CI, getVectorIntrinsicIDForCall(CI, &TLI),
+              make_range(Ingredient.op_begin(), Ingredient.op_end() - 1),
+              CI->getType(), CI->getDebugLoc());
         } else if (SelectInst *SI = dyn_cast<SelectInst>(Inst)) {
           N...
[truncated]

fhahn added a commit to fhahn/llvm-project that referenced this pull request Sep 30, 2024
Use VPWidenIntrinsicRecipe (llvm#110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.

There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.

NOTE: this PR contains the changes from
llvm#110486.

The relevant changes are in 47135542e2ada3f21b215bf237d8442a56c8456c.
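
As a rough illustration of that direction (hypothetical code, not part of this patch; it assumes a constructor overload that takes no underlying instruction), an EVL transform could emit llvm.vp.select through the new recipe rather than introducing a dedicated recipe class:

  // Hypothetical helper: widen a select under EVL as an llvm.vp.select call,
  // built directly from existing VPValues without any underlying CallInst.
  static VPWidenIntrinsicRecipe *createVPSelect(VPValue *Cond, VPValue *TrueV,
                                                VPValue *FalseV, VPValue *EVL,
                                                Type *ScalarTy, DebugLoc DL) {
    SmallVector<VPValue *, 4> Ops = {Cond, TrueV, FalseV, EVL};
    return new VPWidenIntrinsicRecipe(Intrinsic::vp_select, Ops, ScalarTy, DL);
  }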
@fhahn fhahn force-pushed the vplan-widen-intrinsic branch from f9c9f33 to bba6b71 on October 1, 2024 at 13:37
This patch splits off intrinsic handling to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 1, 2024
Use VPWidenIntrinsicRecipe (llvm#110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.

There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.

NOTE: this PR contains the changes from
llvm#110486.

The relevant changes are in 47135542e2ada3f21b215bf237d8442a56c8456c.
@fhahn fhahn force-pushed the vplan-widen-intrinsic branch from bba6b71 to 8ae59c7 on October 1, 2024 at 13:58
@@ -79,6 +79,8 @@ bool VPRecipeBase::mayWriteToMemory() const {
     return !cast<VPWidenCallRecipe>(this)
                 ->getCalledScalarFunction()
                 ->onlyReadsMemory();
+  case VPWidenIntrinsicSC:
+    return false;
Member

This might be incorrect for memory-accessing intrinsics. We need some way to restrict the class from being used with such intrinsics.

Contributor Author

Yep, updated to store read/write/side effects in the recipe.

Comment on lines 125 to 126
case VPWidenIntrinsicSC:
return true;
Member

Same

Contributor Author

Yep, updated to store read/write/side effects in the recipe.

Comment on lines 168 to 169
case VPWidenIntrinsicSC:
return false;
Member

Same

Contributor Author

Yep, updated to store read/write/side effects in the recipe.
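
One plausible shape for that update (a sketch only — the field names, the ArrayRef-based constructor and the exact signatures are illustrative, not the code that landed): derive the memory behaviour from the underlying scalar call when the recipe is constructed, and let the VPRecipeBase queries forward to it instead of hard-coding an answer.

  class VPWidenIntrinsicRecipe : public VPSingleDefRecipe {
    Intrinsic::ID VectorIntrinsicID;
    Type *ResultTy;
    // Cached behaviour of the scalar call, so VPlan-level queries need no IR.
    bool MayReadFromMemory;
    bool MayWriteToMemory;
    bool MayHaveSideEffects;

  public:
    VPWidenIntrinsicRecipe(CallInst &CI, Intrinsic::ID VectorIntrinsicID,
                           ArrayRef<VPValue *> CallArguments, Type *Ty,
                           DebugLoc DL = {})
        : VPSingleDefRecipe(VPDef::VPWidenIntrinsicSC, CallArguments, &CI, DL),
          VectorIntrinsicID(VectorIntrinsicID), ResultTy(Ty),
          MayReadFromMemory(CI.mayReadFromMemory()),
          MayWriteToMemory(CI.mayWriteToMemory()),
          MayHaveSideEffects(CI.mayHaveSideEffects()) {}

    bool mayReadFromMemory() const { return MayReadFromMemory; }
    bool mayWriteToMemory() const { return MayWriteToMemory; }
    bool mayHaveSideEffects() const { return MayHaveSideEffects; }
    // execute/computeCost/print/clone as in the patch.
  };

  // The VPRecipeBase switch cases then become, e.g.:
  //   case VPWidenIntrinsicSC:
  //     return cast<VPWidenIntrinsicRecipe>(this)->mayWriteToMemory();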

return new VPWidenCallRecipe(CI, make_range(Ops.begin(), Ops.end()), ID,
CI->getDebugLoc());
return new VPWidenIntrinsicRecipe(*CI, ID,
make_range(Ops.begin(), Ops.end() - 1),
Member

Maybe move Ops.push_back(Operands.back()); from line 8305 to line 8319 to avoid -1 here?

Contributor Author

Updated, thanks!

public:
template <typename IterT>
VPWidenIntrinsicRecipe(CallInst &CI, Intrinsic::ID VectorIntrinsicID,
iterator_range<IterT> CallArguments, Type *Ty,
Member

What about ArrayRef here instead of iterator_range?

Contributor Author

Updated, thanks!

@@ -79,6 +79,9 @@ bool VPRecipeBase::mayWriteToMemory() const {
     return !cast<VPWidenCallRecipe>(this)
                 ->getCalledScalarFunction()
                 ->onlyReadsMemory();
+  case VPWidenIntrinsicSC:
+    return cast<VPWidenIntrinsicRecipe>(this)->mayWriteToMemory();
+    ;
Contributor

Redundant semicolon

Contributor Author

Removed, thanks!

CallInst *V = State.Builder.CreateCall(Variant, Args, OpBundles);

if (isa<FPMathOperator>(V))
V->copyFastMathFlags(CI);
Contributor

Is it possible for CI to be nullptr?

Contributor Author

Left over from an older version after a rebase; it should be setFlags(V) as on current main. Fixed, thanks!

Comment on lines 8360 to 8361
return new VPWidenCallRecipe(
CI, Variant, make_range(Ops.begin(), Ops.end()), CI->getDebugLoc());
Member

Suggested change
-    return new VPWidenCallRecipe(
-        CI, Variant, make_range(Ops.begin(), Ops.end()), CI->getDebugLoc());
+    return new VPWidenCallRecipe(
+        CI, Variant, Ops, CI->getDebugLoc());

Contributor Author

Done, thanks!

Member

@alexey-bataev alexey-bataev left a comment

LG

@fhahn fhahn merged commit 6fbbe15 into llvm:main Oct 8, 2024
7 of 8 checks passed
@fhahn fhahn deleted the vplan-widen-intrinsic branch October 8, 2024 21:37
fhahn added a commit that referenced this pull request Oct 15, 2024
Use VPWidenIntrinsicRecipe
(#110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.

There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.


PR: #110489
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
Use VPWidenIntrinsicRecipe
(llvm#110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.

There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.


PR: llvm#110489