[AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. #87265
Conversation
@llvm/pr-subscribers-backend-amdgpu
Author: Chaitanya (skc7)
Changes: This PR introduces a new pass, "amdgpu-sw-lower-lds". It lowers local data store (LDS) uses in kernel and non-kernel functions in a module to dynamically allocated device global memory.
Replacement of kernel LDS accesses:
Replacement of non-kernel LDS accesses:
Patch is 122.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87265.diff 18 Files Affected:
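For reference, the new pass can be exercised standalone with opt; the invocation below mirrors the RUN lines in the tests added by this patch (the input file name is a placeholder):

opt -S -mtriple=amdgcn-- -passes=amdgpu-sw-lower-lds input.ll -o lowered.ll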
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 6016bd5187d887..15ff74f7c53af3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -263,6 +263,15 @@ struct AMDGPUAlwaysInlinePass : PassInfoMixin<AMDGPUAlwaysInlinePass> {
bool GlobalOpt;
};
+void initializeAMDGPUSwLowerLDSLegacyPass(PassRegistry &);
+extern char &AMDGPUSwLowerLDSLegacyPassID;
+ModulePass *createAMDGPUSwLowerLDSLegacyPass();
+
+struct AMDGPUSwLowerLDSPass : PassInfoMixin<AMDGPUSwLowerLDSPass> {
+ AMDGPUSwLowerLDSPass() {}
+ PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
class AMDGPUCodeGenPreparePass
: public PassInfoMixin<AMDGPUCodeGenPreparePass> {
private:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
index 595f09664c55e4..f0456d3f62a816 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
@@ -212,6 +212,7 @@
#define DEBUG_TYPE "amdgpu-lower-module-lds"
using namespace llvm;
+using namespace AMDGPU;
namespace {
@@ -234,17 +235,6 @@ cl::opt<LoweringKind> LoweringKindLoc(
clEnumValN(LoweringKind::hybrid, "hybrid",
"Lower via mixture of above strategies")));
-bool isKernelLDS(const Function *F) {
- // Some weirdness here. AMDGPU::isKernelCC does not call into
- // AMDGPU::isKernel with the calling conv, it instead calls into
- // isModuleEntryFunction which returns true for more calling conventions
- // than AMDGPU::isKernel does. There's a FIXME on AMDGPU::isKernel.
- // There's also a test that checks that the LDS lowering does not hit on
- // a graphics shader, denoted amdgpu_ps, so stay with the limited case.
- // Putting LDS in the name of the function to draw attention to this.
- return AMDGPU::isKernel(F->getCallingConv());
-}
-
template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
return L->getName() < R->getName();
@@ -305,183 +295,9 @@ class AMDGPULowerModuleLDS {
Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
}
- static bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M) {
- // Constants are uniqued within LLVM. A ConstantExpr referring to a LDS
- // global may have uses from multiple different functions as a result.
- // This pass specialises LDS variables with respect to the kernel that
- // allocates them.
-
- // This is semantically equivalent to (the unimplemented as slow):
- // for (auto &F : M.functions())
- // for (auto &BB : F)
- // for (auto &I : BB)
- // for (Use &Op : I.operands())
- // if (constantExprUsesLDS(Op))
- // replaceConstantExprInFunction(I, Op);
-
- SmallVector<Constant *> LDSGlobals;
- for (auto &GV : M.globals())
- if (AMDGPU::isLDSVariableToLower(GV))
- LDSGlobals.push_back(&GV);
-
- return convertUsersOfConstantsToInstructions(LDSGlobals);
- }
-
public:
AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
- using FunctionVariableMap = DenseMap<Function *, DenseSet<GlobalVariable *>>;
-
- using VariableFunctionMap = DenseMap<GlobalVariable *, DenseSet<Function *>>;
-
- static void getUsesOfLDSByFunction(CallGraph const &CG, Module &M,
- FunctionVariableMap &kernels,
- FunctionVariableMap &functions) {
-
- // Get uses from the current function, excluding uses by called functions
- // Two output variables to avoid walking the globals list twice
- for (auto &GV : M.globals()) {
- if (!AMDGPU::isLDSVariableToLower(GV)) {
- continue;
- }
-
- for (User *V : GV.users()) {
- if (auto *I = dyn_cast<Instruction>(V)) {
- Function *F = I->getFunction();
- if (isKernelLDS(F)) {
- kernels[F].insert(&GV);
- } else {
- functions[F].insert(&GV);
- }
- }
- }
- }
- }
-
- struct LDSUsesInfoTy {
- FunctionVariableMap direct_access;
- FunctionVariableMap indirect_access;
- };
-
- static LDSUsesInfoTy getTransitiveUsesOfLDS(CallGraph const &CG, Module &M) {
-
- FunctionVariableMap direct_map_kernel;
- FunctionVariableMap direct_map_function;
- getUsesOfLDSByFunction(CG, M, direct_map_kernel, direct_map_function);
-
- // Collect variables that are used by functions whose address has escaped
- DenseSet<GlobalVariable *> VariablesReachableThroughFunctionPointer;
- for (Function &F : M.functions()) {
- if (!isKernelLDS(&F))
- if (F.hasAddressTaken(nullptr,
- /* IgnoreCallbackUses */ false,
- /* IgnoreAssumeLikeCalls */ false,
- /* IgnoreLLVMUsed */ true,
- /* IgnoreArcAttachedCall */ false)) {
- set_union(VariablesReachableThroughFunctionPointer,
- direct_map_function[&F]);
- }
- }
-
- auto functionMakesUnknownCall = [&](const Function *F) -> bool {
- assert(!F->isDeclaration());
- for (const CallGraphNode::CallRecord &R : *CG[F]) {
- if (!R.second->getFunction()) {
- return true;
- }
- }
- return false;
- };
-
- // Work out which variables are reachable through function calls
- FunctionVariableMap transitive_map_function = direct_map_function;
-
- // If the function makes any unknown call, assume the worst case that it can
- // access all variables accessed by functions whose address escaped
- for (Function &F : M.functions()) {
- if (!F.isDeclaration() && functionMakesUnknownCall(&F)) {
- if (!isKernelLDS(&F)) {
- set_union(transitive_map_function[&F],
- VariablesReachableThroughFunctionPointer);
- }
- }
- }
-
- // Direct implementation of collecting all variables reachable from each
- // function
- for (Function &Func : M.functions()) {
- if (Func.isDeclaration() || isKernelLDS(&Func))
- continue;
-
- DenseSet<Function *> seen; // catches cycles
- SmallVector<Function *, 4> wip{&Func};
-
- while (!wip.empty()) {
- Function *F = wip.pop_back_val();
-
- // Can accelerate this by referring to transitive map for functions that
- // have already been computed, with more care than this
- set_union(transitive_map_function[&Func], direct_map_function[F]);
-
- for (const CallGraphNode::CallRecord &R : *CG[F]) {
- Function *ith = R.second->getFunction();
- if (ith) {
- if (!seen.contains(ith)) {
- seen.insert(ith);
- wip.push_back(ith);
- }
- }
- }
- }
- }
-
- // direct_map_kernel lists which variables are used by the kernel
- // find the variables which are used through a function call
- FunctionVariableMap indirect_map_kernel;
-
- for (Function &Func : M.functions()) {
- if (Func.isDeclaration() || !isKernelLDS(&Func))
- continue;
-
- for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
- Function *ith = R.second->getFunction();
- if (ith) {
- set_union(indirect_map_kernel[&Func], transitive_map_function[ith]);
- } else {
- set_union(indirect_map_kernel[&Func],
- VariablesReachableThroughFunctionPointer);
- }
- }
- }
-
- // Verify that we fall into one of 2 cases:
- // - All variables are absolute: this is a re-run of the pass
- // so we don't have anything to do.
- // - No variables are absolute.
- std::optional<bool> HasAbsoluteGVs;
- for (auto &Map : {direct_map_kernel, indirect_map_kernel}) {
- for (auto &[Fn, GVs] : Map) {
- for (auto *GV : GVs) {
- bool IsAbsolute = GV->isAbsoluteSymbolRef();
- if (HasAbsoluteGVs.has_value()) {
- if (*HasAbsoluteGVs != IsAbsolute) {
- report_fatal_error(
- "Module cannot mix absolute and non-absolute LDS GVs");
- }
- } else
- HasAbsoluteGVs = IsAbsolute;
- }
- }
- }
-
- // If we only had absolute GVs, we have nothing to do, return an empty
- // result.
- if (HasAbsoluteGVs && *HasAbsoluteGVs)
- return {FunctionVariableMap(), FunctionVariableMap()};
-
- return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};
- }
-
struct LDSVariableReplacement {
GlobalVariable *SGV = nullptr;
DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 90f36fadf35903..eda4949d0296d5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -22,6 +22,7 @@ MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
AMDGPULowerBufferFatPointersPass(*this))
MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this))
+MODULE_PASS("amdgpu-sw-lower-lds", AMDGPUSwLowerLDSPass())
MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
#undef MODULE_PASS
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
new file mode 100644
index 00000000000000..ed3670fa1386d6
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
@@ -0,0 +1,865 @@
+//===-- AMDGPUSwLowerLDS.cpp -----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass lowers the local data store, LDS, uses in kernel and non-kernel
+// functions in module with dynamically allocated device global memory.
+//
+// Replacement of Kernel LDS accesses:
+// For a kernel, LDS access can be static or dynamic which are direct
+// (accessed within kernel) and indirect (accessed through non-kernels).
+// A device global memory equal to size of all these LDS globals will be
+// allocated. At the prologue of the kernel, a single work-item from the
+// work-group, does a "malloc" and stores the pointer of the allocation in
+// new LDS global that will be created for the kernel. This will be called
+// "malloc LDS global" in this pass.
+// Each LDS access corresponds to an offset in the allocated memory.
+// All static LDS accesses will be allocated first and then dynamic LDS
+// will occupy the device global memory.
+// To store the offsets corresponding to all LDS accesses, another global
+// variable is created which will be called "metadata global" in this pass.
+// - Malloc LDS Global:
+// It is LDS global of ptr type with name
+// "llvm.amdgcn.sw.lds.<kernel-name>".
+// - Metadata Global:
+// It is of struct type, with n members. n equals the number of LDS
+// globals accessed by the kernel(direct and indirect). Each member of
+// struct is another struct of type {i32, i32}. First member corresponds
+// to offset, second member corresponds to size of LDS global being
+// replaced. It will have name "llvm.amdgcn.sw.lds.<kernel-name>.md".
+// This global will have an initializer with static LDS related offsets
+// and sizes initialized. But for dynamic LDS related entries, offsets
+// will be initialized to previous static LDS allocation end offset. Sizes
+// for them will be zero initially. These dynamic LDS offset and size
+// values will be updated within the kernel, since kernel can read the
+// dynamic LDS size allocation done at runtime with query to
+// "hidden_dynamic_lds_size" hidden kernel argument.
+//
+// LDS accesses within the kernel will be replaced by "gep" ptr to
+// corresponding offset into allocated device global memory for the kernel.
+// At the epilogue of kernel, allocated memory would be made free by the same
+// single work-item.
+//
+// Replacement of non-kernel LDS accesses:
+// Multiple kernels can access the same non-kernel function.
+// All the kernels accessing LDS through non-kernels are sorted and
+// assigned a kernel-id. All the LDS globals accessed by non-kernels
+// are sorted. This information is used to build two tables:
+// - Base table:
+// Base table will have single row, with elements of the row
+// placed as per kernel ID. Each element in the row corresponds
+// to address of "malloc LDS global" variable created for
+// that kernel.
+// - Offset table:
+// Offset table will have multiple rows and columns.
+// Rows are assumed to be from 0 to (n-1). n is total number
+// of kernels accessing the LDS through non-kernels.
+// Each row will have m elements. m is the total number of
+// unique LDS globals accessed by all non-kernels.
+// Each element in the row correspond to the address of
+// the replacement of LDS global done by that particular kernel.
+// A LDS variable in non-kernel will be replaced based on the information
+// from base and offset tables. Based on kernel-id query, address of "malloc
+// LDS global" for that corresponding kernel is obtained from base table.
+// The Offset into the base "malloc LDS global" is obtained from
+// corresponding element in offset table. With this information, replacement
+// value is obtained.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "Utils/AMDGPUMemoryUtils.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SetOperations.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/CallGraph.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/ReplaceConstant.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#include <algorithm>
+
+#define DEBUG_TYPE "amdgpu-sw-lower-lds"
+
+using namespace llvm;
+using namespace AMDGPU;
+
+namespace {
+
+using DomTreeCallback = function_ref<DominatorTree *(Function &F)>;
+
+struct LDSAccessTypeInfo {
+ SetVector<GlobalVariable *> StaticLDSGlobals;
+ SetVector<GlobalVariable *> DynamicLDSGlobals;
+};
+
+// Struct to hold all the Metadata required for a kernel
+// to replace a LDS global uses with corresponding offset
+// in to device global memory.
+struct KernelLDSParameters {
+ GlobalVariable *MallocLDSGlobal{nullptr};
+ GlobalVariable *MallocMetadataGlobal{nullptr};
+ LDSAccessTypeInfo DirectAccess;
+ LDSAccessTypeInfo IndirectAccess;
+ DenseMap<GlobalVariable *, SmallVector<uint32_t, 3>>
+ LDSToReplacementIndicesMap;
+ int32_t KernelId{-1};
+ uint32_t MallocSize{0};
+};
+
+// Struct to store info for creation of offset table
+// for all the non-kernel LDS accesses.
+struct NonKernelLDSParameters {
+ GlobalVariable *LDSBaseTable{nullptr};
+ GlobalVariable *LDSOffsetTable{nullptr};
+ SetVector<Function *> OrderedKernels;
+ SetVector<GlobalVariable *> OrdereLDSGlobals;
+};
+
+class AMDGPUSwLowerLDS {
+public:
+ AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
+ : M(mod), IRB(M.getContext()), DTCallback(Callback) {}
+ bool Run();
+ void GetUsesOfLDSByNonKernels(CallGraph const &CG,
+ FunctionVariableMap &functions);
+ SetVector<Function *>
+ GetOrderedIndirectLDSAccessingKernels(SetVector<Function *> &&Kernels);
+ SetVector<GlobalVariable *>
+ GetOrderedNonKernelAllLDSGlobals(SetVector<GlobalVariable *> &&Variables);
+ void PopulateMallocLDSGlobal(Function *Func);
+ void PopulateMallocMetadataGlobal(Function *Func);
+ void PopulateLDSToReplacementIndicesMap(Function *Func);
+ void ReplaceKernelLDSAccesses(Function *Func);
+ void LowerKernelLDSAccesses(Function *Func, DomTreeUpdater &DTU);
+ void BuildNonKernelLDSOffsetTable(
+ std::shared_ptr<NonKernelLDSParameters> &NKLDSParams);
+ void BuildNonKernelLDSBaseTable(
+ std::shared_ptr<NonKernelLDSParameters> &NKLDSParams);
+ Constant *
+ GetAddressesOfVariablesInKernel(Function *Func,
+ SetVector<GlobalVariable *> &Variables);
+ void LowerNonKernelLDSAccesses(
+ Function *Func, SetVector<GlobalVariable *> &LDSGlobals,
+ std::shared_ptr<NonKernelLDSParameters> &NKLDSParams);
+
+private:
+ Module &M;
+ IRBuilder<> IRB;
+ DomTreeCallback DTCallback;
+ DenseMap<Function *, std::shared_ptr<KernelLDSParameters>>
+ KernelToLDSParametersMap;
+};
+
+template <typename T> SetVector<T> SortByName(std::vector<T> &&V) {
+ // Sort the vector of globals or Functions based on their name.
+ // Returns a SetVector of globals/Functions.
+ llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
+ return L->getName() < R->getName();
+ });
+ return {std::move(SetVector<T>(V.begin(), V.end()))};
+}
+
+SetVector<GlobalVariable *> AMDGPUSwLowerLDS::GetOrderedNonKernelAllLDSGlobals(
+ SetVector<GlobalVariable *> &&Variables) {
+ // Sort all the non-kernel LDS accesses based on their name.
+ SetVector<GlobalVariable *> Ordered = SortByName(
+ std::vector<GlobalVariable *>(Variables.begin(), Variables.end()));
+ return std::move(Ordered);
+}
+
+SetVector<Function *> AMDGPUSwLowerLDS::GetOrderedIndirectLDSAccessingKernels(
+ SetVector<Function *> &&Kernels) {
+ // Sort the kernels accessing LDS through non-kernels based on their name.
+ // Also assign a kernel ID metadata based on the sorted order.
+ LLVMContext &Ctx = M.getContext();
+ if (Kernels.size() > UINT32_MAX) {
+ // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
+ report_fatal_error("Unimplemented SW LDS lowering for > 2**32 kernels");
+ }
+ SetVector<Function *> OrderedKernels =
+ SortByName(std::vector<Function *>(Kernels.begin(), Kernels.end()));
+ for (size_t i = 0; i < Kernels.size(); i++) {
+ Metadata *AttrMDArgs[1] = {
+ ConstantAsMetadata::get(IRB.getInt32(i)),
+ };
+ Function *Func = OrderedKernels[i];
+ Func->setMetadata("llvm.amdgcn.lds.kernel.id",
+ MDNode::get(Ctx, AttrMDArgs));
+ auto &LDSParams = KernelToLDSParametersMap[Func];
+ assert(LDSParams);
+ LDSParams->KernelId = i;
+ }
+ return std::move(OrderedKernels);
+}
+
+void AMDGPUSwLowerLDS::GetUsesOfLDSByNonKernels(
+ CallGraph const &CG, FunctionVariableMap &functions) {
+ // Get uses from the current function, excluding uses by called functions
+ // Two output variables to avoid walking the globals list twice
+ for (auto &GV : M.globals()) {
+ if (!AMDGPU::isLDSVariableToLower(GV)) {
+ continue;
+ }
+
+ if (GV.isAbsoluteSymbolRef()) {
+ report_fatal_error(
+ "LDS variables with absolute addresses are unimplemented.");
+ }
+
+ for (User *V : GV.users()) {
+ User *FUU = V;
+ bool isCast = isa<BitCastOperator, AddrSpaceCastOperator>(FUU);
+ if (isCast && FUU->hasOneUse() && !FUU->user_begin()->user_empty())
+ FUU = *FUU->user_begin();
+ if (auto *I = dyn_cast<Instruction>(FUU)) {
+ Function *F = I->getFunction();
+ if (!isKernelLDS(F)) {
+ functions[F].insert(&GV);
+ }
+ }
+ }
+ }
+}
+
+void AMDGPUSwLowerLDS::PopulateMallocLDSGlobal(Function *Func) {
+ // Create new LDS global required for each kernel to store
+ // device global memory pointer.
+ auto &LDSParams = KernelToLDSParametersMap[Func];
+ assert(LDSParams);
+ // create new global pointer variable
+ LDSParams->MallocLDSGlobal = new GlobalVariable(
+ M, IRB.getPtrTy(), false, GlobalValue::InternalLinkage,
+ PoisonValue::get(IRB.getPtrTy()),
+ Twine("llvm.amdgcn.sw.lds." + F...
[truncated]
// Sort the vector of globals or Functions based on their name.
// Returns a SetVector of globals/Functions.
Name should be a tie-breaker only. Sort by alignment/size?
The amdgpu-lower-module-lds pass also sorts globals by name. The sorting is required to maintain a consistent order of globals in the offset table and while replacing the LDS globals with offsets into the new LDS global.
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.workitem.id.y()
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.amdgcn.workitem.id.z()
; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP0]], [[TMP1]] |
Should try to strip the corresponding amdgpu-no-* attributes for introduced intrinsic calls
Added utility method from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils and removed amdgpu-no-workitem-id-* attributes from kernels which access LDS.
✅ With the latest revision this PR passed the C/C++ code formatter.
Title is misleading. I think the implementation of the pass, and adding it to the pass pipeline should be done in separate changes
mostly coding style nits. The coding style here differs a bit from what we usually see so I pointed out the things that stood out to me as someone that's not in the loop with this change.
class AMDGPUSwLowerLDS {
public:
AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback) |
-AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
+AMDGPUSwLowerLDS(Module &Mod, DomTreeCallback Callback)
CamelCase
AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
    : M(mod), IRB(M.getContext()), DTCallback(Callback) {}
bool run();
void getUsesOfLDSByNonKernels(CallGraph const &CG, |
-void getUsesOfLDSByNonKernels(CallGraph const &CG,
+void getUsesOfLDSByNonKernels(const CallGraph &CG,
To be consistent with the codebase.
void getUsesOfLDSByNonKernels(CallGraph const &CG,
                              FunctionVariableMap &functions);
SetVector<Function *>
getOrderedIndirectLDSAccessingKernels(SetVector<Function *> &&Kernels); |
Please document those functions, even if it's just a short comment.
It helps maintainability.
template <typename T> SetVector<T> sortByName(std::vector<T> &&V) {
  // Sort the vector of globals or Functions based on their name.
  // Returns a SetVector of globals/Functions.
llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) { |
llvm:: is not needed, I think. I also think you can just do llvm::sort(V, ..)?
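A minimal sketch of the suggested form, assuming the range overload of sort from llvm/ADT/STLExtras.h:

  // Range-based overload; no explicit begin()/end() needed.
  llvm::sort(V, [](const auto *L, const auto *R) { return L->getName() < R->getName(); });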
set_union(transitive_map_function[&Func], direct_map_function[F]);

for (const CallGraphNode::CallRecord &R : *CG[F]) {
Function *ith = R.second->getFunction(); |
CamelCase
// direct_map_kernel lists which variables are used by the kernel
// find the variables which are used through a function call
FunctionVariableMap indirect_map_kernel; |
CamelCase
continue;

for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
Function *ith = R.second->getFunction(); |
CamelCase
StringRef FnAttr) {
KernelRoot->removeFnAttr(FnAttr);

SmallVector<Function *> WorkList({CG[KernelRoot]->getFunction()}); |
Use = to assign.
Updated.
#include <algorithm>

#define DEBUG_TYPE "amdgpu-sw-lower-lds" |
nit: is it possible to add some LLVM_DEBUG output to this pass? It greatly helps debug eventual issues.
Added a few debug outputs while replacing the LDS accesses. Thanks for the suggestion.
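A minimal sketch of the kind of debug output in question, assuming DEBUG_TYPE is the one defined above and Func is the kernel being processed (the message text is illustrative):

#include "llvm/Support/Debug.h"  // provides LLVM_DEBUG and dbgs()

LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": replacing LDS accesses in kernel "
                  << Func->getName() << "\n");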
Needs a rebase, hard to see over the moved code patch
//{StartOffset, AlignedSizeInBytes}
SmallString<128> MDItemStr;
raw_svector_ostream MDItemOS(MDItemStr);
MDItemOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.item"; |
MDItemOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.item"; | |
MDItemOS << "llvm.amdgcn.sw.lds." << Func->getName() << ".md.item"; |
auto MallocSizeCalcLambda =
[&](SetVector<GlobalVariable *> &DynamicLDSGlobals) { |
Make this a regular helper function?
Value *ImplicitArg =
    IRB.CreateIntrinsic(Intrinsic::amdgcn_implicitarg_ptr, {}, {});
Value *HiddenDynLDSSize = IRB.CreateInBoundsGEP(
ImplicitArg->getType(), ImplicitArg, {IRB.getInt32(15)}); |
Don't understand where the hardcoded 15 came from. There are various ConstInBoundsGEPs for this case too
These should also use 64-bit indexes, this is canonically a 64-bit address space. Can we use an enum or something more structured to access the ABI location? I'm assuming this is assuming COV5?
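For context, a hedged IR-level sketch of what this code emits. The GEP index 15 over a ptr-typed element (8-byte stride) corresponds to a 120-byte offset into the implicit kernarg block; as the comments above note, the exact location of hidden_dynamic_lds_size is ABI/code-object-version dependent, so treat the constant as a placeholder:

  %implicitarg = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
  ; placeholder byte offset of hidden_dynamic_lds_size (ABI dependent)
  %dyn.size.ptr = getelementptr inbounds i8, ptr addrspace(4) %implicitarg, i64 120
  %dyn.lds.size = load i32, ptr addrspace(4) %dyn.size.ptr, align 4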
auto *GEPForEndStaticLDSSize = IRB.CreateInBoundsGEP(
    MetadataStructType, SwLDSMetadata,
{IRB.getInt32(0), IRB.getInt32(NumStaticLDS - 1), IRB.getInt32(2)}); |
Use the Const* variants to hide all the getInt32s away
@@ -0,0 +1,58 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 4
; RUN: opt < %s -passes=amdgpu-sw-lower-lds -S -mtriple=amdgcn-- | FileCheck %s |
Should specifically use amdhsa triples for these tests
/// Strip "amdgpu-no-lds-kernel-id" from any functions where we may have | ||
/// introduced its use. If AMDGPUAttributor ran prior to the pass, we inferred | ||
/// the lack of llvm.amdgcn.lds.kernel.id calls. | ||
void removeNoLdsKernelIdFromReachable(CallGraph &CG, Function *KernelRoot) { |
Is this rebased on main? This deletion should have already been merged when the code was moved to AMDGPUMemoryUtils?
Rebased and updated in latest commits.
Raised PR #92686 to remove this change.
SmallString<128> MDTypeStr;
raw_svector_ostream MDTypeOS(MDTypeStr);
MDTypeOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.type"; |
MDTypeOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.type"; | |
MDTypeOS << "llvm.amdgcn.sw.lds." << Func->getName() << ".md.type"; |
another one
StructType::create(Ctx, Items, MDTypeOS.str());
SmallString<128> MDStr;
raw_svector_ostream MDOS(MDStr);
MDOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md"; |
MDOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md"; | |
MDOS << "llvm.amdgcn.sw.lds." << Func->getName() << ".md"; |
Value *BasePlusOffset =
    IRB.CreateInBoundsGEP(IRB.getInt8Ty(), SwLDS, {Load});
LLVM_DEBUG(dbgs() << "Sw LDS Lowering, Replacing LDS "
<< GV->getName().str()); |
-<< GV->getName().str());
+<< GV->getName());
ReplaceKernelLDSAccesses(Func);

auto *CondFreeBlock = BasicBlock::Create(Ctx, "CondFree", Func); |
Presumably the runtime has to manage cleanup of anything that happened in the kernel?
// Replace LDS access in non-kernel with replacement queried from
// Base table and offset from offset table.
LLVM_DEBUG(dbgs() << "Sw LDS lowering, lower non-kernel access for : "
<< Func->getName().str()); |
-<< Func->getName().str());
+<< Func->getName());
You should almost never need to convert to std::string
Value *BasePlusOffset =
    IRB.CreateInBoundsGEP(IRB.getInt8Ty(), BasePtr, {OffsetLoad});
LLVM_DEBUG(dbgs() << "Sw LDS Lowering, Replace non-kernel LDS for "
<< GV->getName().str()); |
-<< GV->getName().str());
+<< GV->getName());
@@ -0,0 +1,100 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 |
These should use --check-globals since that's most of the point of the pass
The --check-globals command-line option does update the tests with globals checks, but some of the resulting tests fail with a missing ']' (closing bracket) error, like the example below. So I have updated the tests with globals checks that don't hit this error.
@llvm.amdgcn.sw.lds.offset.table = internal addrspace(4) constant [2 x [4 x i32]] [[4 x i32] [i32 ptrtoint (ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md to i32), i32 poison, ..
removeFnAttrFromReachable(CG, Func, "amdgpu-no-workitem-id-x"); | ||
removeFnAttrFromReachable(CG, Func, "amdgpu-no-workitem-id-y"); | ||
removeFnAttrFromReachable(CG, Func, "amdgpu-no-workitem-id-z"); |
These could all be removed in one CallGraph walk instead of 3 separate ones
Currently removeFnAttrFromReachable accepts a single StringRef argument. It needs to be changed to accept an array of StringRefs.
Raised #94188.
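A possible shape for that follow-up, sketched as a hypothetical overload of the existing helper; the signature and body below are illustrative, not necessarily what #94188 landed:

void removeFnAttrFromReachable(CallGraph &CG, Function *KernelRoot,
                               ArrayRef<StringRef> FnAttrs) {
  // Strip every listed attribute in one call-graph walk instead of one walk
  // per attribute.
  SmallVector<Function *> WorkList({CG[KernelRoot]->getFunction()});
  SmallPtrSet<Function *, 8> Visited;
  while (!WorkList.empty()) {
    Function *F = WorkList.pop_back_val();
    if (!F || !Visited.insert(F).second)
      continue;
    for (StringRef Attr : FnAttrs)
      F->removeFnAttr(Attr);
    for (const CallGraphNode::CallRecord &R : *CG[F])
      WorkList.push_back(R.second->getFunction());
  }
}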
};
bool IsChanged = false;
AMDGPUSwLowerLDS SwLowerLDSImpl(M, DTCallback);
IsChanged |= SwLowerLDSImpl.run(); |
Can just define isChanged here
I'm not sure I trust the pass ordering with this strategy, and I think the pass name should not claim it's software lowering when it's not really that
for (auto &GV : LDSGlobals) {
  if (is_contained(UniqueLDSGlobals, GV))
    continue;
else |
Don't need else after continue
Updated
Each workgroup needs a different pointer, and a grid can have a lot of work groups. I don't think a global would work.
So why isn't the runtime responsible for setting up this pointer then? It does that effectively for LDS, the allocation is managed as part of the dispatch. That also goes back to my question of why we need to insert explicit free code, instead of just letting the runtime clean it up after as it would need to anyway.
Which runtime are you talking about? The firmware or trap handler? And where are they going to place the pointer? Or are you even considering a new architected or reserved register to hold it and a new ABI?
Presumably the implicit kernel arguments, and whatever is setting that up. It's essentially a partner to the queue pointer, which also is in the implicit kernargs.
OK. Suppose the launch has a million work groups. How much memory should the runtime allocate, and how will workgroup J decode what part of that memory to use? It can certainly be done but I'm wondering if we really need to do it now? And how much do we really need an independently working SW LDS?
The runtime is already bounded on how many groups it can dispatch at once; the allocation is tied to the dispatch size.
I think having the trap door of pure software LDS would enable some useful experiments, such as not depending on any whole program visibility to lower function defined local variables. It also reduces the number of parts that need to directly interact in the compiler pipeline. With the current approach I foresee having to fix the same bugs twice in the module LDS lowering, and the asan version of module LDS lowering.
The runtime doesn't split the dispatch into machine-sized chunks. If it does have a limit, then it is probably much larger than we want to allocate for.
I don't disagree. But reading global memory for the pointer will be slower. The runtime launching one dispatch at a time to manage the memory will be slower, and we still need a kernel prolog and epilog for each workgroup to allocate and deallocate it's chunk of the global allocation, and I still don't know where we are going to store the per-workgroup workgroup-allocation-chunk-index or pointer.
I thought it already had to do this if stack was enabled to avoid going over a device wide limit.
Yes, there is a special mode when scratch space is low but something like that would not be desirable to impose on every dispatch.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/2930
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/1725
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/2563
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/2236
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/1462
Fixes a linking error in LLVM CI: "AMDGPUSwLowerLDS::run()': AMDGPUSwLowerLDS.cpp:(.text._ZN12_GLOBAL__N_116AMDGPUSwLowerLDS3runEv+0x164): undefined reference to `llvm::getAddressSanitizerParams(llvm::Triple const&, int, bool, unsigned long*, int*, bool*)'". The amdgpu-sw-lower-lds pass from #87265 uses getAddressSanitizerParams from the AddressSanitizer pass, but LLVMInstrumentation was not linked into AMDGPUCodeGen. This PR adds it.
Issue should be fixed by #106039.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/786
This change adds the utilities required to asan-instrument memory instructions. In the "amdgpu-sw-lower-lds" pass (llvm#87265), during lowering from LDS to global memory, new global-memory instructions are created which need to be asan-instrumented. Change-Id: I17f0371cdc15ea7af6c4e2a325af6ad96a5bfb7b
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/145
This PR introduces a new pass, "amdgpu-sw-lower-lds".
This pass lowers local data store (LDS) uses in kernel and non-kernel functions in a module to use dynamically allocated global memory. A packed LDS layout is emulated in the global memory.
The memory instructions lowered from LDS to global memory are then instrumented for the address sanitizer, to catch addressing errors.
This pass only works when the address sanitizer has been enabled and has instrumented the IR. It identifies instrumented IR by the "nosanitize_address" module flag.
For a kernel, LDS accesses can be static or dynamic, and either direct (accessed within the kernel) or indirect (accessed through non-kernels).
Replacement of Kernel LDS accesses:
All the LDS accesses corresponding to a kernel will be packed together: all static LDS accesses are allocated first and dynamic LDS follows. The total size with alignment is calculated. A new LDS global called "SW LDS" will be created for the kernel, and it will have the attribute "amdgpu-lds-size" attached with the value of the calculated size. All the LDS accesses in the module will be replaced by a GEP with an offset into the "SW LDS".
A new "llvm.amdgcn.<kernel>.dynlds" global is created per kernel accessing dynamic LDS. It will be marked as used by the kernel and will have MD_absolute_symbol metadata set to the total static LDS size, since dynamic LDS allocation starts after all static LDS allocations.
A device global memory allocation equal to the total LDS size will be made. At the prologue of the kernel, a single work-item from the work-group does a "malloc" and stores the pointer of the allocation in the "SW LDS". To store the offsets corresponding to all LDS accesses, another global variable is created, called "SW LDS metadata" in this pass.
SW LDS:
It is an LDS global of ptr type with the name "llvm.amdgcn.sw.lds.<kernel-name>".
SW LDS Metadata:
It is of struct type, with n members, where n equals the number of LDS globals accessed by the kernel (direct and indirect). Each member of the struct is another struct of type {i32, i32, i32}. The first member corresponds to the offset, the second to the size of the LDS global being replaced, and the third to the total aligned size. It will have the name "llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an initializer with the static LDS related offsets and sizes filled in. For dynamic LDS related entries, offsets will be initialized to the end offset of the previous static LDS allocation and sizes will be zero initially. These dynamic LDS offset and size values will be updated within the kernel, since the kernel can read the dynamic LDS size allocated at runtime with a query to the "hidden_dynamic_lds_size" hidden kernel argument.
At the epilogue of the kernel, the allocated memory is freed by the same single work-item.
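A hedged IR-level sketch of the kernel-side artifacts described above, for a hypothetical kernel k0 with one 16-byte static LDS variable plus dynamic LDS (names follow the scheme above; address spaces, offsets and sizes are illustrative, not taken verbatim from the patch):

%llvm.amdgcn.sw.lds.k0.md.item = type { i32, i32, i32 }  ; {offset, size, aligned size}
%llvm.amdgcn.sw.lds.k0.md.type = type { %llvm.amdgcn.sw.lds.k0.md.item, %llvm.amdgcn.sw.lds.k0.md.item }

; "SW LDS": holds the device-global pointer malloc'ed by the single work-item.
@llvm.amdgcn.sw.lds.k0 = internal addrspace(3) global ptr poison, align 8

; "SW LDS metadata": one entry per LDS global; the dynamic entry's size starts at 0
; and is filled in at run time from hidden_dynamic_lds_size.
@llvm.amdgcn.sw.lds.k0.md = internal addrspace(1) global %llvm.amdgcn.sw.lds.k0.md.type { %llvm.amdgcn.sw.lds.k0.md.item { i32 0, i32 16, i32 16 }, %llvm.amdgcn.sw.lds.k0.md.item { i32 16, i32 0, i32 0 } }, align 4

; inside k0, a use of the static LDS variable becomes a GEP into the SW LDS:
  %off = load i32, ptr addrspace(1) getelementptr inbounds (%llvm.amdgcn.sw.lds.k0.md.type, ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 0, i32 0), align 4
  %rep = getelementptr inbounds i8, ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, i32 %off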
Replacement of non-kernel LDS accesses:
Multiple kernels can access the same non-kernel function. All the kernels accessing LDS through non-kernels are sorted and assigned a kernel-id. All the LDS globals accessed by non-kernels are sorted.
This information is used to build two tables:
Base table:
The base table has a single row, with the elements of the row placed as per kernel ID. Each element in the row corresponds to the ptr of the "SW LDS" variable created for that kernel.
Offset table:
The offset table has multiple rows and columns. Rows are indexed from 0 to (n-1), where n is the total number of kernels accessing LDS through non-kernels. Each row has m elements, where m is the total number of unique LDS globals accessed by all non-kernels. Each element in the row corresponds to the ptr of the replacement of the LDS global done by that particular kernel.
An LDS variable in a non-kernel function is replaced based on the information from the base and offset tables. Based on a kernel-id query, the ptr of the "SW LDS" for the corresponding kernel is obtained from the base table. The offset into that "SW LDS" is obtained from the corresponding element in the offset table. With this information, the replacement value is obtained.
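Continuing the sketch above with a second hypothetical kernel k1 and a single LDS global reached from a non-kernel function, the two tables and the rewritten access might look roughly like this (the offset-table entries follow the ptrtoint-of-metadata-address pattern visible in the test excerpt quoted earlier; table element types, address spaces and the rewrite sequence are illustrative):

@llvm.amdgcn.sw.lds.base.table = internal addrspace(4) constant [2 x ptr addrspace(3)] [ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, ptr addrspace(3) @llvm.amdgcn.sw.lds.k1]
@llvm.amdgcn.sw.lds.offset.table = internal addrspace(4) constant [2 x [1 x i32]] [[1 x i32] [i32 ptrtoint (ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md to i32)], [1 x i32] [i32 ptrtoint (ptr addrspace(1) @llvm.amdgcn.sw.lds.k1.md to i32)]]

; inside the non-kernel function, the LDS use is rewritten roughly as:
  %kid   = call i32 @llvm.amdgcn.lds.kernel.id()
  %bslot = getelementptr inbounds [2 x ptr addrspace(3)], ptr addrspace(4) @llvm.amdgcn.sw.lds.base.table, i32 0, i32 %kid
  %base  = load ptr addrspace(3), ptr addrspace(4) %bslot, align 8
  %oslot = getelementptr inbounds [2 x [1 x i32]], ptr addrspace(4) @llvm.amdgcn.sw.lds.offset.table, i32 0, i32 %kid, i32 0
  %mdint = load i32, ptr addrspace(4) %oslot, align 4
  %mdptr = inttoptr i32 %mdint to ptr addrspace(1)
  %off   = load i32, ptr addrspace(1) %mdptr, align 4
  %rep   = getelementptr inbounds i8, ptr addrspace(3) %base, i32 %off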