Skip to content

[AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. #87265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 26, 2024

Conversation

skc7
Copy link
Contributor

@skc7 skc7 commented Apr 1, 2024

This PR introduces new pass "amdgpu-sw-lower-lds".

This pass lowers the local data store, LDS, uses in kernel and non-kernel functions in module to use dynamically allocated global memory. Packed LDS Layout is emulated in the global memory.
The lowered memory instructions from LDS to global memory are then instrumented for address sanitizer, to catch addressing errors.
This pass only work when address sanitizer has been enabled and has instrumented the IR. It identifies that IR has been instrumented using "nosanitize_address" module flag.

For a kernel, LDS access can be static or dynamic which are direct (accessed within kernel) and indirect (accessed through non-kernels).

Replacement of Kernel LDS accesses:

  • All the LDS accesses corresponding to kernel will be packed together, where all static LDS accesses will be allocated first and then dynamic LDS follows. The total size with alignment is calculated. A new LDS global will be created for the kernel called "SW LDS" and it will have the attribute "amdgpu-lds-size" attached with value of the size calculated. All the LDS accesses in the module will be replaced by GEP with offset into the "Sw LDS".

  • A new "llvm.amdgcn..dynlds" is created per kernel accessing the dynamic LDS. This will be marked used by kernel and will have MD_absolue_symbol metadata set to total static LDS size, Since dynamic LDS allocation starts after all static LDS allocation.

  • A device global memory equal to the total LDS size will be allocated. At the prologue of the kernel, a single work-item from the work-group, does a "malloc" and stores the pointer of the allocation in "SW LDS". To store the offsets corresponding to all LDS accesses, another global variable is created which will be called "SW LDS metadata" in this pass.

  • SW LDS:
    It is LDS global of ptr type with name "llvm.amdgcn.sw.lds.".

  • SW LDS Metadata:
    It is of struct type, with n members. n equals the number of LDS globals accessed by the kernel(direct and indirect). Each member of struct is another struct of type {i32, i32, i32}. First member corresponds to offset, second member corresponds to size of LDS global being replaced and third represents the total aligned size. It will have name "llvm.amdgcn.sw.lds..md". This global will have an intializer with static LDS related offsets and sizes initialized. But for dynamic LDS related entries, offsets will be intialized to previous static LDS allocation end offset. Sizes for them will be zero initially. These dynamic LDS offset and size values will be updated with in the kernel, since kernel can read the dynamic LDS size allocation done at runtime with query to "hidden_dynamic_lds_size" hidden kernel argument.

  • At the epilogue of kernel, allocated memory would be made free by the same single work-item.

Replacement of non-kernel LDS accesses:

  • Multiple kernels can access the same non-kernel function. All the kernels accessing LDS through non-kernels are sorted and assigned a kernel-id. All the LDS globals accessed by non-kernels are sorted.

  • This information is used to build two tables:

  • Base table:
    Base table will have single row, with elements of the row placed as per kernel ID. Each element in the row corresponds to ptr of "SW LDS" variable created for that kernel.

  • Offset table:
    Offset table will have multiple rows and columns. Rows are assumed to be from 0 to (n-1). n is total number of kernels accessing the LDS through non-kernels. Each row will have m elements. m is the total number of unique LDS globals accessed by all non-kernels. Each element in the row correspond to the ptr of the replacement of LDS global done by that particular kernel.

  • A LDS variable in non-kernel will be replaced based on the information from base and offset tables. Based on kernel-id query, ptr of "SW LDS" for that corresponding kernel is obtained from base table. The Offset into the base "SW LDS" is obtained from corresponding element in offset table. With this information, replacement value is obtained.

@llvmbot
Copy link
Member

llvmbot commented Apr 1, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Chaitanya (skc7)

Changes

This PR introduces new pass "amdgpu-sw-lower-lds". It lowers the local data store, LDS, uses in kernel and non-kernel functions in module with dynamically allocated device global memory.

Replacement of Kernel LDS accesses:

  • For a kernel, LDS access can be static or dynamic which are direct (accessed within kernel) and indirect (accessed through non-kernels). A device global memory equal to size of all these LDS globals will be allocated. At the prologue of the kernel, a single work-item from the work-group, does a "malloc" and stores the pointer of the allocation in new LDS global that will be created for the kernel. This will be called "malloc LDS global" in this pass. Each LDS access corresponds to an offset in the allocated memory. All static LDS accesses will be allocated first and then dynamic LDS will occupy the device global memory. To store the offsets corresponding to all LDS accesses, another global variable is created which will be called "metadata global" in this pass.
  • Malloc LDS Global:
    It is LDS global of ptr type with name "llvm.amdgcn.sw.lds.{kernel-name}".
  • Metadata Global:
    It is of struct type, with n members. n equals the number of LDS globals accessed by the kernel(direct and indirect). Each member of struct is another struct of type {i32, i32}. First member corresponds to offset, second member corresponds to size of LDS global being replaced.
    It will have name "llvm.amdgcn.sw.lds.{kernel-name}.md".
    This global will have an intializer with static LDS related offsets and sizes initialized. But for dynamic LDS related entries, offsets will be intialized to previous static LDS allocation end offset. Sizes for them will be zero initially. These dynamic LDS offset and size values will be updated with in the kernel, since kernel can read the dynamic LDS size allocation done at runtime with query to "hidden_dynamic_lds_size" hidden kernel argument.
  • LDS accesses within the kernel will be replaced by "gep" ptr to corresponding offset into allocated device global memory for the kernel.
  • At the epilogue of kernel, allocated memory would be made free by the same single work-item.

Replacement of non-kernel LDS accesses:

  • Multiple kernels can access the same non-kernel function. All the kernels accessing LDS through non-kernels are sorted and assigned a kernel-id. All the LDS globals accessed by non-kernels are sorted. This information is used to build two globals which are used as tables for query of LDS replacement:

  • Base table:
    Base table will have single row, with elements of the row placed as per kernel ID.
    Each element in the row corresponds to addresss of "malloc LDS global" variable created for that kernel.

  • Offset table:
    Offset table will have multiple rows and columns.
    Rows are assumed to be from 0 to (n-1). n is total number of kernels accessing the LDS through non-kernels. Each row will have m elements. m is the total number of unique LDS globals accessed by all non-kernels. Each element in the row correspond to the address of the replacement of LDS global done by that particular kernel. A LDS variable in non-kernel will be replaced based on the information from base and offset tables. Based on kernel-id query, address of "malloc LDS global" for that corresponding kernel is obtained from base table. The Offset into the base "malloc LDS global" is obtained from corresponding element in offset table. With this information, replacement value is obtained.

  • Other changes:
    "amdgpu-sw-lower-lds" pass is needed for address sanitizer instrumentation of LDS. So, this pass is enabled only if asan is enabled.
    There are certain utility functions which can be reused from lower-module-lds pass. These functions are moved to AMDGPUMemoryUtils module for re-use in this new pass. lower-module-lds pass will be disabled if asan feature is enabled.


Patch is 122.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87265.diff

18 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+9)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp (+1-185)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp (+865)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+6)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUMemoryUtils.cpp (+176)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUMemoryUtils.h (+24)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-indirect-access.ll (+99)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-lds-test.ll (+57)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multi-static-dynamic-indirect-access.ll (+192)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multiple-blocks-return.ll (+79)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-indirect-access.ll (+101)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-lds-test.ll (+88)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-function-param.ll (+61)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-nested.ll (+212)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access.ll (+84)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-lds-test.ll (+58)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 6016bd5187d887..15ff74f7c53af3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -263,6 +263,15 @@ struct AMDGPUAlwaysInlinePass : PassInfoMixin<AMDGPUAlwaysInlinePass> {
   bool GlobalOpt;
 };
 
+void initializeAMDGPUSwLowerLDSLegacyPass(PassRegistry &);
+extern char &AMDGPUSwLowerLDSLegacyPassID;
+ModulePass *createAMDGPUSwLowerLDSLegacyPass();
+
+struct AMDGPUSwLowerLDSPass : PassInfoMixin<AMDGPUSwLowerLDSPass> {
+  AMDGPUSwLowerLDSPass() {}
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
 class AMDGPUCodeGenPreparePass
     : public PassInfoMixin<AMDGPUCodeGenPreparePass> {
 private:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
index 595f09664c55e4..f0456d3f62a816 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
@@ -212,6 +212,7 @@
 #define DEBUG_TYPE "amdgpu-lower-module-lds"
 
 using namespace llvm;
+using namespace AMDGPU;
 
 namespace {
 
@@ -234,17 +235,6 @@ cl::opt<LoweringKind> LoweringKindLoc(
         clEnumValN(LoweringKind::hybrid, "hybrid",
                    "Lower via mixture of above strategies")));
 
-bool isKernelLDS(const Function *F) {
-  // Some weirdness here. AMDGPU::isKernelCC does not call into
-  // AMDGPU::isKernel with the calling conv, it instead calls into
-  // isModuleEntryFunction which returns true for more calling conventions
-  // than AMDGPU::isKernel does. There's a FIXME on AMDGPU::isKernel.
-  // There's also a test that checks that the LDS lowering does not hit on
-  // a graphics shader, denoted amdgpu_ps, so stay with the limited case.
-  // Putting LDS in the name of the function to draw attention to this.
-  return AMDGPU::isKernel(F->getCallingConv());
-}
-
 template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
   llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
     return L->getName() < R->getName();
@@ -305,183 +295,9 @@ class AMDGPULowerModuleLDS {
         Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
   }
 
-  static bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M) {
-    // Constants are uniqued within LLVM. A ConstantExpr referring to a LDS
-    // global may have uses from multiple different functions as a result.
-    // This pass specialises LDS variables with respect to the kernel that
-    // allocates them.
-
-    // This is semantically equivalent to (the unimplemented as slow):
-    // for (auto &F : M.functions())
-    //   for (auto &BB : F)
-    //     for (auto &I : BB)
-    //       for (Use &Op : I.operands())
-    //         if (constantExprUsesLDS(Op))
-    //           replaceConstantExprInFunction(I, Op);
-
-    SmallVector<Constant *> LDSGlobals;
-    for (auto &GV : M.globals())
-      if (AMDGPU::isLDSVariableToLower(GV))
-        LDSGlobals.push_back(&GV);
-
-    return convertUsersOfConstantsToInstructions(LDSGlobals);
-  }
-
 public:
   AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
 
-  using FunctionVariableMap = DenseMap<Function *, DenseSet<GlobalVariable *>>;
-
-  using VariableFunctionMap = DenseMap<GlobalVariable *, DenseSet<Function *>>;
-
-  static void getUsesOfLDSByFunction(CallGraph const &CG, Module &M,
-                                     FunctionVariableMap &kernels,
-                                     FunctionVariableMap &functions) {
-
-    // Get uses from the current function, excluding uses by called functions
-    // Two output variables to avoid walking the globals list twice
-    for (auto &GV : M.globals()) {
-      if (!AMDGPU::isLDSVariableToLower(GV)) {
-        continue;
-      }
-
-      for (User *V : GV.users()) {
-        if (auto *I = dyn_cast<Instruction>(V)) {
-          Function *F = I->getFunction();
-          if (isKernelLDS(F)) {
-            kernels[F].insert(&GV);
-          } else {
-            functions[F].insert(&GV);
-          }
-        }
-      }
-    }
-  }
-
-  struct LDSUsesInfoTy {
-    FunctionVariableMap direct_access;
-    FunctionVariableMap indirect_access;
-  };
-
-  static LDSUsesInfoTy getTransitiveUsesOfLDS(CallGraph const &CG, Module &M) {
-
-    FunctionVariableMap direct_map_kernel;
-    FunctionVariableMap direct_map_function;
-    getUsesOfLDSByFunction(CG, M, direct_map_kernel, direct_map_function);
-
-    // Collect variables that are used by functions whose address has escaped
-    DenseSet<GlobalVariable *> VariablesReachableThroughFunctionPointer;
-    for (Function &F : M.functions()) {
-      if (!isKernelLDS(&F))
-        if (F.hasAddressTaken(nullptr,
-                              /* IgnoreCallbackUses */ false,
-                              /* IgnoreAssumeLikeCalls */ false,
-                              /* IgnoreLLVMUsed */ true,
-                              /* IgnoreArcAttachedCall */ false)) {
-          set_union(VariablesReachableThroughFunctionPointer,
-                    direct_map_function[&F]);
-        }
-    }
-
-    auto functionMakesUnknownCall = [&](const Function *F) -> bool {
-      assert(!F->isDeclaration());
-      for (const CallGraphNode::CallRecord &R : *CG[F]) {
-        if (!R.second->getFunction()) {
-          return true;
-        }
-      }
-      return false;
-    };
-
-    // Work out which variables are reachable through function calls
-    FunctionVariableMap transitive_map_function = direct_map_function;
-
-    // If the function makes any unknown call, assume the worst case that it can
-    // access all variables accessed by functions whose address escaped
-    for (Function &F : M.functions()) {
-      if (!F.isDeclaration() && functionMakesUnknownCall(&F)) {
-        if (!isKernelLDS(&F)) {
-          set_union(transitive_map_function[&F],
-                    VariablesReachableThroughFunctionPointer);
-        }
-      }
-    }
-
-    // Direct implementation of collecting all variables reachable from each
-    // function
-    for (Function &Func : M.functions()) {
-      if (Func.isDeclaration() || isKernelLDS(&Func))
-        continue;
-
-      DenseSet<Function *> seen; // catches cycles
-      SmallVector<Function *, 4> wip{&Func};
-
-      while (!wip.empty()) {
-        Function *F = wip.pop_back_val();
-
-        // Can accelerate this by referring to transitive map for functions that
-        // have already been computed, with more care than this
-        set_union(transitive_map_function[&Func], direct_map_function[F]);
-
-        for (const CallGraphNode::CallRecord &R : *CG[F]) {
-          Function *ith = R.second->getFunction();
-          if (ith) {
-            if (!seen.contains(ith)) {
-              seen.insert(ith);
-              wip.push_back(ith);
-            }
-          }
-        }
-      }
-    }
-
-    // direct_map_kernel lists which variables are used by the kernel
-    // find the variables which are used through a function call
-    FunctionVariableMap indirect_map_kernel;
-
-    for (Function &Func : M.functions()) {
-      if (Func.isDeclaration() || !isKernelLDS(&Func))
-        continue;
-
-      for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
-        Function *ith = R.second->getFunction();
-        if (ith) {
-          set_union(indirect_map_kernel[&Func], transitive_map_function[ith]);
-        } else {
-          set_union(indirect_map_kernel[&Func],
-                    VariablesReachableThroughFunctionPointer);
-        }
-      }
-    }
-
-    // Verify that we fall into one of 2 cases:
-    //    - All variables are absolute: this is a re-run of the pass
-    //      so we don't have anything to do.
-    //    - No variables are absolute.
-    std::optional<bool> HasAbsoluteGVs;
-    for (auto &Map : {direct_map_kernel, indirect_map_kernel}) {
-      for (auto &[Fn, GVs] : Map) {
-        for (auto *GV : GVs) {
-          bool IsAbsolute = GV->isAbsoluteSymbolRef();
-          if (HasAbsoluteGVs.has_value()) {
-            if (*HasAbsoluteGVs != IsAbsolute) {
-              report_fatal_error(
-                  "Module cannot mix absolute and non-absolute LDS GVs");
-            }
-          } else
-            HasAbsoluteGVs = IsAbsolute;
-        }
-      }
-    }
-
-    // If we only had absolute GVs, we have nothing to do, return an empty
-    // result.
-    if (HasAbsoluteGVs && *HasAbsoluteGVs)
-      return {FunctionVariableMap(), FunctionVariableMap()};
-
-    return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};
-  }
-
   struct LDSVariableReplacement {
     GlobalVariable *SGV = nullptr;
     DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 90f36fadf35903..eda4949d0296d5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -22,6 +22,7 @@ MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
             AMDGPULowerBufferFatPointersPass(*this))
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
 MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this))
+MODULE_PASS("amdgpu-sw-lower-lds", AMDGPUSwLowerLDSPass())
 MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
 #undef MODULE_PASS
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
new file mode 100644
index 00000000000000..ed3670fa1386d6
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
@@ -0,0 +1,865 @@
+//===-- AMDGPUSwLowerLDS.cpp -----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass lowers the local data store, LDS, uses in kernel and non-kernel
+// functions in module with dynamically allocated device global memory.
+//
+// Replacement of Kernel LDS accesses:
+//    For a kernel, LDS access can be static or dynamic which are direct
+//    (accessed within kernel) and indirect (accessed through non-kernels).
+//    A device global memory equal to size of all these LDS globals will be
+//    allocated. At the prologue of the kernel, a single work-item from the
+//    work-group, does a "malloc" and stores the pointer of the allocation in
+//    new LDS global that will be created for the kernel. This will be called
+//    "malloc LDS global" in this pass.
+//    Each LDS access corresponds to an offset in the allocated memory.
+//    All static LDS accesses will be allocated first and then dynamic LDS
+//    will occupy the device global memoery.
+//    To store the offsets corresponding to all LDS accesses, another global
+//    variable is created which will be called "metadata global" in this pass.
+//    - Malloc LDS Global:
+//        It is LDS global of ptr type with name
+//        "llvm.amdgcn.sw.lds.<kernel-name>".
+//    - Metadata Global:
+//        It is of struct type, with n members. n equals the number of LDS
+//        globals accessed by the kernel(direct and indirect). Each member of
+//        struct is another struct of type {i32, i32}. First member corresponds
+//        to offset, second member corresponds to size of LDS global being
+//        replaced. It will have name "llvm.amdgcn.sw.lds.<kernel-name>.md".
+//        This global will have an intializer with static LDS related offsets
+//        and sizes initialized. But for dynamic LDS related entries, offsets
+//        will be intialized to previous static LDS allocation end offset. Sizes
+//        for them will be zero initially. These dynamic LDS offset and size
+//        values will be updated with in the kernel, since kernel can read the
+//        dynamic LDS size allocation done at runtime with query to
+//        "hidden_dynamic_lds_size" hidden kernel argument.
+//
+//    LDS accesses within the kernel will be replaced by "gep" ptr to
+//    corresponding offset into allocated device global memory for the kernel.
+//    At the epilogue of kernel, allocated memory would be made free by the same
+//    single work-item.
+//
+// Replacement of non-kernel LDS accesses:
+//    Multiple kernels can access the same non-kernel function.
+//    All the kernels accessing LDS through non-kernels are sorted and
+//    assigned a kernel-id. All the LDS globals accessed by non-kernels
+//    are sorted. This information is used to build two tables:
+//    - Base table:
+//        Base table will have single row, with elements of the row
+//        placed as per kernel ID. Each element in the row corresponds
+//        to addresss of "malloc LDS global" variable created for
+//        that kernel.
+//    - Offset table:
+//        Offset table will have multiple rows and columns.
+//        Rows are assumed to be from 0 to (n-1). n is total number
+//        of kernels accessing the LDS through non-kernels.
+//        Each row will have m elements. m is the total number of
+//        unique LDS globals accessed by all non-kernels.
+//        Each element in the row correspond to the address of
+//        the replacement of LDS global done by that particular kernel.
+//    A LDS variable in non-kernel will be replaced based on the information
+//    from base and offset tables. Based on kernel-id query, address of "malloc
+//    LDS global" for that corresponding kernel is obtained from base table.
+//    The Offset into the base "malloc LDS global" is obtained from
+//    corresponding element in offset table. With this information, replacement
+//    value is obtained.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "Utils/AMDGPUMemoryUtils.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SetOperations.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/CallGraph.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/ReplaceConstant.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#include <algorithm>
+
+#define DEBUG_TYPE "amdgpu-sw-lower-lds"
+
+using namespace llvm;
+using namespace AMDGPU;
+
+namespace {
+
+using DomTreeCallback = function_ref<DominatorTree *(Function &F)>;
+
+struct LDSAccessTypeInfo {
+  SetVector<GlobalVariable *> StaticLDSGlobals;
+  SetVector<GlobalVariable *> DynamicLDSGlobals;
+};
+
+// Struct to hold all the Metadata required for a kernel
+// to replace a LDS global uses with corresponding offset
+// in to device global memory.
+struct KernelLDSParameters {
+  GlobalVariable *MallocLDSGlobal{nullptr};
+  GlobalVariable *MallocMetadataGlobal{nullptr};
+  LDSAccessTypeInfo DirectAccess;
+  LDSAccessTypeInfo IndirectAccess;
+  DenseMap<GlobalVariable *, SmallVector<uint32_t, 3>>
+      LDSToReplacementIndicesMap;
+  int32_t KernelId{-1};
+  uint32_t MallocSize{0};
+};
+
+// Struct to store infor for creation of offset table
+// for all the non-kernel LDS accesses.
+struct NonKernelLDSParameters {
+  GlobalVariable *LDSBaseTable{nullptr};
+  GlobalVariable *LDSOffsetTable{nullptr};
+  SetVector<Function *> OrderedKernels;
+  SetVector<GlobalVariable *> OrdereLDSGlobals;
+};
+
+class AMDGPUSwLowerLDS {
+public:
+  AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
+      : M(mod), IRB(M.getContext()), DTCallback(Callback) {}
+  bool Run();
+  void GetUsesOfLDSByNonKernels(CallGraph const &CG,
+                                FunctionVariableMap &functions);
+  SetVector<Function *>
+  GetOrderedIndirectLDSAccessingKernels(SetVector<Function *> &&Kernels);
+  SetVector<GlobalVariable *>
+  GetOrderedNonKernelAllLDSGlobals(SetVector<GlobalVariable *> &&Variables);
+  void PopulateMallocLDSGlobal(Function *Func);
+  void PopulateMallocMetadataGlobal(Function *Func);
+  void PopulateLDSToReplacementIndicesMap(Function *Func);
+  void ReplaceKernelLDSAccesses(Function *Func);
+  void LowerKernelLDSAccesses(Function *Func, DomTreeUpdater &DTU);
+  void BuildNonKernelLDSOffsetTable(
+      std::shared_ptr<NonKernelLDSParameters> &NKLDSParams);
+  void BuildNonKernelLDSBaseTable(
+      std::shared_ptr<NonKernelLDSParameters> &NKLDSParams);
+  Constant *
+  GetAddressesOfVariablesInKernel(Function *Func,
+                                  SetVector<GlobalVariable *> &Variables);
+  void LowerNonKernelLDSAccesses(
+      Function *Func, SetVector<GlobalVariable *> &LDSGlobals,
+      std::shared_ptr<NonKernelLDSParameters> &NKLDSParams);
+
+private:
+  Module &M;
+  IRBuilder<> IRB;
+  DomTreeCallback DTCallback;
+  DenseMap<Function *, std::shared_ptr<KernelLDSParameters>>
+      KernelToLDSParametersMap;
+};
+
+template <typename T> SetVector<T> SortByName(std::vector<T> &&V) {
+  // Sort the vector of globals or Functions based on their name.
+  // Returns a SetVector of globals/Functions.
+  llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
+    return L->getName() < R->getName();
+  });
+  return {std::move(SetVector<T>(V.begin(), V.end()))};
+}
+
+SetVector<GlobalVariable *> AMDGPUSwLowerLDS::GetOrderedNonKernelAllLDSGlobals(
+    SetVector<GlobalVariable *> &&Variables) {
+  // Sort all the non-kernel LDS accesses based on theor name.
+  SetVector<GlobalVariable *> Ordered = SortByName(
+      std::vector<GlobalVariable *>(Variables.begin(), Variables.end()));
+  return std::move(Ordered);
+}
+
+SetVector<Function *> AMDGPUSwLowerLDS::GetOrderedIndirectLDSAccessingKernels(
+    SetVector<Function *> &&Kernels) {
+  // Sort the non-kernels accessing LDS based on theor name.
+  // Also assign a kernel ID metadata based on the sorted order.
+  LLVMContext &Ctx = M.getContext();
+  if (Kernels.size() > UINT32_MAX) {
+    // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
+    report_fatal_error("Unimplemented SW LDS lowering for > 2**32 kernels");
+  }
+  SetVector<Function *> OrderedKernels =
+      SortByName(std::vector<Function *>(Kernels.begin(), Kernels.end()));
+  for (size_t i = 0; i < Kernels.size(); i++) {
+    Metadata *AttrMDArgs[1] = {
+        ConstantAsMetadata::get(IRB.getInt32(i)),
+    };
+    Function *Func = OrderedKernels[i];
+    Func->setMetadata("llvm.amdgcn.lds.kernel.id",
+                      MDNode::get(Ctx, AttrMDArgs));
+    auto &LDSParams = KernelToLDSParametersMap[Func];
+    assert(LDSParams);
+    LDSParams->KernelId = i;
+  }
+  return std::move(OrderedKernels);
+}
+
+void AMDGPUSwLowerLDS::GetUsesOfLDSByNonKernels(
+    CallGraph const &CG, FunctionVariableMap &functions) {
+  // Get uses from the current function, excluding uses by called functions
+  // Two output variables to avoid walking the globals list twice
+  for (auto &GV : M.globals()) {
+    if (!AMDGPU::isLDSVariableToLower(GV)) {
+      continue;
+    }
+
+    if (GV.isAbsoluteSymbolRef()) {
+      report_fatal_error(
+          "LDS variables with absolute addresses are unimplemented.");
+    }
+
+    for (User *V : GV.users()) {
+      User *FUU = V;
+      bool isCast = isa<BitCastOperator, AddrSpaceCastOperator>(FUU);
+      if (isCast && FUU->hasOneUse() && !FUU->user_begin()->user_empty())
+        FUU = *FUU->user_begin();
+      if (auto *I = dyn_cast<Instruction>(FUU)) {
+        Function *F = I->getFunction();
+        if (!isKernelLDS(F)) {
+          functions[F].insert(&GV);
+        }
+      }
+    }
+  }
+}
+
+void AMDGPUSwLowerLDS::PopulateMallocLDSGlobal(Function *Func) {
+  // Create new LDS global required for each kernel to store
+  // device global memory pointer.
+  auto &LDSParams = KernelToLDSParametersMap[Func];
+  assert(LDSParams);
+  // create new global pointer variable
+  LDSParams->MallocLDSGlobal = new GlobalVariable(
+      M, IRB.getPtrTy(), false, GlobalValue::InternalLinkage,
+      PoisonValue::get(IRB.getPtrTy()),
+      Twine("llvm.amdgcn.sw.lds." + F...
[truncated]

Comment on lines +167 to +228
// Sort the vector of globals or Functions based on their name.
// Returns a SetVector of globals/Functions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name should be a tie-breaker only. Sort by alignment/size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amdgpu-lower-module-lds pass also uses sorting of globals based on name. It is required to maintain consistent order of globals in offset table and while replacing the LDS globals with offsets into new LDS global.

Comment on lines +29 to +43
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.workitem.id.y()
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.amdgcn.workitem.id.z()
; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP0]], [[TMP1]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should try to strip the corresponding amdgpu-no-* attributes for introduced intrinsic calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added utility method from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils and removed amdgpu-no-workitem-id-* attributes from kernels which access LDS.

Copy link

github-actions bot commented Apr 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title is misleading. I think the implementation of the pass, and adding it to the pass pipeline should be done in separate changes

@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from 98d5f94 to e60eb97 Compare April 18, 2024 10:30
@skc7 skc7 changed the title [AMDGPU] Enable amdgpu-sw-lower-lds pass to lower LDS accesses to use device global memory [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses to use device global memory. Apr 18, 2024
Copy link
Contributor

@Pierre-vh Pierre-vh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly coding style nits. The coding style here differs a bit from what we usually see so I pointed out the things that stood out to me as someone that's not in the loop with this change.


class AMDGPUSwLowerLDS {
public:
AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
AMDGPUSwLowerLDS(Module &Mod, DomTreeCallback Callback)

CamelCase

AMDGPUSwLowerLDS(Module &mod, DomTreeCallback Callback)
: M(mod), IRB(M.getContext()), DTCallback(Callback) {}
bool run();
void getUsesOfLDSByNonKernels(CallGraph const &CG,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void getUsesOfLDSByNonKernels(CallGraph const &CG,
void getUsesOfLDSByNonKernels(const CallGraph &CG,

To be consistent with the codebase.

void getUsesOfLDSByNonKernels(CallGraph const &CG,
FunctionVariableMap &functions);
SetVector<Function *>
getOrderedIndirectLDSAccessingKernels(SetVector<Function *> &&Kernels);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document those functions, even if it's just a short comment.
It helps maintainability.

template <typename T> SetVector<T> sortByName(std::vector<T> &&V) {
// Sort the vector of globals or Functions based on their name.
// Returns a SetVector of globals/Functions.
llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm:: is not needed, I think.
I also think you can just do llvm::sort(V, ..) ?

set_union(transitive_map_function[&Func], direct_map_function[F]);

for (const CallGraphNode::CallRecord &R : *CG[F]) {
Function *ith = R.second->getFunction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CamelCase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated pre-requisite PR #88002 which has AMDGPUmemoryUtils changes. Changes here will be removed once #88002 gets merged.


// direct_map_kernel lists which variables are used by the kernel
// find the variables which are used through a function call
FunctionVariableMap indirect_map_kernel;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CamelCase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated pre-requisite PR #88002 which has AMDGPUmemoryUtils changes. Changes here will be removed once #88002 gets merged.

continue;

for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
Function *ith = R.second->getFunction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CamelCase

Copy link
Contributor Author

@skc7 skc7 Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated pre-requisite PR #88002 which has AMDGPUmemoryUtils changes. Changes here will be removed once #88002 gets merged.

StringRef FnAttr) {
KernelRoot->removeFnAttr(FnAttr);

SmallVector<Function *> WorkList({CG[KernelRoot]->getFunction()});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= to assign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


#include <algorithm>

#define DEBUG_TYPE "amdgpu-sw-lower-lds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it possible to add some LLVM_DEBUG output to this pass?
It greatly helps debug eventual issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few debug outputs while replacing the LDS accesses. Thanks for suggestion.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a rebase, hard to see over the moved code patch

//{StartOffset, AlignedSizeInBytes}
SmallString<128> MDItemStr;
raw_svector_ostream MDItemOS(MDItemStr);
MDItemOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.item";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MDItemOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.item";
MDItemOS << "llvm.amdgcn.sw.lds." << Func->getName() << ".md.item";

Comment on lines 477 to 478
auto MallocSizeCalcLambda =
[&](SetVector<GlobalVariable *> &DynamicLDSGlobals) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a regular helper function?

Value *ImplicitArg =
IRB.CreateIntrinsic(Intrinsic::amdgcn_implicitarg_ptr, {}, {});
Value *HiddenDynLDSSize = IRB.CreateInBoundsGEP(
ImplicitArg->getType(), ImplicitArg, {IRB.getInt32(15)});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't understand where the hardcoded 15 came from. There are various ConstInBoundsGEPs for this case too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should also use 64-bit indexes, this is canonically a 64-bit address space. Can we use an enum or something more structured to access the ABI location? I'm assuming this is assuming COV5?


auto *GEPForEndStaticLDSSize = IRB.CreateInBoundsGEP(
MetadataStructType, SwLDSMetadata,
{IRB.getInt32(0), IRB.getInt32(NumStaticLDS - 1), IRB.getInt32(2)});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the Const* variants to hide all the getInt32s away

@@ -0,0 +1,58 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 4
; RUN: opt < %s -passes=amdgpu-sw-lower-lds -S -mtriple=amdgcn-- | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should specifically use amdhsa triples for these tests

skc7 added a commit to skc7/llvm-project that referenced this pull request May 9, 2024
@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from f01647b to 9d9e023 Compare May 9, 2024 11:25
/// Strip "amdgpu-no-lds-kernel-id" from any functions where we may have
/// introduced its use. If AMDGPUAttributor ran prior to the pass, we inferred
/// the lack of llvm.amdgcn.lds.kernel.id calls.
void removeNoLdsKernelIdFromReachable(CallGraph &CG, Function *KernelRoot) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this rebased on main? This deletion should have already been merged when the code was moved to AMDGPUMemoryUtils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased and updated in latest commits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#92686 PR raised to move remove this change.


SmallString<128> MDTypeStr;
raw_svector_ostream MDTypeOS(MDTypeStr);
MDTypeOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.type";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MDTypeOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md.type";
MDTypeOS << "llvm.amdgcn.sw.lds." << Func->getName() << ".md.type";

another one

StructType::create(Ctx, Items, MDTypeOS.str());
SmallString<128> MDStr;
raw_svector_ostream MDOS(MDStr);
MDOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MDOS << "llvm.amdgcn.sw.lds." << Func->getName().str() << ".md";
MDOS << "llvm.amdgcn.sw.lds." << Func->getName() << ".md";

Value *BasePlusOffset =
IRB.CreateInBoundsGEP(IRB.getInt8Ty(), SwLDS, {Load});
LLVM_DEBUG(dbgs() << "Sw LDS Lowering, Replacing LDS "
<< GV->getName().str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< GV->getName().str());
<< GV->getName());


ReplaceKernelLDSAccesses(Func);

auto *CondFreeBlock = BasicBlock::Create(Ctx, "CondFree", Func);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the runtime has to manage cleanup of anything that happened in the kernel?

// Replace LDS access in non-kernel with replacement queried from
// Base table and offset from offset table.
LLVM_DEBUG(dbgs() << "Sw LDS lowering, lower non-kernel access for : "
<< Func->getName().str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< Func->getName().str());
<< Func->getName());

You should almost never need to convert to std::string

Value *BasePlusOffset =
IRB.CreateInBoundsGEP(IRB.getInt8Ty(), BasePtr, {OffsetLoad});
LLVM_DEBUG(dbgs() << "Sw LDS Lowering, Replace non-kernel LDS for "
<< GV->getName().str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< GV->getName().str());
<< GV->getName());

@@ -0,0 +1,100 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should use --check-globals since that's most of the point of the pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--check-globals cmd-line option is updating the tests with globals. But, some of the tests when run, are failing with missing ']' "closing bracket "like example below.. So have updated tests with globals check which don't complain this error.

@llvm.amdgcn.sw.lds.offset.table = internal addrspace(4) constant [2 x [4 x i32]] [[4 x i32] [i32 ptrtoint (ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md to i32), i32 poison, ..

skc7 added a commit to skc7/llvm-project that referenced this pull request May 10, 2024
@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from 9d9e023 to f2f4138 Compare May 10, 2024 17:28
@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from 2dc9064 to 6ad1fc2 Compare May 19, 2024 12:16
skc7 added a commit to skc7/llvm-project that referenced this pull request May 20, 2024
@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from 6ad1fc2 to 14cada9 Compare May 20, 2024 05:49
@skc7 skc7 changed the title [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses to use device global memory. [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. May 20, 2024
Comment on lines 954 to 956
removeFnAttrFromReachable(CG, Func, "amdgpu-no-workitem-id-x");
removeFnAttrFromReachable(CG, Func, "amdgpu-no-workitem-id-y");
removeFnAttrFromReachable(CG, Func, "amdgpu-no-workitem-id-z");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could all be removed in one CallGraph walk instead of 3 separate ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently removeFnAttrFromReachable accepts StringRef argument. Need to change it to accept array of stringref.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #94188.

};
bool IsChanged = false;
AMDGPUSwLowerLDS SwLowerLDSImpl(M, DTCallback);
IsChanged |= SwLowerLDSImpl.run();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just define isChanged here

skc7 added a commit to skc7/llvm-project that referenced this pull request Jun 7, 2024
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I trust the pass ordering with this strategy, and I think the pass name should not claim it's software lowering when it's not really that

for (auto &GV : LDSGlobals) {
if (is_contained(UniqueLDSGlobals, GV))
continue;
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need else after continue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@b-sumner
Copy link

Introducing more LDS while also lowering it also introduces a range of possible pass order binding problems. You could just as well put the pointer in a regular addrspace(1) global

Each workgroup needs a different pointer, and a grid can have a lot of work groups. I don't think a global would work.

skc7 added a commit to skc7/llvm-project that referenced this pull request Aug 22, 2024
@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from d489ba1 to 8faffb9 Compare August 22, 2024 05:26
@arsenm
Copy link
Contributor

arsenm commented Aug 22, 2024

Each workgroup needs a different pointer, and a grid can have a lot of work groups. I don't think a global would work.

So why isn't the runtime responsible for setting up this pointer then? It does that effectively for LDS, the allocation is managed as part of the dispatch. That also goes back to my question of why we need to insert explicit free code, instead of just letting the runtime clean it up after as it would need to anyway

@b-sumner
Copy link

So why isn't the runtime responsible for setting up this pointer then? It does that effectively for LDS, the allocation is managed as part of the dispatch. That also goes back to my question of why we need to insert explicit free code, instead of just letting the runtime clean it up after as it would need to anyway

Which runtime are you talking about? The firmware or trap handler? And where are they going to place the pointer? Or are you even considering a new architected or reserved register to hold it and a new ABI?

@arsenm
Copy link
Contributor

arsenm commented Aug 22, 2024

Which runtime are you talking about? The firmware or trap handler? And where are they going to place the pointer? Or are you even considering a new architected or reserved register to hold it and a new ABI?

Presumably the implicit kernel arguments, and whatever is setting that up. It's essentially a partner to the queue pointer, which also is in the implicit kernargs

@b-sumner
Copy link

Which runtime are you talking about? The firmware or trap handler? And where are they going to place the pointer? Or are you even considering a new architected or reserved register to hold it and a new ABI?

Presumably the implicit kernel arguments, and whatever is setting that up. It's essentially a partner to the queue pointer, which also is in the implicit kernargs

OK. Suppose the launch has a million work groups. How much memory should the runtime allocate, and how will workgroup J decode what part of that memory to use? It can certainly be done but I'm wondering if we really need to do it now? And how much do we really need an independently working SW LDS?

@arsenm
Copy link
Contributor

arsenm commented Aug 22, 2024

OK. Suppose the launch has a million work groups. How much memory should the runtime allocate, and how will workgroup J decode what part of that memory to use?

The runtime is already bounded on how many groups it can dispatch at once; the allocation is tied to the dispatch size.

It can certainly be done but I'm wondering if we really need to do it now? And how much do we really need an independently working SW LDS?

I think having the trap door of pure software LDS would enable some useful experiments, such as not depending on any whole program visibility to lower function defined local variables. It also reduces the number of parts that need to directly interact in the compiler pipeline. With the current approach I foresee having to fix the same bugs twice in the module LDS lowering, and the asan version of module LDS lowering

@b-sumner
Copy link

b-sumner commented Aug 22, 2024

The runtime is already bounded on how many groups it can dispatch at once; the allocation is tied to the dispatch size.

The runtime doesn't split the dispatch into machine-sized chunks. If it does have a limit, then it is probably much larger than we want to allocate for.

I think having the trap door of pure software LDS would enable some useful experiments, such as not depending on any whole program visibility to lower function defined local variables. It also reduces the number of parts that need to directly interact in the compiler pipeline. With the current approach I foresee having to fix the same bugs twice in the module LDS lowering, and the asan version of module LDS lowering

I don't disagree. But reading global memory for the pointer will be slower. The runtime launching one dispatch at a time to manage the memory will be slower, and we still need a kernel prolog and epilog for each workgroup to allocate and deallocate it's chunk of the global allocation, and I still don't know where we are going to store the per-workgroup workgroup-allocation-chunk-index or pointer.

@arsenm
Copy link
Contributor

arsenm commented Aug 23, 2024

The runtime doesn't split the dispatch into machine-sized chunks. If it does have a limit, then it is probably much larger than we want to allocate for.

I thought it already had to do this if stack was enabled to avoid going over a device wide limit

@b-sumner
Copy link

The runtime doesn't split the dispatch into machine-sized chunks. If it does have a limit, then it is probably much larger than we want to allocate for.

I thought it already had to do this if stack was enabled to avoid going over a device wide limit

Yes, there is a special mode when scratch space is low but something like that would not be desirable to impose on every dispatch.

@skc7 skc7 force-pushed the skc7/sw_lower_lds branch from 8faffb9 to abfcc87 Compare August 25, 2024 05:06
@skc7 skc7 merged commit 7bc9d95 into llvm:main Aug 26, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux running on sanitizer-buildbot7 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/2930

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[4761/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMemoryUtils.cpp.o
[4762/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertSingleUseVDST.cpp.o
[4763/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURemoveIncompatibleFunctions.cpp.o
[4764/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAnnotateKernelFeatures.cpp.o
[4765/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerKernelArguments.cpp.o
[4766/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertDelayAlu.cpp.o
[4767/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibCalls.cpp.o
[4768/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRewritePartialRegUses.cpp.o
[4769/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAtomicOptimizer.cpp.o
[4770/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
[4771/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNCreateVOPD.cpp.o
[4772/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPostLegalizerCombiner.cpp.o
[4773/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteAlloca.cpp.o
[4774/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAlwaysInlinePass.cpp.o
[4775/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstCombineIntrinsic.cpp.o
[4776/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
[4777/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[4778/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNDPPCombine.cpp.o
[4779/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAttributor.cpp.o
[4780/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPreLegalizerCombiner.cpp.o
[4781/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRAOptimizations.cpp.o
[4782/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCallLowering.cpp.o
[4783/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegisterBankInfo.cpp.o
[4784/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULateCodeGenPrepare.cpp.o
[4785/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUArgumentUsageInfo.cpp.o
[4786/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[4787/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600InstrInfo.cpp.o
[4788/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIInsertHardClauses.cpp.o
[4789/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o
[4790/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFixVGPRCopies.cpp.o
[4791/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[4792/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRegPressure.cpp.o
[4793/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o
[4794/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[4795/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o
[4796/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o
[4797/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCodeGenPrepare.cpp.o
[4798/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankCombiner.cpp.o
[4799/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIAnnotateControlFlow.cpp.o
[4800/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRALongBranchReg.cpp.o
Step 8 (build compiler-rt symbolizer) failure: build compiler-rt symbolizer (failure)
...
[4761/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMemoryUtils.cpp.o
[4762/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertSingleUseVDST.cpp.o
[4763/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURemoveIncompatibleFunctions.cpp.o
[4764/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAnnotateKernelFeatures.cpp.o
[4765/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerKernelArguments.cpp.o
[4766/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertDelayAlu.cpp.o
[4767/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibCalls.cpp.o
[4768/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRewritePartialRegUses.cpp.o
[4769/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAtomicOptimizer.cpp.o
[4770/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
[4771/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNCreateVOPD.cpp.o
[4772/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPostLegalizerCombiner.cpp.o
[4773/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteAlloca.cpp.o
[4774/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAlwaysInlinePass.cpp.o
[4775/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstCombineIntrinsic.cpp.o
[4776/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
[4777/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
[4778/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNDPPCombine.cpp.o
[4779/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAttributor.cpp.o
[4780/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPreLegalizerCombiner.cpp.o
[4781/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRAOptimizations.cpp.o
[4782/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCallLowering.cpp.o
[4783/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegisterBankInfo.cpp.o
[4784/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULateCodeGenPrepare.cpp.o
[4785/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUArgumentUsageInfo.cpp.o
[4786/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[4787/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600InstrInfo.cpp.o
[4788/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIInsertHardClauses.cpp.o
[4789/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o
[4790/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFixVGPRCopies.cpp.o
[4791/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
[4792/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRegPressure.cpp.o
[4793/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o
[4794/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
[4795/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o
[4796/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o
[4797/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCodeGenPrepare.cpp.o
[4798/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankCombiner.cpp.o
[4799/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIAnnotateControlFlow.cpp.o
[4800/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRALongBranchReg.cpp.o
Step 9 (test compiler-rt symbolizer) failure: test compiler-rt symbolizer (failure)
...
[175/216] Linking CXX executable bin/sancov
[176/216] Linking CXX executable bin/llvm-ar
[177/216] Linking CXX executable bin/llvm-objdump
[178/216] Generating ../../bin/llvm-ranlib
[179/216] Linking CXX executable bin/llvm-nm
[180/216] Linking CXX executable bin/llvm-jitlink
[181/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIWholeQuadMode.cpp.o
[182/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPreAllocateWWMRegs.cpp.o
[183/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerSGPRSpills.cpp.o
[184/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
[185/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIOptimizeVGPRLiveRange.cpp.o
[186/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIRegisterInfo.cpp.o
[187/216] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerWWMCopies.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 10 (build compiler-rt debug) failure: build compiler-rt debug (failure)
...
[5176/5252] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
[5177/5252] Linking CXX executable bin/llvm-objdump
[5178/5252] Generating ../../bin/llvm-otool
[5179/5252] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5180/5252] Linking CXX executable bin/llvm-nm
[5181/5252] Linking CXX executable bin/llvm-jitlink
[5182/5252] Linking CXX executable bin/llvm-profgen
[5183/5252] Linking CXX executable bin/llvm-rtdyld
[5184/5252] Linking CXX executable bin/sancov
[5185/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
@@@BUILD_STEP test compiler-rt debug@@@
ninja: Entering directory `build_default'
[1/30] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 12 (build compiler-rt tsan_debug) failure: build compiler-rt tsan_debug (failure)
...
[5157/5233] Linking CXX executable bin/llvm-ml
[5158/5233] Linking CXX executable bin/llvm-nm
[5159/5233] Linking CXX executable bin/llvm-objdump
[5160/5233] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5161/5233] Linking CXX executable bin/llvm-jitlink
[5162/5233] Generating ../../bin/llvm-otool
[5163/5233] Linking CXX executable bin/llvm-rtdyld
[5164/5233] Linking CXX executable bin/llvm-profgen
[5165/5233] Linking CXX executable bin/sancov
[5166/5233] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 13 (build compiler-rt default) failure: build compiler-rt default (failure)
...
[5176/5252] Linking CXX executable bin/llvm-nm
[5177/5252] Linking CXX executable bin/llvm-objdump
[5178/5252] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CXExtractAPI.cpp.o
[5179/5252] Building CXX object tools/clang/tools/clang-repl/CMakeFiles/clang-repl.dir/ClangRepl.cpp.o
[5180/5252] Generating ../../bin/llvm-otool
[5181/5252] Linking CXX executable bin/llvm-jitlink
[5182/5252] Linking CXX executable bin/llvm-profgen
[5183/5252] Linking CXX executable bin/llvm-rtdyld
[5184/5252] Linking CXX executable bin/sancov
[5185/5252] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 14 (test compiler-rt default) failure: test compiler-rt default (failure)
@@@BUILD_STEP test compiler-rt default@@@
ninja: Entering directory `build_default'
[1/30] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 15 (build standalone compiler-rt) failure: build standalone compiler-rt (failure)
...
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.
Call Stack (most recent call first):
  CMakeLists.txt:12 (include)
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- The ASM compiler identification is unknown
-- Didn't find assembler
CMake Error at CMakeLists.txt:17 (project):
  The CMAKE_C_COMPILER:

    /home/b/sanitizer-aarch64-linux/build/build_default/bin/clang

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
  the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:17 (project):
  The CMAKE_CXX_COMPILER:

    /home/b/sanitizer-aarch64-linux/build/build_default/bin/clang++

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:17 (project):
  No CMAKE_ASM_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "ASM" or the CMake cache entry CMAKE_ASM_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.
-- Warning: Did not find file Compiler/-ASM
-- Configuring incomplete, errors occurred!

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
ninja: Entering directory `compiler_rt_build'
ninja: error: loading 'build.ninja': No such file or directory

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Step 16 (test standalone compiler-rt) failure: test standalone compiler-rt (failure)
@@@BUILD_STEP test standalone compiler-rt@@@
ninja: Entering directory `compiler_rt_build'
ninja: error: loading 'build.ninja': No such file or directory

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-android running on sanitizer-buildbot-android while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/1725

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[3608/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULateCodeGenPrepare.cpp.o
[3609/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
[3610/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o
[3611/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUOpenCLEnqueuedBlockLowering.cpp.o
[3612/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMemoryUtils.cpp.o
[3613/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o
[3614/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertSingleUseVDST.cpp.o
[3615/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMarkLastScratchLoad.cpp.o
[3616/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULegalizerInfo.cpp.o
[3617/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/include -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
[3618/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelDAGToDAG.cpp.o
[3619/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o
[3620/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[3621/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPrintfRuntimeBinding.cpp.o
[3622/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteKernelArguments.cpp.o
[3623/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
[3624/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o
[3625/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPreLegalizerCombiner.cpp.o
[3626/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPostLegalizerCombiner.cpp.o
[3627/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankSelect.cpp.o
[3628/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteAlloca.cpp.o
[3629/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegisterBankInfo.cpp.o
[3630/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankCombiner.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

@@@STEP_FAILURE@@@
Step 8 (bootstrap clang) failure: bootstrap clang (failure)
...
[3608/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULateCodeGenPrepare.cpp.o
[3609/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
[3610/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o
[3611/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUOpenCLEnqueuedBlockLowering.cpp.o
[3612/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMemoryUtils.cpp.o
[3613/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o
[3614/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertSingleUseVDST.cpp.o
[3615/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMarkLastScratchLoad.cpp.o
[3616/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULegalizerInfo.cpp.o
[3617/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm_build64/include -I/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
[3618/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelDAGToDAG.cpp.o
[3619/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o
[3620/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
[3621/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPrintfRuntimeBinding.cpp.o
[3622/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteKernelArguments.cpp.o
[3623/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
[3624/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o
[3625/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPreLegalizerCombiner.cpp.o
[3626/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPostLegalizerCombiner.cpp.o
[3627/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankSelect.cpp.o
[3628/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteAlloca.cpp.o
[3629/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegisterBankInfo.cpp.o
[3630/5227] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankCombiner.cpp.o
ninja: build stopped: subcommand failed.

How to reproduce locally: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
program finished with exit code 2
elapsedTime=229.699745

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder ppc64le-lld-multistage-test running on ppc64le-lld-multistage-test while building llvm at step 12 "build-stage2-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/2563

Here is the relevant piece of the build log for the reference
Step 12 (build-stage2-unified-tree) failure: build (failure)
...
359.310 [161/208/5859] Building CXX object tools/dsymutil/CMakeFiles/dsymutil.dir/DwarfLinkerForBinary.cpp.o
359.342 [161/207/5860] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/MachineDomTreeUpdaterTest.cpp.o
359.501 [161/206/5861] Building CXX object tools/opt/CMakeFiles/LLVMOptDriver.dir/NewPMDriver.cpp.o
359.503 [161/205/5862] Building CXX object tools/sancov/CMakeFiles/sancov.dir/sancov.cpp.o
359.676 [161/204/5863] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibCalls.cpp.o
359.720 [161/203/5864] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/DriverUtils.cpp.o
359.792 [161/202/5865] Building CXX object tools/clang/unittests/CrossTU/CMakeFiles/CrossTUTests.dir/CrossTranslationUnitTest.cpp.o
360.084 [161/201/5866] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDeclAttr.cpp.o
360.266 [161/200/5867] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600TargetTransformInfo.cpp.o
360.926 [161/199/5868] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
ccache /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/install/stage1/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
360.958 [161/198/5869] Building CXX object tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/ASTReader.cpp.o
360.982 [161/197/5870] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstCombineIntrinsic.cpp.o
361.005 [161/196/5871] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO.dir/SyntheticSections.cpp.o
361.049 [161/195/5872] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
361.322 [161/194/5873] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMIRFormatter.cpp.o
361.651 [161/193/5874] Building CXX object tools/lld/wasm/CMakeFiles/lldWasm.dir/SymbolTable.cpp.o
361.729 [161/192/5875] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNNSAReassign.cpp.o
361.842 [161/191/5876] Building CXX object tools/llvm-lto/CMakeFiles/llvm-lto.dir/llvm-lto.cpp.o
362.061 [161/190/5877] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndexCodeCompletion.cpp.o
362.354 [161/189/5878] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/StandardLibraryTest.cpp.o
362.391 [161/188/5879] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIAnnotateControlFlow.cpp.o
362.494 [161/187/5880] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/LexicalScopesTest.cpp.o
362.563 [161/186/5881] Building CXX object tools/llvm-lto2/CMakeFiles/llvm-lto2.dir/llvm-lto2.cpp.o
362.631 [161/185/5882] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRALongBranchReg.cpp.o
362.643 [161/184/5883] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPreAllocateWWMRegs.cpp.o
362.922 [161/183/5884] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
362.997 [161/182/5885] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/Indexing.cpp.o
363.012 [161/181/5886] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/MachineInstrTest.cpp.o
363.270 [161/180/5887] Building CXX object unittests/CodeGen/CMakeFiles/CodeGenTests.dir/AsmPrinterDwarfTest.cpp.o
363.425 [161/179/5888] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIWholeQuadMode.cpp.o
363.592 [161/178/5889] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerSGPRSpills.cpp.o
363.649 [161/177/5890] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o
363.746 [161/176/5891] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/FixItTest.cpp.o
363.859 [161/175/5892] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/SymbolTable.cpp.o
364.248 [161/174/5893] Building CXX object tools/llvm-nm/CMakeFiles/llvm-nm.dir/llvm-nm.cpp.o
364.332 [161/173/5894] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
364.488 [161/172/5895] Building CXX object tools/llvm-profgen/CMakeFiles/llvm-profgen.dir/ProfileGenerator.cpp.o
364.580 [161/171/5896] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFormMemoryClauses.cpp.o
364.648 [161/170/5897] Building CXX object unittests/Target/AMDGPU/CMakeFiles/AMDGPUTests.dir/ExecMayBeModifiedBeforeAnyUse.cpp.o
365.186 [161/169/5898] Building CXX object tools/lld/COFF/CMakeFiles/lldCOFF.dir/Writer.cpp.o

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-multistage running on ppc64le-clang-multistage-test while building llvm at step 4 "build stage 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/2236

Here is the relevant piece of the build log for the reference
Step 4 (build stage 1) failure: 'ninja' (failure)
...
[5952/6117] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIRegisterInfo.cpp.o
[5953/6117] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelDAGToDAG.cpp.o
[5954/6117] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp: In member function ‘bool llvm::AMDGPUInstructionSelector::selectG_TRUNC(llvm::MachineInstr&) const’:
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:2301: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
         DstSize < 32 ? AMDGPU::sub0 : TRI.getSubRegFromChannel(0, DstSize / 32);
 
[5955/6117] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIISelLowering.cpp.o
[5956/6117] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetMachine.cpp.o
[5957/6117] Linking CXX shared library lib/libLLVMAMDGPUCodeGen.so.20.0git
FAILED: lib/libLLVMAMDGPUCodeGen.so.20.0git 
: && /usr/lib64/ccache/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/./lib  -Wl,--gc-sections -shared -Wl,-soname,libLLVMAMDGPUCodeGen.so.20.0git -o lib/libLLVMAMDGPUCodeGen.so.20.0git lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAliasAnalysis.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAlwaysInlinePass.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAnnotateKernelFeatures.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAnnotateUniformValues.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUArgumentUsageInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsanInstrumentation.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAtomicOptimizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAttributor.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCallLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCodeGenPrepare.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCombinerHelper.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCtorDtorLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUExportClustering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUFrameLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUGlobalISelDivergenceLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUGlobalISelUtils.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertDelayAlu.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstCombineIntrinsic.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstrInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelDAGToDAG.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULateCodeGenPrepare.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULegalizerInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibCalls.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibFunc.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerKernelArguments.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerKernelAttributes.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMachineFunction.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMachineModuleInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMacroFusion.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMemoryUtils.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUIGroupLP.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir
Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMIRFormatter.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUOpenCLEnqueuedBlockLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPostLegalizerCombiner.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPreLegalizerCombiner.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPrintfRuntimeBinding.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteAlloca.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteKernelArguments.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankCombiner.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegBankSelect.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURegisterBankInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURemoveIncompatibleFunctions.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURewriteOutArguments.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPURewriteUndefForPHI.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSetWavePriority.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSplitModule.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSubtarget.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetMachine.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetObjectFile.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetTransformInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUUnifyDivergentExitNodes.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUUnifyMetadata.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600MachineCFGStructurizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNCreateVOPD.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNDPPCombine.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNHazardRecognizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNILPSched.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNIterativeScheduler.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNMinRegStrategy.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNNSAReassign.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRAOptimizations.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRALongBranchReg.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRegPressure.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRewritePartialRegUses.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSubtarget.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNVOPDUtils.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600AsmPrinter.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ClauseMergePass.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ControlFlowFinalizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600EmitClauseMarkers.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ExpandSpecialInstrs.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600FrameLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600InstrInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ISelDAGToDAG.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ISelLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600MachineFunctionInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600MachineScheduler.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600MCInstLower.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600OpenCLImageTypeLoweringPass.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600OptimizeVectorRegisters.cpp.o lib/Target/AM
GPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600Packetizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600RegisterInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600Subtarget.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600TargetMachine.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600TargetTransformInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIAnnotateControlFlow.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFixSGPRCopies.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFixVGPRCopies.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFoldOperands.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFormMemoryClauses.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFrameLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIInsertHardClauses.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIInsertWaitcnts.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIInstrInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIISelLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILateBranchLowering.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILoadStoreOptimizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerControlFlow.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerI1Copies.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerWWMCopies.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerSGPRSpills.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIMachineFunctionInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIMachineScheduler.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIMemoryLegalizer.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIModeRegister.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIModeRegisterDefaults.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIOptimizeExecMasking.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIOptimizeExecMaskingPreRA.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIOptimizeVGPRLiveRange.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPeepholeSDWA.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPostRABundler.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPreAllocateWWMRegs.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPreEmitPeephole.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIProgramInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIRegisterInfo.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIShrinkInstructions.cpp.o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIWholeQuadMode.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib:"  lib/libLLVMAMDGPUDesc.so.20.0git  lib/libLLVMAMDGPUInfo.so.20.0git  lib/libLLVMAMDGPUUtils.so.20.0git  lib/libLLVMAsmPrinter.so.20.0git  lib/libLLVMGlobalISel.so.20.0git  lib/libLLVMMIRParser.so.20.0git  lib/libLLVMPasses.so.20.0git  lib/libLLVMSelectionDAG.so.20.0git  lib/libLLVMCodeGen.so.20.0git  lib/libLLVMCodeGenTypes.so.20.0git  lib/libLLVMHipStdPar.so.20.0git  lib/libLLVMIRPrinter.so.20.0git  lib/libLLVMTarget.so.20.0git  lib/libLLVMipo.so.20.0git  lib/libLLVMVectorize.so.20.0git  lib/libLLVMScalarOpts.so.20.0git  lib/libLLVMTransformUtils.so.20.0git  lib/libLLVMAnalysis.so.20.0git  lib/libLLVMMC.so.20.0git  lib/libLLVMCore.so.20.0git  lib/libLLVMBinaryFormat.so.20.0git  lib/libLLVMTargetParser.so.20.0git  lib/libLLVMSupport.so.20.0git  -Wl,-rpath-link,/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && :
lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o: In function `(anonymous namespace)::AMDGPUSwLowerLDS::run()':
AMDGPUSwLowerLDS.cpp:(.text._ZN12_GLOBAL__N_116AMDGPUSwLowerLDS3runEv+0x164): undefined reference to `llvm::getAddressSanitizerParams(llvm::Triple const&, int, bool, unsigned long*, int*, bool*)'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64le-rhel running on ppc64le-clang-rhel-test while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/1462

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
139.820 [203/52/5998] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIMemoryLegalizer.cpp.o
140.808 [203/51/5999] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPerfHintAnalysis.cpp.o
140.823 [203/50/6000] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600TargetTransformInfo.cpp.o
140.905 [203/49/6001] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPromoteAlloca.cpp.o
140.983 [203/48/6002] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
141.544 [203/47/6003] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibCalls.cpp.o
141.562 [203/46/6004] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFixSGPRCopies.cpp.o
141.931 [203/45/6005] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerControlFlow.cpp.o
141.935 [203/44/6006] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIAnnotateControlFlow.cpp.o
141.942 [203/43/6007] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
ccache /home/docker/llvm-external-buildbots/clang.17.0.6/bin/clang++ --gcc-toolchain=/gcc-toolchain/usr -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fPIC  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
141.959 [203/42/6008] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFormMemoryClauses.cpp.o
142.166 [203/41/6009] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNNSAReassign.cpp.o
142.236 [203/40/6010] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPreLegalizerCombiner.cpp.o
142.391 [203/39/6011] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAlwaysInlinePass.cpp.o
142.406 [203/38/6012] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULateCodeGenPrepare.cpp.o
142.697 [203/37/6013] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILateBranchLowering.cpp.o
142.701 [203/36/6014] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ISelLowering.cpp.o
143.705 [203/35/6015] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstCombineIntrinsic.cpp.o
143.707 [203/34/6016] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
143.716 [203/33/6017] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAttributor.cpp.o
143.753 [203/32/6018] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIWholeQuadMode.cpp.o
143.754 [203/31/6019] Building CXX object lib/Target/AMDGPU/Utils/CMakeFiles/LLVMAMDGPUUtils.dir/AMDGPUBaseInfo.cpp.o
143.873 [203/30/6020] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUArgumentUsageInfo.cpp.o
144.095 [203/29/6021] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCallLowering.cpp.o
144.556 [203/28/6022] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNPreRALongBranchReg.cpp.o
144.994 [203/27/6023] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetTransformInfo.cpp.o
145.250 [203/26/6024] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerWWMCopies.cpp.o
145.364 [203/25/6025] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCodeGenPrepare.cpp.o
145.404 [203/24/6026] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIPreAllocateWWMRegs.cpp.o
145.462 [203/23/6027] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUResourceUsageAnalysis.cpp.o
145.605 [203/22/6028] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o
146.084 [203/21/6029] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUPostLegalizerCombiner.cpp.o
146.423 [203/20/6030] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SILowerSGPRSpills.cpp.o
146.519 [203/19/6031] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o
146.662 [203/18/6032] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIOptimizeVGPRLiveRange.cpp.o
146.729 [203/17/6033] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUHSAMetadataStreamer.cpp.o
147.205 [203/16/6034] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFrameLowering.cpp.o
147.214 [203/15/6035] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUAsmPrinter.cpp.o
147.992 [203/14/6036] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o
148.008 [203/13/6037] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFoldOperands.cpp.o

skc7 added a commit that referenced this pull request Aug 26, 2024
Fixes linking error in llvm CI: 
"AMDGPUSwLowerLDS::run()':

AMDGPUSwLowerLDS.cpp:(.text._ZN12_GLOBAL__N_116AMDGPUSwLowerLDS3runEv+0x164):
undefined reference to `llvm::getAddressSanitizerParams(llvm::Triple
const&, int, bool, unsigned long*, int*, bool*)'"

#87265 amdgpu-sw-lower-lds pass uses getAddressSanitizerParams method
from AddressSanitizer pass. It misses linking of LLVMInstrumentation to
AMDGPUCodegen. This PR adds it.
@skc7
Copy link
Contributor Author

skc7 commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-multistage running on ppc64le-clang-multistage-test while building llvm at step 4 "build stage 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/2236

Here is the relevant piece of the build log for the reference

Issue should be fixed by #106039

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64-aix running on aix-ppc64 while building llvm at step 3 "clean-build-dir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/786

Here is the relevant piece of the build log for the reference
Step 3 (clean-build-dir) failure: Delete failed. (failure) (timed out)
Step 5 (build-unified-tree) failure: build (failure)
...
2079.828 [3212/10/1898] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUImageIntrinsicOptimizer.cpp.o
2083.397 [3211/10/1899] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULibCalls.cpp.o
2085.188 [3210/10/1900] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerKernelAttributes.cpp.o
2085.439 [3209/10/1901] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerKernelArguments.cpp.o
2092.104 [3208/10/1902] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMachineFunction.cpp.o
2092.205 [3207/10/1903] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMachineModuleInfo.cpp.o
2094.678 [3206/10/1904] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelDAGToDAG.cpp.o
2095.493 [3205/10/1905] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelLowering.cpp.o
2096.553 [3204/10/1906] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMacroFusion.cpp.o
2096.914 [3203/10/1907] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o 
/usr/local/clang-17.0.2/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LARGE_FILE_API -D_XOPEN_SOURCE=700 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/lib/Target/AMDGPU -I/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/lib/Target/AMDGPU -I/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/include -I/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/include -mcmodel=large -fPIC -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUSwLowerLDS.cpp.o -c /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
  260 |   return std::move(OrderedKernels);
      |          ^
/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here
  260 |   return std::move(OrderedKernels);
      |          ^~~~~~~~~~              ~
1 error generated.
2098.530 [3203/9/1908] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerBufferFatPointers.cpp.o
2102.674 [3203/8/1909] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULowerModuleLDSPass.cpp.o
2104.488 [3203/7/1910] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMemoryUtils.cpp.o
2107.327 [3203/6/1911] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInsertSingleUseVDST.cpp.o
2109.105 [3203/5/1912] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMCInstLower.cpp.o
2111.338 [3203/4/1913] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUMarkLastScratchLoad.cpp.o
2117.416 [3203/3/1914] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPULegalizerInfo.cpp.o
2123.153 [3203/2/1915] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUIGroupLP.cpp.o
2134.051 [3203/1/1916] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
ninja: build stopped: subcommand failed.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 22, 2024
This change adds the utilities required to asan instrument memory
instructions. In "amdgpu-sw-lower-lds" pass llvm#87265, during lowering from
LDS to global memory, new instructions in global memory would be created
which need to be asan instrumented.

Change-Id: I17f0371cdc15ea7af6c4e2a325af6ad96a5bfb7b
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 22, 2024
…lvm#87265)

This PR introduces new pass "amdgpu-sw-lower-lds".

This pass lowers the local data store, LDS, uses in kernel and
non-kernel functions in module to use dynamically allocated global
memory. Packed LDS Layout is emulated in the global memory.
The lowered memory instructions from LDS to global memory are then
instrumented for address sanitizer, to catch addressing errors.
This pass only work when address sanitizer has been enabled and has
instrumented the IR. It identifies that IR has been instrumented using
"nosanitize_address" module flag.

For a kernel, LDS access can be static or dynamic which are direct
(accessed within kernel) and indirect (accessed through non-kernels).

**Replacement of Kernel LDS accesses:**
- All the LDS accesses corresponding to kernel will be packed together,
where all static LDS accesses will be allocated first and then dynamic
LDS follows. The total size with alignment is calculated. A new LDS
global will be created for the kernel called "SW LDS" and it will have
the attribute "amdgpu-lds-size" attached with value of the size
calculated. All the LDS accesses in the module will be replaced by GEP
with offset into the "Sw LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" is created per kernel accessing
the dynamic LDS. This will be marked used by kernel and will have
MD_absolue_symbol metadata set to total static LDS size, Since dynamic
LDS allocation starts after all static LDS allocation.

- A device global memory equal to the total LDS size will be allocated.
At the prologue of the kernel, a single work-item from the work-group,
does a "malloc" and stores the pointer of the allocation in "SW LDS". To
store the offsets corresponding to all LDS accesses, another global
variable is created which will be called "SW LDS metadata" in this pass.

- **SW LDS:**
It is LDS global of ptr type with name
"llvm.amdgcn.sw.lds.<kernel-name>".

- **SW LDS Metadata:**
It is of struct type, with n members. n equals the number of LDS globals
accessed by the kernel(direct and indirect). Each member of struct is
another struct of type {i32, i32, i32}. First member corresponds to
offset, second member corresponds to size of LDS global being replaced
and third represents the total aligned size. It will have name
"llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an
intializer with static LDS related offsets and sizes initialized. But
for dynamic LDS related entries, offsets will be intialized to previous
static LDS allocation end offset. Sizes for them will be zero initially.
These dynamic LDS offset and size values will be updated with in the
kernel, since kernel can read the dynamic LDS size allocation done at
runtime with query to "hidden_dynamic_lds_size" hidden kernel argument.

- At the epilogue of kernel, allocated memory would be made free by the
same single work-item.

**Replacement of non-kernel LDS accesses:**
- Multiple kernels can access the same non-kernel function. All the
kernels accessing LDS through non-kernels are sorted and assigned a
kernel-id. All the LDS globals accessed by non-kernels are sorted.

- This information is used to build two tables:
- **Base table:**
Base table will have single row, with elements of the row placed as per
kernel ID. Each element in the row corresponds to ptr of "SW LDS"
variable created for that kernel.

- **Offset table:**
Offset table will have multiple rows and columns. Rows are assumed to be
from 0 to (n-1). n is total number of kernels accessing the LDS through
non-kernels. Each row will have m elements. m is the total number of
unique LDS globals accessed by all non-kernels. Each element in the row
correspond to the ptr of the replacement of LDS global done by that
particular kernel.

- A LDS variable in non-kernel will be replaced based on the information
from base and offset tables. Based on kernel-id query, ptr of "SW LDS"
for that corresponding kernel is obtained from base table. The Offset
into the base "SW LDS" is obtained from corresponding element in offset
table. With this information, replacement value is obtained.

Change-Id: I047105a7585878f780a2e741d2116f3c48232e1f
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 25, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/145

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants