Skip to content

[FixIrreducible] Use CycleInfo instead of a custom SCC traversal #101386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 26, 2024

Conversation

ssahasra
Copy link
Collaborator

@ssahasra ssahasra commented Jul 31, 2024

[FixIrreducible] Use CycleInfo instead of a custom SCC traversal

  1. CycleInfo efficiently locates all cycles in a single pass, while the SCC is
    repeated inside every natural loop.

  2. CycleInfo provides a hierarchy of irreducible cycles, and the new
    implementation transforms each cycle in this hierarchy separately instead of
    reducing an entire irreducible SCC in a single step. This reduces the number
    of control-flow paths that pass through the header of each newly created
    loop. This is evidenced by the reduced number of predecessors on the "guard"
    blocks in the lit tests, and fewer operands on the corresponding PHI nodes.

  3. When an entry of an irreducible cycle is the header of a child natural loop,
    the original implementation destroyed that loop. This is now preserved,
    since the incoming edges on non-header entries are not touched.

  4. In the new implementation, if an irreducible cycle is a superset of a natural
    loop with the same header, then that natural loop is destroyed and replaced
    by the newly created loop.

@llvmbot
Copy link
Member

llvmbot commented Jul 31, 2024

@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-backend-amdgpu

Author: Sameer Sahasrabuddhe (ssahasra)

Changes
  1. CycleInfo efficiently locates all cycles in a single pass, while the SCC is repeated inside every natural loop.

  2. CycleInfo provides a hierarchy of irreducible cycles, and the new implementation transforms each cycle in this hierarchy separately instead of reducing an entire irreducible SCC in a single step. This reduces the number of control-flow paths that pass through the header of each newly created loop. This is evidenced by the reduced number of predecessors on the "guard" blocks in the lit tests, and fewer operands on the corresponding PHI nodes.

  3. When an entry of an irreducible cycle is the header of a child natural loop, the original impleementation destroyed that loop. This is now preserved, since the incoming edges on non-header entries are not touched.


Patch is 63.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101386.diff

9 Files Affected:

  • (modified) llvm/include/llvm/ADT/GenericCycleInfo.h (+27-6)
  • (modified) llvm/lib/Transforms/Utils/FixIrreducible.cpp (+101-155)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+10-5)
  • (modified) llvm/test/Transforms/FixIrreducible/basic.ll (+44-39)
  • (modified) llvm/test/Transforms/FixIrreducible/bug45623.ll (+4-4)
  • (modified) llvm/test/Transforms/FixIrreducible/nested.ll (+38-35)
  • (modified) llvm/test/Transforms/FixIrreducible/switch.ll (+2-2)
  • (modified) llvm/test/Transforms/StructurizeCFG/workarounds/needs-fix-reducible.ll (+31-21)
  • (modified) llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll (+93-82)
diff --git a/llvm/include/llvm/ADT/GenericCycleInfo.h b/llvm/include/llvm/ADT/GenericCycleInfo.h
index b601fc9bae38a..e6d76d5163b1e 100644
--- a/llvm/include/llvm/ADT/GenericCycleInfo.h
+++ b/llvm/include/llvm/ADT/GenericCycleInfo.h
@@ -107,6 +107,12 @@ template <typename ContextT> class GenericCycle {
     return is_contained(Entries, Block);
   }
 
+  /// \brief Replace all entries with \p Block as single entry.
+  void setSingleEntry(BlockT *Block) {
+    Entries.clear();
+    Entries.push_back(Block);
+  }
+
   /// \brief Return whether \p Block is contained in the cycle.
   bool contains(const BlockT *Block) const { return Blocks.contains(Block); }
 
@@ -189,6 +195,21 @@ template <typename ContextT> class GenericCycle {
   //@{
   using const_entry_iterator =
       typename SmallVectorImpl<BlockT *>::const_iterator;
+  const_entry_iterator entry_begin() const {
+    return const_entry_iterator{Entries.begin()};
+  }
+  const_entry_iterator entry_end() const {
+    return const_entry_iterator{Entries.end()};
+  }
+
+  using const_reverse_entry_iterator =
+      typename SmallVectorImpl<BlockT *>::const_reverse_iterator;
+  const_reverse_entry_iterator entry_rbegin() const {
+    return const_reverse_entry_iterator{Entries.rbegin()};
+  }
+  const_reverse_entry_iterator entry_rend() const {
+    return const_reverse_entry_iterator{Entries.rend()};
+  }
 
   size_t getNumEntries() const { return Entries.size(); }
   iterator_range<const_entry_iterator> entries() const {
@@ -252,12 +273,6 @@ template <typename ContextT> class GenericCycleInfo {
   /// the subtree.
   void moveTopLevelCycleToNewParent(CycleT *NewParent, CycleT *Child);
 
-  /// Assumes that \p Cycle is the innermost cycle containing \p Block.
-  /// \p Block will be appended to \p Cycle and all of its parent cycles.
-  /// \p Block will be added to BlockMap with \p Cycle and
-  /// BlockMapTopLevel with \p Cycle's top level parent cycle.
-  void addBlockToCycle(BlockT *Block, CycleT *Cycle);
-
 public:
   GenericCycleInfo() = default;
   GenericCycleInfo(GenericCycleInfo &&) = default;
@@ -275,6 +290,12 @@ template <typename ContextT> class GenericCycleInfo {
   unsigned getCycleDepth(const BlockT *Block) const;
   CycleT *getTopLevelParentCycle(BlockT *Block);
 
+  /// Assumes that \p Cycle is the innermost cycle containing \p Block.
+  /// \p Block will be appended to \p Cycle and all of its parent cycles.
+  /// \p Block will be added to BlockMap with \p Cycle and
+  /// BlockMapTopLevel with \p Cycle's top level parent cycle.
+  void addBlockToCycle(BlockT *Block, CycleT *Cycle);
+
   /// Methods for debug and self-test.
   //@{
 #ifndef NDEBUG
diff --git a/llvm/lib/Transforms/Utils/FixIrreducible.cpp b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
index 11e24d0585be4..9cad689c50e90 100644
--- a/llvm/lib/Transforms/Utils/FixIrreducible.cpp
+++ b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
@@ -6,58 +6,55 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
-// with control-flow edges incident from outside the SCC.  This pass converts a
-// irreducible SCC into a natural loop by applying the following transformation:
+// To convert an irreducible cycle C to a natural loop L:
 //
-// 1. Collect the set of headers H of the SCC.
-// 2. Collect the set of predecessors P of these headers. These may be inside as
-//    well as outside the SCC.
-// 3. Create block N and redirect every edge from set P to set H through N.
+// 1. Add a new node N to C.
+// 2. Redirect all external incoming edges through N.
+// 3. Redirect all edges incident on header H through N.
 //
-// This converts the SCC into a natural loop with N as the header: N is the only
-// block with edges incident from outside the SCC, and all backedges in the SCC
-// are incident on N, i.e., for every backedge, the head now dominates the tail.
+// This is sufficient to ensure that:
 //
-// INPUT CFG: The blocks A and B form an irreducible loop with two headers.
+// a. Every closed path in C also exists in L, with the modification that any
+//    path passing through H now passes through N before reaching H.
+// b. Every external path incident on any entry of C is now incident on N and
+//    then redirected to the entry.
+//
+// Thus, L is a strongly connected component dominated by N, and hence L is a
+// natural loop with header N.
+//
+// INPUT CFG: The blocks H and B form an irreducible loop with two headers.
 //
 //                        Entry
 //                       /     \
 //                      v       v
-//                      A ----> B
+//                      H ----> B
 //                      ^      /|
 //                       `----' |
 //                              v
 //                             Exit
 //
-// OUTPUT CFG: Edges incident on A and B are now redirected through a
-// new block N, forming a natural loop consisting of N, A and B.
+// OUTPUT CFG:
 //
 //                        Entry
 //                          |
 //                          v
-//                    .---> N <---.
-//                   /     / \     \
-//                  |     /   \     |
-//                  \    v     v    /
-//                   `-- A     B --'
+//                          N <---.
+//                         / \     \
+//                        /   \     |
+//                       v     v    /
+//                       H --> B --'
 //                             |
 //                             v
 //                            Exit
 //
-// The transformation is applied to every maximal SCC that is not already
-// recognized as a loop. The pass operates on all maximal SCCs found in the
-// function body outside of any loop, as well as those found inside each loop,
-// including inside any newly created loops. This ensures that any SCC hidden
-// inside a maximal SCC is also transformed.
 //
 // The actual transformation is handled by function CreateControlFlowHub, which
 // takes a set of incoming blocks (the predecessors) and outgoing blocks (the
-// headers). The function also moves every PHINode in an outgoing block to the
+// entries). The function also moves every PHINode in an outgoing block to the
 // hub. Since the hub dominates all the outgoing blocks, each such PHINode
-// continues to dominate its uses. Since every header in an SCC has at least two
-// predecessors, every value used in the header (or later) but defined in a
-// predecessor (or earlier) is represented by a PHINode in a header. Hence the
+// continues to dominate its uses. Since every entry the cycle has at least two
+// predecessors, every value used in the entry (or later) but defined in a
+// predecessor (or earlier) is represented by a PHINode in a entry. Hence the
 // above handling of PHINodes is sufficient and no further processing is
 // required to restore SSA.
 //
@@ -68,8 +65,9 @@
 
 #include "llvm/Transforms/Utils/FixIrreducible.h"
 #include "llvm/ADT/SCCIterator.h"
+#include "llvm/Analysis/CycleAnalysis.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
-#include "llvm/Analysis/LoopIterator.h"
+#include "llvm/Analysis/LoopInfo.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Transforms/Utils.h"
@@ -88,8 +86,9 @@ struct FixIrreducible : public FunctionPass {
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<DominatorTreeWrapperPass>();
-    AU.addRequired<LoopInfoWrapperPass>();
+    AU.addRequired<CycleInfoWrapperPass>();
     AU.addPreserved<DominatorTreeWrapperPass>();
+    AU.addPreserved<CycleInfoWrapperPass>();
     AU.addPreserved<LoopInfoWrapperPass>();
   }
 
@@ -113,27 +112,23 @@ INITIALIZE_PASS_END(FixIrreducible, "fix-irreducible",
 // When a new loop is created, existing children of the parent loop may now be
 // fully inside the new loop. Reconnect these as children of the new loop.
 static void reconnectChildLoops(LoopInfo &LI, Loop *ParentLoop, Loop *NewLoop,
-                                SetVector<BasicBlock *> &Blocks,
-                                SetVector<BasicBlock *> &Headers) {
+                                BasicBlock *OldHeader) {
   auto &CandidateLoops = ParentLoop ? ParentLoop->getSubLoopsVector()
                                     : LI.getTopLevelLoopsVector();
-  // The new loop cannot be its own child, and any candidate is a
-  // child iff its header is owned by the new loop. Move all the
-  // children to a new vector.
+  // Any candidate is a child iff its header is owned by the new loop. Move all
+  // the children to a new vector.
   auto FirstChild = std::partition(
-      CandidateLoops.begin(), CandidateLoops.end(), [&](Loop *L) {
-        return L == NewLoop || !Blocks.contains(L->getHeader());
-      });
+      CandidateLoops.begin(), CandidateLoops.end(),
+      [&](Loop *L) { return !NewLoop->contains(L->getHeader()); });
   SmallVector<Loop *, 8> ChildLoops(FirstChild, CandidateLoops.end());
   CandidateLoops.erase(FirstChild, CandidateLoops.end());
 
   for (Loop *Child : ChildLoops) {
     LLVM_DEBUG(dbgs() << "child loop: " << Child->getHeader()->getName()
                       << "\n");
-    // TODO: A child loop whose header is also a header in the current
-    // SCC gets destroyed since its backedges are removed. That may
-    // not be necessary if we can retain such backedges.
-    if (Headers.count(Child->getHeader())) {
+    // A child loop whose header was the old cycle header gets destroyed since
+    // its backedges are removed.
+    if (Child->getHeader() == OldHeader) {
       for (auto *BB : Child->blocks()) {
         if (LI.getLoopFor(BB) != Child)
           continue;
@@ -161,21 +156,25 @@ static void reconnectChildLoops(LoopInfo &LI, Loop *ParentLoop, Loop *NewLoop,
 // Given a set of blocks and headers in an irreducible SCC, convert it into a
 // natural loop. Also insert this new loop at its appropriate place in the
 // hierarchy of loops.
-static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
-                                      Loop *ParentLoop,
-                                      SetVector<BasicBlock *> &Blocks,
-                                      SetVector<BasicBlock *> &Headers) {
-#ifndef NDEBUG
-  // All headers are part of the SCC
-  for (auto *H : Headers) {
-    assert(Blocks.count(H));
-  }
-#endif
+static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
+                           LoopInfo *LI) {
+  if (C.isReducible())
+    return false;
 
   SetVector<BasicBlock *> Predecessors;
-  for (auto *H : Headers) {
-    for (auto *P : predecessors(H)) {
+
+  // Redirect internal edges incident on the header.
+  BasicBlock *OldHeader = C.getHeader();
+  for (BasicBlock *P : predecessors(OldHeader)) {
+    if (C.contains(P))
       Predecessors.insert(P);
+  }
+
+  // Redirect external incoming edges. This includes the edges on the header.
+  for (BasicBlock *E : C.entries()) {
+    for (BasicBlock *P : predecessors(E)) {
+      if (!C.contains(P))
+        Predecessors.insert(P);
     }
   }
 
@@ -189,21 +188,40 @@ static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
   // Redirect all the backedges through a "hub" consisting of a series
   // of guard blocks that manage the flow of control from the
   // predecessors to the headers.
-  SmallVector<BasicBlock *, 8> GuardBlocks;
+  SmallVector<BasicBlock *> GuardBlocks;
+
+  // Minor optimization: The cycle entries are discovered in an order that is
+  // the opposite of the order in which these blocks appear as branch targets.
+  // This results in a lot of condition inversions in the control flow out of
+  // the new ControlFlowHub, which can be mitigated if the orders match. So we
+  // reverse the entries when adding them to the hub.
+  SetVector<BasicBlock *> Entries;
+  Entries.insert(C.entry_rbegin(), C.entry_rend());
+
   DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
-  CreateControlFlowHub(&DTU, GuardBlocks, Predecessors, Headers, "irr");
+  CreateControlFlowHub(&DTU, GuardBlocks, Predecessors, Entries, "irr");
 #if defined(EXPENSIVE_CHECKS)
   assert(DT.verify(DominatorTree::VerificationLevel::Full));
 #else
   assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 #endif
 
+  for (auto *G : GuardBlocks) {
+    LLVM_DEBUG(dbgs() << "added guard block: " << G->getName() << "\n");
+    CI.addBlockToCycle(G, &C);
+  }
+  C.setSingleEntry(GuardBlocks[0]);
+
+  if (!LI)
+    return true;
+
+  Loop *ParentLoop = LI->getLoopFor(OldHeader);
   // Create a new loop from the now-transformed cycle
-  auto NewLoop = LI.AllocateLoop();
+  auto *NewLoop = LI->AllocateLoop();
   if (ParentLoop) {
     ParentLoop->addChildLoop(NewLoop);
   } else {
-    LI.addTopLevelLoop(NewLoop);
+    LI->addTopLevelLoop(NewLoop);
   }
 
   // Add the guard blocks to the new loop. The first guard block is
@@ -213,16 +231,15 @@ static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
   // are also propagated up the chain of parent loops.
   for (auto *G : GuardBlocks) {
     LLVM_DEBUG(dbgs() << "added guard block: " << G->getName() << "\n");
-    NewLoop->addBasicBlockToLoop(G, LI);
+    NewLoop->addBasicBlockToLoop(G, *LI);
   }
 
-  // Add the SCC blocks to the new loop.
-  for (auto *BB : Blocks) {
+  for (auto *BB : C.blocks()) {
     NewLoop->addBlockEntry(BB);
-    if (LI.getLoopFor(BB) == ParentLoop) {
+    if (LI->getLoopFor(BB) == ParentLoop) {
       LLVM_DEBUG(dbgs() << "moved block from parent: " << BB->getName()
                         << "\n");
-      LI.changeLoopFor(BB, NewLoop);
+      LI->changeLoopFor(BB, NewLoop);
     } else {
       LLVM_DEBUG(dbgs() << "added block from child: " << BB->getName() << "\n");
     }
@@ -230,129 +247,58 @@ static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
   LLVM_DEBUG(dbgs() << "header for new loop: "
                     << NewLoop->getHeader()->getName() << "\n");
 
-  reconnectChildLoops(LI, ParentLoop, NewLoop, Blocks, Headers);
+  reconnectChildLoops(*LI, ParentLoop, NewLoop, OldHeader);
 
   NewLoop->verifyLoop();
   if (ParentLoop) {
     ParentLoop->verifyLoop();
   }
 #if defined(EXPENSIVE_CHECKS)
-  LI.verify(DT);
+  LI->verify(DT);
 #endif // EXPENSIVE_CHECKS
-}
-
-namespace llvm {
-// Enable the graph traits required for traversing a Loop body.
-template <> struct GraphTraits<Loop> : LoopBodyTraits {};
-} // namespace llvm
 
-// Overloaded wrappers to go with the function template below.
-static BasicBlock *unwrapBlock(BasicBlock *B) { return B; }
-static BasicBlock *unwrapBlock(LoopBodyTraits::NodeRef &N) { return N.second; }
-
-static void createNaturalLoop(LoopInfo &LI, DominatorTree &DT, Function *F,
-                              SetVector<BasicBlock *> &Blocks,
-                              SetVector<BasicBlock *> &Headers) {
-  createNaturalLoopInternal(LI, DT, nullptr, Blocks, Headers);
+  return true;
 }
 
-static void createNaturalLoop(LoopInfo &LI, DominatorTree &DT, Loop &L,
-                              SetVector<BasicBlock *> &Blocks,
-                              SetVector<BasicBlock *> &Headers) {
-  createNaturalLoopInternal(LI, DT, &L, Blocks, Headers);
-}
-
-// Convert irreducible SCCs; Graph G may be a Function* or a Loop&.
-template <class Graph>
-static bool makeReducible(LoopInfo &LI, DominatorTree &DT, Graph &&G) {
-  bool Changed = false;
-  for (auto Scc = scc_begin(G); !Scc.isAtEnd(); ++Scc) {
-    if (Scc->size() < 2)
-      continue;
-    SetVector<BasicBlock *> Blocks;
-    LLVM_DEBUG(dbgs() << "Found SCC:");
-    for (auto N : *Scc) {
-      auto BB = unwrapBlock(N);
-      LLVM_DEBUG(dbgs() << " " << BB->getName());
-      Blocks.insert(BB);
-    }
-    LLVM_DEBUG(dbgs() << "\n");
-
-    // Minor optimization: The SCC blocks are usually discovered in an order
-    // that is the opposite of the order in which these blocks appear as branch
-    // targets. This results in a lot of condition inversions in the control
-    // flow out of the new ControlFlowHub, which can be mitigated if the orders
-    // match. So we discover the headers using the reverse of the block order.
-    SetVector<BasicBlock *> Headers;
-    LLVM_DEBUG(dbgs() << "Found headers:");
-    for (auto *BB : reverse(Blocks)) {
-      for (const auto P : predecessors(BB)) {
-        // Skip unreachable predecessors.
-        if (!DT.isReachableFromEntry(P))
-          continue;
-        if (!Blocks.count(P)) {
-          LLVM_DEBUG(dbgs() << " " << BB->getName());
-          Headers.insert(BB);
-          break;
-        }
-      }
-    }
-    LLVM_DEBUG(dbgs() << "\n");
-
-    if (Headers.size() == 1) {
-      assert(LI.isLoopHeader(Headers.front()));
-      LLVM_DEBUG(dbgs() << "Natural loop with a single header: skipped\n");
-      continue;
-    }
-    createNaturalLoop(LI, DT, G, Blocks, Headers);
-    Changed = true;
-  }
-  return Changed;
-}
-
-static bool FixIrreducibleImpl(Function &F, LoopInfo &LI, DominatorTree &DT) {
+static bool FixIrreducibleImpl(Function &F, CycleInfo &CI, DominatorTree &DT,
+                               LoopInfo *LI) {
   LLVM_DEBUG(dbgs() << "===== Fix irreducible control-flow in function: "
                     << F.getName() << "\n");
 
   assert(hasOnlySimpleTerminator(F) && "Unsupported block terminator.");
 
   bool Changed = false;
-  SmallVector<Loop *, 8> WorkList;
-
-  LLVM_DEBUG(dbgs() << "visiting top-level\n");
-  Changed |= makeReducible(LI, DT, &F);
-
-  // Any SCCs reduced are now already in the list of top-level loops, so simply
-  // add them all to the worklist.
-  append_range(WorkList, LI);
-
-  while (!WorkList.empty()) {
-    auto L = WorkList.pop_back_val();
-    LLVM_DEBUG(dbgs() << "visiting loop with header "
-                      << L->getHeader()->getName() << "\n");
-    Changed |= makeReducible(LI, DT, *L);
-    // Any SCCs reduced are now already in the list of child loops, so simply
-    // add them all to the worklist.
-    WorkList.append(L->begin(), L->end());
+  SmallVector<Cycle *> Worklist{CI.toplevel_cycles()};
+
+  while (!Worklist.empty()) {
+    Cycle *C = Worklist.pop_back_val();
+    Changed |= fixIrreducible(*C, CI, DT, LI);
+    append_range(Worklist, C->children());
   }
 
   return Changed;
 }
 
 bool FixIrreducible::runOnFunction(Function &F) {
-  auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
+  auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
+  LoopInfo *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;
+  auto &CI = getAnalysis<CycleInfoWrapperPass>().getResult();
   auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
-  return FixIrreducibleImpl(F, LI, DT);
+  return FixIrreducibleImpl(F, CI, DT, LI);
 }
 
 PreservedAnalyses FixIrreduciblePass::run(Function &F,
                                           FunctionAnalysisManager &AM) {
-  auto &LI = AM.getResult<LoopAnalysis>(F);
+  auto *LI = AM.getCachedResult<LoopAnalysis>(F);
+  auto &CI = AM.getResult<CycleAnalysis>(F);
   auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
-  if (!FixIrreducibleImpl(F, LI, DT))
+
+  if (!FixIrreducibleImpl(F, CI, DT, LI))
     return PreservedAnalyses::all();
+
   PreservedAnalyses PA;
   PA.preserve<LoopAnalysis>();
+  PA.preserve<CycleAnalysis>();
   PA.preserve<DominatorTreeAnalysis>();
   return PA;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index b61838c06a1f9..62b42c892a11e 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -68,8 +68,9 @@
 ; GCN-O0-NEXT:        Uniformity Analysis
 ; GCN-O0-NEXT:        Unify divergent function exit nodes
 ; GCN-O0-NEXT:        Dominator Tree Construction
-; GCN-O0-NEXT:        Natural Loop Information
+; GCN-O0-NEXT:        Cycle Info Analysis
 ; GCN-O0-NEXT:        Convert irreducible control-flow into natural loops
+; GCN-O0-NEXT:        Natural Loop Information
 ; GCN-O0-NEXT:        Fixup each natural loop to have a single exit block
 ; GCN-O0-NEXT:        Post-Dominator Tree Construction
 ; GCN-O0-NEXT:        Dominance Frontier Construction
@@ -262,8 +263,9 @@
 ; GCN-O1-NEXT:        Post-Dominator Tree Construction
 ; GCN-O1-NEXT:        Unify divergent function exit nodes
 ; GCN-O1-NEXT:        Dominator Tree Construction
-; GCN-O1-NEXT:        Natural Loop Information
+; GCN-O1-NEXT:        Cycle Info Analysis
 ; GCN-O1-NEXT:...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 31, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Sameer Sahasrabuddhe (ssahasra)

Changes
  1. CycleInfo efficiently locates all cycles in a single pass, while the SCC is repeated inside every natural loop.

  2. CycleInfo provides a hierarchy of irreducible cycles, and the new implementation transforms each cycle in this hierarchy separately instead of reducing an entire irreducible SCC in a single step. This reduces the number of control-flow paths that pass through the header of each newly created loop. This is evidenced by the reduced number of predecessors on the "guard" blocks in the lit tests, and fewer operands on the corresponding PHI nodes.

  3. When an entry of an irreducible cycle is the header of a child natural loop, the original impleementation destroyed that loop. This is now preserved, since the incoming edges on non-header entries are not touched.


Patch is 63.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101386.diff

9 Files Affected:

  • (modified) llvm/include/llvm/ADT/GenericCycleInfo.h (+27-6)
  • (modified) llvm/lib/Transforms/Utils/FixIrreducible.cpp (+101-155)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+10-5)
  • (modified) llvm/test/Transforms/FixIrreducible/basic.ll (+44-39)
  • (modified) llvm/test/Transforms/FixIrreducible/bug45623.ll (+4-4)
  • (modified) llvm/test/Transforms/FixIrreducible/nested.ll (+38-35)
  • (modified) llvm/test/Transforms/FixIrreducible/switch.ll (+2-2)
  • (modified) llvm/test/Transforms/StructurizeCFG/workarounds/needs-fix-reducible.ll (+31-21)
  • (modified) llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll (+93-82)
diff --git a/llvm/include/llvm/ADT/GenericCycleInfo.h b/llvm/include/llvm/ADT/GenericCycleInfo.h
index b601fc9bae38a..e6d76d5163b1e 100644
--- a/llvm/include/llvm/ADT/GenericCycleInfo.h
+++ b/llvm/include/llvm/ADT/GenericCycleInfo.h
@@ -107,6 +107,12 @@ template <typename ContextT> class GenericCycle {
     return is_contained(Entries, Block);
   }
 
+  /// \brief Replace all entries with \p Block as single entry.
+  void setSingleEntry(BlockT *Block) {
+    Entries.clear();
+    Entries.push_back(Block);
+  }
+
   /// \brief Return whether \p Block is contained in the cycle.
   bool contains(const BlockT *Block) const { return Blocks.contains(Block); }
 
@@ -189,6 +195,21 @@ template <typename ContextT> class GenericCycle {
   //@{
   using const_entry_iterator =
       typename SmallVectorImpl<BlockT *>::const_iterator;
+  const_entry_iterator entry_begin() const {
+    return const_entry_iterator{Entries.begin()};
+  }
+  const_entry_iterator entry_end() const {
+    return const_entry_iterator{Entries.end()};
+  }
+
+  using const_reverse_entry_iterator =
+      typename SmallVectorImpl<BlockT *>::const_reverse_iterator;
+  const_reverse_entry_iterator entry_rbegin() const {
+    return const_reverse_entry_iterator{Entries.rbegin()};
+  }
+  const_reverse_entry_iterator entry_rend() const {
+    return const_reverse_entry_iterator{Entries.rend()};
+  }
 
   size_t getNumEntries() const { return Entries.size(); }
   iterator_range<const_entry_iterator> entries() const {
@@ -252,12 +273,6 @@ template <typename ContextT> class GenericCycleInfo {
   /// the subtree.
   void moveTopLevelCycleToNewParent(CycleT *NewParent, CycleT *Child);
 
-  /// Assumes that \p Cycle is the innermost cycle containing \p Block.
-  /// \p Block will be appended to \p Cycle and all of its parent cycles.
-  /// \p Block will be added to BlockMap with \p Cycle and
-  /// BlockMapTopLevel with \p Cycle's top level parent cycle.
-  void addBlockToCycle(BlockT *Block, CycleT *Cycle);
-
 public:
   GenericCycleInfo() = default;
   GenericCycleInfo(GenericCycleInfo &&) = default;
@@ -275,6 +290,12 @@ template <typename ContextT> class GenericCycleInfo {
   unsigned getCycleDepth(const BlockT *Block) const;
   CycleT *getTopLevelParentCycle(BlockT *Block);
 
+  /// Assumes that \p Cycle is the innermost cycle containing \p Block.
+  /// \p Block will be appended to \p Cycle and all of its parent cycles.
+  /// \p Block will be added to BlockMap with \p Cycle and
+  /// BlockMapTopLevel with \p Cycle's top level parent cycle.
+  void addBlockToCycle(BlockT *Block, CycleT *Cycle);
+
   /// Methods for debug and self-test.
   //@{
 #ifndef NDEBUG
diff --git a/llvm/lib/Transforms/Utils/FixIrreducible.cpp b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
index 11e24d0585be4..9cad689c50e90 100644
--- a/llvm/lib/Transforms/Utils/FixIrreducible.cpp
+++ b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
@@ -6,58 +6,55 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
-// with control-flow edges incident from outside the SCC.  This pass converts a
-// irreducible SCC into a natural loop by applying the following transformation:
+// To convert an irreducible cycle C to a natural loop L:
 //
-// 1. Collect the set of headers H of the SCC.
-// 2. Collect the set of predecessors P of these headers. These may be inside as
-//    well as outside the SCC.
-// 3. Create block N and redirect every edge from set P to set H through N.
+// 1. Add a new node N to C.
+// 2. Redirect all external incoming edges through N.
+// 3. Redirect all edges incident on header H through N.
 //
-// This converts the SCC into a natural loop with N as the header: N is the only
-// block with edges incident from outside the SCC, and all backedges in the SCC
-// are incident on N, i.e., for every backedge, the head now dominates the tail.
+// This is sufficient to ensure that:
 //
-// INPUT CFG: The blocks A and B form an irreducible loop with two headers.
+// a. Every closed path in C also exists in L, with the modification that any
+//    path passing through H now passes through N before reaching H.
+// b. Every external path incident on any entry of C is now incident on N and
+//    then redirected to the entry.
+//
+// Thus, L is a strongly connected component dominated by N, and hence L is a
+// natural loop with header N.
+//
+// INPUT CFG: The blocks H and B form an irreducible loop with two headers.
 //
 //                        Entry
 //                       /     \
 //                      v       v
-//                      A ----> B
+//                      H ----> B
 //                      ^      /|
 //                       `----' |
 //                              v
 //                             Exit
 //
-// OUTPUT CFG: Edges incident on A and B are now redirected through a
-// new block N, forming a natural loop consisting of N, A and B.
+// OUTPUT CFG:
 //
 //                        Entry
 //                          |
 //                          v
-//                    .---> N <---.
-//                   /     / \     \
-//                  |     /   \     |
-//                  \    v     v    /
-//                   `-- A     B --'
+//                          N <---.
+//                         / \     \
+//                        /   \     |
+//                       v     v    /
+//                       H --> B --'
 //                             |
 //                             v
 //                            Exit
 //
-// The transformation is applied to every maximal SCC that is not already
-// recognized as a loop. The pass operates on all maximal SCCs found in the
-// function body outside of any loop, as well as those found inside each loop,
-// including inside any newly created loops. This ensures that any SCC hidden
-// inside a maximal SCC is also transformed.
 //
 // The actual transformation is handled by function CreateControlFlowHub, which
 // takes a set of incoming blocks (the predecessors) and outgoing blocks (the
-// headers). The function also moves every PHINode in an outgoing block to the
+// entries). The function also moves every PHINode in an outgoing block to the
 // hub. Since the hub dominates all the outgoing blocks, each such PHINode
-// continues to dominate its uses. Since every header in an SCC has at least two
-// predecessors, every value used in the header (or later) but defined in a
-// predecessor (or earlier) is represented by a PHINode in a header. Hence the
+// continues to dominate its uses. Since every entry the cycle has at least two
+// predecessors, every value used in the entry (or later) but defined in a
+// predecessor (or earlier) is represented by a PHINode in a entry. Hence the
 // above handling of PHINodes is sufficient and no further processing is
 // required to restore SSA.
 //
@@ -68,8 +65,9 @@
 
 #include "llvm/Transforms/Utils/FixIrreducible.h"
 #include "llvm/ADT/SCCIterator.h"
+#include "llvm/Analysis/CycleAnalysis.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
-#include "llvm/Analysis/LoopIterator.h"
+#include "llvm/Analysis/LoopInfo.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Transforms/Utils.h"
@@ -88,8 +86,9 @@ struct FixIrreducible : public FunctionPass {
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<DominatorTreeWrapperPass>();
-    AU.addRequired<LoopInfoWrapperPass>();
+    AU.addRequired<CycleInfoWrapperPass>();
     AU.addPreserved<DominatorTreeWrapperPass>();
+    AU.addPreserved<CycleInfoWrapperPass>();
     AU.addPreserved<LoopInfoWrapperPass>();
   }
 
@@ -113,27 +112,23 @@ INITIALIZE_PASS_END(FixIrreducible, "fix-irreducible",
 // When a new loop is created, existing children of the parent loop may now be
 // fully inside the new loop. Reconnect these as children of the new loop.
 static void reconnectChildLoops(LoopInfo &LI, Loop *ParentLoop, Loop *NewLoop,
-                                SetVector<BasicBlock *> &Blocks,
-                                SetVector<BasicBlock *> &Headers) {
+                                BasicBlock *OldHeader) {
   auto &CandidateLoops = ParentLoop ? ParentLoop->getSubLoopsVector()
                                     : LI.getTopLevelLoopsVector();
-  // The new loop cannot be its own child, and any candidate is a
-  // child iff its header is owned by the new loop. Move all the
-  // children to a new vector.
+  // Any candidate is a child iff its header is owned by the new loop. Move all
+  // the children to a new vector.
   auto FirstChild = std::partition(
-      CandidateLoops.begin(), CandidateLoops.end(), [&](Loop *L) {
-        return L == NewLoop || !Blocks.contains(L->getHeader());
-      });
+      CandidateLoops.begin(), CandidateLoops.end(),
+      [&](Loop *L) { return !NewLoop->contains(L->getHeader()); });
   SmallVector<Loop *, 8> ChildLoops(FirstChild, CandidateLoops.end());
   CandidateLoops.erase(FirstChild, CandidateLoops.end());
 
   for (Loop *Child : ChildLoops) {
     LLVM_DEBUG(dbgs() << "child loop: " << Child->getHeader()->getName()
                       << "\n");
-    // TODO: A child loop whose header is also a header in the current
-    // SCC gets destroyed since its backedges are removed. That may
-    // not be necessary if we can retain such backedges.
-    if (Headers.count(Child->getHeader())) {
+    // A child loop whose header was the old cycle header gets destroyed since
+    // its backedges are removed.
+    if (Child->getHeader() == OldHeader) {
       for (auto *BB : Child->blocks()) {
         if (LI.getLoopFor(BB) != Child)
           continue;
@@ -161,21 +156,25 @@ static void reconnectChildLoops(LoopInfo &LI, Loop *ParentLoop, Loop *NewLoop,
 // Given a set of blocks and headers in an irreducible SCC, convert it into a
 // natural loop. Also insert this new loop at its appropriate place in the
 // hierarchy of loops.
-static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
-                                      Loop *ParentLoop,
-                                      SetVector<BasicBlock *> &Blocks,
-                                      SetVector<BasicBlock *> &Headers) {
-#ifndef NDEBUG
-  // All headers are part of the SCC
-  for (auto *H : Headers) {
-    assert(Blocks.count(H));
-  }
-#endif
+static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
+                           LoopInfo *LI) {
+  if (C.isReducible())
+    return false;
 
   SetVector<BasicBlock *> Predecessors;
-  for (auto *H : Headers) {
-    for (auto *P : predecessors(H)) {
+
+  // Redirect internal edges incident on the header.
+  BasicBlock *OldHeader = C.getHeader();
+  for (BasicBlock *P : predecessors(OldHeader)) {
+    if (C.contains(P))
       Predecessors.insert(P);
+  }
+
+  // Redirect external incoming edges. This includes the edges on the header.
+  for (BasicBlock *E : C.entries()) {
+    for (BasicBlock *P : predecessors(E)) {
+      if (!C.contains(P))
+        Predecessors.insert(P);
     }
   }
 
@@ -189,21 +188,40 @@ static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
   // Redirect all the backedges through a "hub" consisting of a series
   // of guard blocks that manage the flow of control from the
   // predecessors to the headers.
-  SmallVector<BasicBlock *, 8> GuardBlocks;
+  SmallVector<BasicBlock *> GuardBlocks;
+
+  // Minor optimization: The cycle entries are discovered in an order that is
+  // the opposite of the order in which these blocks appear as branch targets.
+  // This results in a lot of condition inversions in the control flow out of
+  // the new ControlFlowHub, which can be mitigated if the orders match. So we
+  // reverse the entries when adding them to the hub.
+  SetVector<BasicBlock *> Entries;
+  Entries.insert(C.entry_rbegin(), C.entry_rend());
+
   DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
-  CreateControlFlowHub(&DTU, GuardBlocks, Predecessors, Headers, "irr");
+  CreateControlFlowHub(&DTU, GuardBlocks, Predecessors, Entries, "irr");
 #if defined(EXPENSIVE_CHECKS)
   assert(DT.verify(DominatorTree::VerificationLevel::Full));
 #else
   assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 #endif
 
+  for (auto *G : GuardBlocks) {
+    LLVM_DEBUG(dbgs() << "added guard block: " << G->getName() << "\n");
+    CI.addBlockToCycle(G, &C);
+  }
+  C.setSingleEntry(GuardBlocks[0]);
+
+  if (!LI)
+    return true;
+
+  Loop *ParentLoop = LI->getLoopFor(OldHeader);
   // Create a new loop from the now-transformed cycle
-  auto NewLoop = LI.AllocateLoop();
+  auto *NewLoop = LI->AllocateLoop();
   if (ParentLoop) {
     ParentLoop->addChildLoop(NewLoop);
   } else {
-    LI.addTopLevelLoop(NewLoop);
+    LI->addTopLevelLoop(NewLoop);
   }
 
   // Add the guard blocks to the new loop. The first guard block is
@@ -213,16 +231,15 @@ static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
   // are also propagated up the chain of parent loops.
   for (auto *G : GuardBlocks) {
     LLVM_DEBUG(dbgs() << "added guard block: " << G->getName() << "\n");
-    NewLoop->addBasicBlockToLoop(G, LI);
+    NewLoop->addBasicBlockToLoop(G, *LI);
   }
 
-  // Add the SCC blocks to the new loop.
-  for (auto *BB : Blocks) {
+  for (auto *BB : C.blocks()) {
     NewLoop->addBlockEntry(BB);
-    if (LI.getLoopFor(BB) == ParentLoop) {
+    if (LI->getLoopFor(BB) == ParentLoop) {
       LLVM_DEBUG(dbgs() << "moved block from parent: " << BB->getName()
                         << "\n");
-      LI.changeLoopFor(BB, NewLoop);
+      LI->changeLoopFor(BB, NewLoop);
     } else {
       LLVM_DEBUG(dbgs() << "added block from child: " << BB->getName() << "\n");
     }
@@ -230,129 +247,58 @@ static void createNaturalLoopInternal(LoopInfo &LI, DominatorTree &DT,
   LLVM_DEBUG(dbgs() << "header for new loop: "
                     << NewLoop->getHeader()->getName() << "\n");
 
-  reconnectChildLoops(LI, ParentLoop, NewLoop, Blocks, Headers);
+  reconnectChildLoops(*LI, ParentLoop, NewLoop, OldHeader);
 
   NewLoop->verifyLoop();
   if (ParentLoop) {
     ParentLoop->verifyLoop();
   }
 #if defined(EXPENSIVE_CHECKS)
-  LI.verify(DT);
+  LI->verify(DT);
 #endif // EXPENSIVE_CHECKS
-}
-
-namespace llvm {
-// Enable the graph traits required for traversing a Loop body.
-template <> struct GraphTraits<Loop> : LoopBodyTraits {};
-} // namespace llvm
 
-// Overloaded wrappers to go with the function template below.
-static BasicBlock *unwrapBlock(BasicBlock *B) { return B; }
-static BasicBlock *unwrapBlock(LoopBodyTraits::NodeRef &N) { return N.second; }
-
-static void createNaturalLoop(LoopInfo &LI, DominatorTree &DT, Function *F,
-                              SetVector<BasicBlock *> &Blocks,
-                              SetVector<BasicBlock *> &Headers) {
-  createNaturalLoopInternal(LI, DT, nullptr, Blocks, Headers);
+  return true;
 }
 
-static void createNaturalLoop(LoopInfo &LI, DominatorTree &DT, Loop &L,
-                              SetVector<BasicBlock *> &Blocks,
-                              SetVector<BasicBlock *> &Headers) {
-  createNaturalLoopInternal(LI, DT, &L, Blocks, Headers);
-}
-
-// Convert irreducible SCCs; Graph G may be a Function* or a Loop&.
-template <class Graph>
-static bool makeReducible(LoopInfo &LI, DominatorTree &DT, Graph &&G) {
-  bool Changed = false;
-  for (auto Scc = scc_begin(G); !Scc.isAtEnd(); ++Scc) {
-    if (Scc->size() < 2)
-      continue;
-    SetVector<BasicBlock *> Blocks;
-    LLVM_DEBUG(dbgs() << "Found SCC:");
-    for (auto N : *Scc) {
-      auto BB = unwrapBlock(N);
-      LLVM_DEBUG(dbgs() << " " << BB->getName());
-      Blocks.insert(BB);
-    }
-    LLVM_DEBUG(dbgs() << "\n");
-
-    // Minor optimization: The SCC blocks are usually discovered in an order
-    // that is the opposite of the order in which these blocks appear as branch
-    // targets. This results in a lot of condition inversions in the control
-    // flow out of the new ControlFlowHub, which can be mitigated if the orders
-    // match. So we discover the headers using the reverse of the block order.
-    SetVector<BasicBlock *> Headers;
-    LLVM_DEBUG(dbgs() << "Found headers:");
-    for (auto *BB : reverse(Blocks)) {
-      for (const auto P : predecessors(BB)) {
-        // Skip unreachable predecessors.
-        if (!DT.isReachableFromEntry(P))
-          continue;
-        if (!Blocks.count(P)) {
-          LLVM_DEBUG(dbgs() << " " << BB->getName());
-          Headers.insert(BB);
-          break;
-        }
-      }
-    }
-    LLVM_DEBUG(dbgs() << "\n");
-
-    if (Headers.size() == 1) {
-      assert(LI.isLoopHeader(Headers.front()));
-      LLVM_DEBUG(dbgs() << "Natural loop with a single header: skipped\n");
-      continue;
-    }
-    createNaturalLoop(LI, DT, G, Blocks, Headers);
-    Changed = true;
-  }
-  return Changed;
-}
-
-static bool FixIrreducibleImpl(Function &F, LoopInfo &LI, DominatorTree &DT) {
+static bool FixIrreducibleImpl(Function &F, CycleInfo &CI, DominatorTree &DT,
+                               LoopInfo *LI) {
   LLVM_DEBUG(dbgs() << "===== Fix irreducible control-flow in function: "
                     << F.getName() << "\n");
 
   assert(hasOnlySimpleTerminator(F) && "Unsupported block terminator.");
 
   bool Changed = false;
-  SmallVector<Loop *, 8> WorkList;
-
-  LLVM_DEBUG(dbgs() << "visiting top-level\n");
-  Changed |= makeReducible(LI, DT, &F);
-
-  // Any SCCs reduced are now already in the list of top-level loops, so simply
-  // add them all to the worklist.
-  append_range(WorkList, LI);
-
-  while (!WorkList.empty()) {
-    auto L = WorkList.pop_back_val();
-    LLVM_DEBUG(dbgs() << "visiting loop with header "
-                      << L->getHeader()->getName() << "\n");
-    Changed |= makeReducible(LI, DT, *L);
-    // Any SCCs reduced are now already in the list of child loops, so simply
-    // add them all to the worklist.
-    WorkList.append(L->begin(), L->end());
+  SmallVector<Cycle *> Worklist{CI.toplevel_cycles()};
+
+  while (!Worklist.empty()) {
+    Cycle *C = Worklist.pop_back_val();
+    Changed |= fixIrreducible(*C, CI, DT, LI);
+    append_range(Worklist, C->children());
   }
 
   return Changed;
 }
 
 bool FixIrreducible::runOnFunction(Function &F) {
-  auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
+  auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
+  LoopInfo *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;
+  auto &CI = getAnalysis<CycleInfoWrapperPass>().getResult();
   auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
-  return FixIrreducibleImpl(F, LI, DT);
+  return FixIrreducibleImpl(F, CI, DT, LI);
 }
 
 PreservedAnalyses FixIrreduciblePass::run(Function &F,
                                           FunctionAnalysisManager &AM) {
-  auto &LI = AM.getResult<LoopAnalysis>(F);
+  auto *LI = AM.getCachedResult<LoopAnalysis>(F);
+  auto &CI = AM.getResult<CycleAnalysis>(F);
   auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
-  if (!FixIrreducibleImpl(F, LI, DT))
+
+  if (!FixIrreducibleImpl(F, CI, DT, LI))
     return PreservedAnalyses::all();
+
   PreservedAnalyses PA;
   PA.preserve<LoopAnalysis>();
+  PA.preserve<CycleAnalysis>();
   PA.preserve<DominatorTreeAnalysis>();
   return PA;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index b61838c06a1f9..62b42c892a11e 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -68,8 +68,9 @@
 ; GCN-O0-NEXT:        Uniformity Analysis
 ; GCN-O0-NEXT:        Unify divergent function exit nodes
 ; GCN-O0-NEXT:        Dominator Tree Construction
-; GCN-O0-NEXT:        Natural Loop Information
+; GCN-O0-NEXT:        Cycle Info Analysis
 ; GCN-O0-NEXT:        Convert irreducible control-flow into natural loops
+; GCN-O0-NEXT:        Natural Loop Information
 ; GCN-O0-NEXT:        Fixup each natural loop to have a single exit block
 ; GCN-O0-NEXT:        Post-Dominator Tree Construction
 ; GCN-O0-NEXT:        Dominance Frontier Construction
@@ -262,8 +263,9 @@
 ; GCN-O1-NEXT:        Post-Dominator Tree Construction
 ; GCN-O1-NEXT:        Unify divergent function exit nodes
 ; GCN-O1-NEXT:        Dominator Tree Construction
-; GCN-O1-NEXT:        Natural Loop Information
+; GCN-O1-NEXT:        Cycle Info Analysis
 ; GCN-O1-NEXT:...
[truncated]

@ssahasra
Copy link
Collaborator Author

ssahasra commented Jul 31, 2024

Note that I have not yet finished verifying all the lit tests. I might also have to add a few more tests, especially involving a mix of irreducible and reducible cycles that are siblings and/or nested inside each other in various combinations. Especially with some overlap in the entry and header nodes.

TODO:

  1. Add a test where the header of an irreducible cycle is also the header of a natural loop.
  2. Verify tests.
  3. Add tests as noted above.

@@ -189,6 +195,21 @@ template <typename ContextT> class GenericCycle {
//@{
using const_entry_iterator =
typename SmallVectorImpl<BlockT *>::const_iterator;
const_entry_iterator entry_begin() const {
return const_entry_iterator{Entries.begin()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you should be able to drop the redundant return types here and below

Suggested change
return const_entry_iterator{Entries.begin()};
return {Entries.begin()};

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed in an intermediate patch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/// \brief Replace all entries with \p Block as single entry.
void setSingleEntry(BlockT *Block) {
Entries.clear();
Entries.push_back(Block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be nullptr? Should we add an assert to guard against this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Replaced it with a stronger test to check that Block is contained in the current cycle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@ssahasra
Copy link
Collaborator Author

ssahasra commented Aug 7, 2024

Current status: This is going to take a "bit" longer. Turns out that CreateControlFlowHub is a heavy hammer which redirects all edges from Incoming to Outgoing. But this destroys inner cycles when used to fix an irreducible cycle. For some cycle C with a subcycle S, a block B in S may have edges going to the header H of C as well as the header E of S. Now if the header E is also an entry of C, then the use of CreateControlFlowHub removes the edge from B to E, thus destroying S.

This needs a finer method that redirects only specific edges. Either that, or we let the pass destroy some cycles. But updating CycleInfo for these missing subcycles may be a fair amount of work too, so I would rather do it the right way.

ssahasra added a commit to ssahasra/llvm-project that referenced this pull request Aug 13, 2024
CreateControlFlowHub is a method that redirects control flow edges from a set of
incoming blocks to a set of outgoing blocks through a new set of "guard" blocks.
This is now refactored into a separate file with one enhancement: The input to
the method is now a set of branches rather than two sets of blocks.

The original implementation reroutes every edge from incoming blocks to outgoing
blocks. But it is possible that for some incoming block InBB, some successor S
might be in the set of outgoing blocks, but that particular edge should not be
rerouted. The new implementation makes this possible by allowing the user to
specify the targets of each branch that need to be rerouted.

This is needed when improving the implementation of FixIrreducible llvm#101386. Current
uses in FixIrreducible and UnifyLoopExits do not demonstrate this finer control
over the edges being rerouted.
ssahasra added a commit to ssahasra/llvm-project that referenced this pull request Aug 13, 2024
CreateControlFlowHub is a method that redirects control flow edges from a set of
incoming blocks to a set of outgoing blocks through a new set of "guard" blocks.
This is now refactored into a separate file with one enhancement: The input to
the method is now a set of branches rather than two sets of blocks.

The original implementation reroutes every edge from incoming blocks to outgoing
blocks. But it is possible that for some incoming block InBB, some successor S
might be in the set of outgoing blocks, but that particular edge should not be
rerouted. The new implementation makes this possible by allowing the user to
specify the targets of each branch that need to be rerouted.

This is needed when improving the implementation of FixIrreducible llvm#101386. Current
uses in FixIrreducible and UnifyLoopExits do not demonstrate this finer control
over the edges being rerouted.
ssahasra added a commit that referenced this pull request Aug 21, 2024
CreateControlFlowHub is a method that redirects control flow edges from a set of
incoming blocks to a set of outgoing blocks through a new set of "guard" blocks.
This is now refactored into a separate file with one enhancement: The input to
the method is now a set of branches rather than two sets of blocks.

The original implementation reroutes every edge from incoming blocks to outgoing
blocks. But it is possible that for some incoming block InBB, some successor S
might be in the set of outgoing blocks, but that particular edge should not be
rerouted. The new implementation makes this possible by allowing the user to
specify the targets of each branch that need to be rerouted.

This is needed when improving the implementation of FixIrreducible #101386.
Current use in FixIrreducible does not demonstrate this finer control over the
edges being rerouted. But in UnifyLoopExits, when only one successor of an
exiting block is an exit block, this refinement now reroutes only the relevant
control-flow through the edge; the non-exit successor is not rerouted. This
results in fewer branches and PHI nodes in the hub.
@ssahasra ssahasra changed the base branch from main to users/ssahasra/control-flow-hub August 21, 2024 09:56
@ssahasra ssahasra force-pushed the ssahasra/fix-irreducible branch from c60ea06 to eb8519e Compare August 21, 2024 09:57
@ssahasra
Copy link
Collaborator Author

This needs a finer method that redirects only specific edges. Either that, or we let the pass destroy some cycles. But updating CycleInfo for these missing subcycles may be a fair amount of work too, so I would rather do it the right way.

This now depends on the newly refactored ControlFlowHub, which correctly reroutes only the relevant edges. The effect was already caught in an existing test with nested cycles and a common header, so no new test needs to be written for this.

@ssahasra
Copy link
Collaborator Author

Note that I have not yet finished verifying all the lit tests. I might also have to add a few more tests, especially involving a mix of irreducible and reducible cycles that are siblings and/or nested inside each other in various combinations. Especially with some overlap in the entry and header nodes.

  • New tests added that involve nesting with common header or entry nodes. Existing tests also covered some relevant combinations.
  • Verified all tests.

ssahasra added a commit that referenced this pull request Aug 22, 2024
CreateControlFlowHub is a method that redirects control flow edges from a set of
incoming blocks to a set of outgoing blocks through a new set of "guard" blocks.
This is now refactored into a separate file with one enhancement: The input to
the method is now a set of branches rather than two sets of blocks.

The original implementation reroutes every edge from incoming blocks to outgoing
blocks. But it is possible that for some incoming block InBB, some successor S
might be in the set of outgoing blocks, but that particular edge should not be
rerouted. The new implementation makes this possible by allowing the user to
specify the targets of each branch that need to be rerouted.

This is needed when improving the implementation of FixIrreducible #101386.
Current use in FixIrreducible does not demonstrate this finer control over the
edges being rerouted. But in UnifyLoopExits, when only one successor of an
exiting block is an exit block, this refinement now reroutes only the relevant
control-flow through the edge; the non-exit successor is not rerouted. This
results in fewer branches and PHI nodes in the hub.
@ssahasra ssahasra force-pushed the users/ssahasra/control-flow-hub branch from 01beb62 to 5f6172f Compare August 22, 2024 07:07
@ssahasra ssahasra deleted the branch llvm:main August 22, 2024 07:10
@ssahasra ssahasra closed this Aug 22, 2024
@ssahasra ssahasra reopened this Aug 22, 2024
@ssahasra ssahasra changed the base branch from users/ssahasra/control-flow-hub to main August 22, 2024 07:15
cjdb pushed a commit to cjdb/llvm-project that referenced this pull request Aug 23, 2024
CreateControlFlowHub is a method that redirects control flow edges from a set of
incoming blocks to a set of outgoing blocks through a new set of "guard" blocks.
This is now refactored into a separate file with one enhancement: The input to
the method is now a set of branches rather than two sets of blocks.

The original implementation reroutes every edge from incoming blocks to outgoing
blocks. But it is possible that for some incoming block InBB, some successor S
might be in the set of outgoing blocks, but that particular edge should not be
rerouted. The new implementation makes this possible by allowing the user to
specify the targets of each branch that need to be rerouted.

This is needed when improving the implementation of FixIrreducible llvm#101386.
Current use in FixIrreducible does not demonstrate this finer control over the
edges being rerouted. But in UnifyLoopExits, when only one successor of an
exiting block is an exit block, this refinement now reroutes only the relevant
control-flow through the edge; the non-exit successor is not rerouted. This
results in fewer branches and PHI nodes in the hub.
1. CycleInfo efficiently locates all cycles in a single pass, while the SCC is
   repeated inside every natural loop.

2. CycleInfo provides a hierarchy of irreducible cycles, and the new
   implementation transforms each cycle in this hierarchy separately instead of
   reducing an entire irreducible SCC in a single step. This reduces the number
   of control-flow paths that pass through the header of each newly created
   loop. This is evidenced by the reduced number of predecessors on the "guard"
   blocks in the lit tests, and fewer operands on the corresponding PHI nodes.

3. When an entry of an irreducible cycle is the header of a child natural loop,
   the original implementation destroyed that loop. This is now preserved,
   since the incoming edges on non-header entries are not touched.

4. In the new implementation, if an irreducible cycle is a superset of a natural
   loop with the same header, then that natural loop is destroyed and replaced
   by the newly created loop.
@ssahasra ssahasra force-pushed the ssahasra/fix-irreducible branch from eb8519e to 6e5aaba Compare August 24, 2024 02:24
@ssahasra ssahasra merged commit fa4cc9d into llvm:main Aug 26, 2024
8 checks passed
@ssahasra ssahasra deleted the ssahasra/fix-irreducible branch August 26, 2024 10:21
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/4768

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
...
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/bug49779.cpp (867 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/bug50022.cpp (868 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/test_libc.cpp (869 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/wtime.c (870 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu :: offloading/bug49021.cpp (871 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu :: offloading/std_complex_arithmetic.cpp (872 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/complex_reduction.cpp (873 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/bug49021.cpp (874 of 876)
PASS: libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (875 of 876)
TIMEOUT: libomptarget :: amdgcn-amd-amdhsa :: offloading/ctor_dtor.cpp (876 of 876)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/ctor_dtor.cpp' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 100 seconds

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp
# note: command had no output on stdout or stderr
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

********************
Slowest Tests:
--------------------------------------------------------------------------
100.05s: libomptarget :: amdgcn-amd-amdhsa :: offloading/ctor_dtor.cpp
16.19s: libomptarget :: amdgcn-amd-amdhsa :: offloading/bug49021.cpp
13.20s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_max.cpp
12.90s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_min.cpp
11.84s: libomptarget :: amdgcn-amd-amdhsa :: offloading/complex_reduction.cpp
10.28s: libomptarget :: amdgcn-amd-amdhsa :: jit/empty_kernel_lvl2.c
9.27s: libomptarget :: x86_64-pc-linux-gnu :: offloading/bug49021.cpp

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/4116

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
16.439 [1880/96/1970] Building CXX object lib/Target/CMakeFiles/LLVMTarget.dir/TargetMachineC.cpp.o
16.477 [1879/96/1971] Building CXX object lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o
16.813 [1878/96/1972] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/ScheduleDAG.cpp.o
16.856 [1877/96/1973] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineInstr.cpp.o
16.870 [1876/96/1974] Building CXX object lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64Arm64ECCallLowering.cpp.o
17.020 [1875/96/1975] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/ModuloSchedule.cpp.o
17.126 [1874/96/1976] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/RegisterScavenging.cpp.o
17.142 [1873/96/1977] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/UnreachableBlockElim.cpp.o
17.369 [1872/96/1978] Building X86GenInstrInfo.inc...
17.586 [1871/96/1979] Building CXX object lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/FixIrreducible.cpp.o
FAILED: lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/FixIrreducible.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/clang++ -DEXPENSIVE_CHECKS -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/1/llvm-clang-x86_64-expensive-checks-debian/build/lib/Transforms/Utils -I/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/lib/Transforms/Utils -I/b/1/llvm-clang-x86_64-expensive-checks-debian/build/include -I/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/include -U_GLIBCXX_DEBUG -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/FixIrreducible.cpp.o -MF lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/FixIrreducible.cpp.o.d -o lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/FixIrreducible.cpp.o -c /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/lib/Transforms/Utils/FixIrreducible.cpp
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/lib/Transforms/Utils/FixIrreducible.cpp:345:7: error: member reference type 'llvm::LoopInfo *' is a pointer; did you mean to use '->'?
    LI.verify(DT);
    ~~^
      ->
1 error generated.
17.630 [1871/95/1980] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/TargetSchedule.cpp.o
17.858 [1871/94/1981] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/RenameIndependentSubregs.cpp.o
18.024 [1871/93/1982] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineFunction.cpp.o
18.036 [1871/92/1983] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/ScoreboardHazardRecognizer.cpp.o
18.086 [1871/91/1984] Building AMDGPUGenCallingConv.inc...
18.113 [1871/90/1985] Building AMDGPUGenMCPseudoLowering.inc...
18.410 [1871/89/1986] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/VirtRegMap.cpp.o
18.439 [1871/88/1987] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/XRayInstrumentation.cpp.o
18.843 [1871/87/1988] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineBlockPlacement.cpp.o
18.965 [1871/86/1989] Building AMDGPUGenRegBankGICombiner.inc...
19.252 [1871/85/1990] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/RegAllocFast.cpp.o
19.309 [1871/84/1991] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MIRPrinter.cpp.o
19.333 [1871/83/1992] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/VLIWMachineScheduler.cpp.o
19.454 [1871/82/1993] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/TailDuplicator.cpp.o
19.522 [1871/81/1994] Building AMDGPUGenPreLegalizeGICombiner.inc...
19.649 [1871/80/1995] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/TargetRegisterInfo.cpp.o
19.657 [1871/79/1996] Building AMDGPUGenPostLegalizeGICombiner.inc...
19.701 [1871/78/1997] Building CXX object lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/ScheduleDAGVLIW.cpp.o
19.705 [1871/77/1998] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/RDFGraph.cpp.o
19.888 [1871/76/1999] Building AMDGPUGenMCCodeEmitter.inc...
19.891 [1871/75/2000] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o
20.001 [1871/74/2001] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/SwiftErrorValueTracking.cpp.o
20.015 [1871/73/2002] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o
20.033 [1871/72/2003] Building AMDGPUGenSubtargetInfo.inc...
20.162 [1871/71/2004] Building CXX object lib/Analysis/CMakeFiles/LLVMAnalysis.dir/CycleAnalysis.cpp.o
20.163 [1871/70/2005] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/PrologEpilogInserter.cpp.o
20.210 [1871/69/2006] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/StackSlotColoring.cpp.o
20.243 [1871/68/2007] Building AMDGPUGenDisassemblerTables.inc...
20.520 [1871/67/2008] Building RISCVGenDAGISel.inc...
20.638 [1871/66/2009] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineSink.cpp.o
21.066 [1871/65/2010] Building CXX object lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/ResourcePriorityQueue.cpp.o
21.073 [1871/64/2011] Building AMDGPUGenSearchableTables.inc...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 26, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux running on sanitizer-buildbot7 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/2945

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[167/170] Generating Msan-aarch64-with-call-Test
[168/170] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.aarch64.o
[169/170] Generating Msan-aarch64-Test
[169/170] Running compiler_rt regression tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/interception/Unit' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/Unit' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/rtsan/AARCH64LinuxConfig' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 5555 tests, 48 workers --
Testing:  0.. 10.. 20.. 30
FAIL: LeakSanitizer-AddressSanitizer-aarch64 :: TestCases/use_registers.cpp (1780 of 5555)
******************** TEST 'LeakSanitizer-AddressSanitizer-aarch64 :: TestCases/use_registers.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
Test alloc: 0xdfd1f2820080 

=================================================================
==121976==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1337 byte(s) in 1 object(s) allocated from:
    #0 0xb7fdf35e3e8c in malloc /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
    #1 0xb7fdf3623980 in registers_thread_func /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp:16:13
    #2 0xb7fdf35e16e0 in asan_thread_start(void*) /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:239:28
    #3 0xe231f36eba48  (/lib/aarch64-linux-gnu/libc.so.6+0xeba48) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)

Objects leaked above:
0xdfd1f2820080 (1337 bytes)

SUMMARY: AddressSanitizer: 1337 byte(s) leaked in 1 allocation(s).

--
Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-aarch64-linux/build/build_default/./bin/clang  --driver-mode=g++ -O0   -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   -gline-tables-only -fsanitize=address -I/home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/../ -pthread /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp -o /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp
+ /home/b/sanitizer-aarch64-linux/build/build_default/./bin/clang --driver-mode=g++ -O0 -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -gline-tables-only -fsanitize=address -I/home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/../ -pthread /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp -o /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp
RUN: at line 3: env LSAN_OPTIONS=:detect_leaks=1:"report_objects=1:use_stacks=0:use_registers=0" not  /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp 2>&1 | FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp
+ env LSAN_OPTIONS=:detect_leaks=1:report_objects=1:use_stacks=0:use_registers=0 not /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp
+ FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp
RUN: at line 4: env LSAN_OPTIONS=:detect_leaks=1:"report_objects=1:use_stacks=0:use_registers=1"  /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp 2>&1
+ env LSAN_OPTIONS=:detect_leaks=1:report_objects=1:use_stacks=0:use_registers=1 /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

3 warning(s) in tests
Step 14 (test compiler-rt default) failure: test compiler-rt default (failure)
...
[167/170] Generating Msan-aarch64-with-call-Test
[168/170] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.aarch64.o
[169/170] Generating Msan-aarch64-Test
[169/170] Running compiler_rt regression tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/interception/Unit' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/Unit' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/rtsan/AARCH64LinuxConfig' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 5555 tests, 48 workers --
Testing:  0.. 10.. 20.. 30
FAIL: LeakSanitizer-AddressSanitizer-aarch64 :: TestCases/use_registers.cpp (1780 of 5555)
******************** TEST 'LeakSanitizer-AddressSanitizer-aarch64 :: TestCases/use_registers.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
Test alloc: 0xdfd1f2820080 

=================================================================
==121976==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1337 byte(s) in 1 object(s) allocated from:
    #0 0xb7fdf35e3e8c in malloc /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
    #1 0xb7fdf3623980 in registers_thread_func /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp:16:13
    #2 0xb7fdf35e16e0 in asan_thread_start(void*) /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:239:28
    #3 0xe231f36eba48  (/lib/aarch64-linux-gnu/libc.so.6+0xeba48) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)

Objects leaked above:
0xdfd1f2820080 (1337 bytes)

SUMMARY: AddressSanitizer: 1337 byte(s) leaked in 1 allocation(s).

--
Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-aarch64-linux/build/build_default/./bin/clang  --driver-mode=g++ -O0   -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   -gline-tables-only -fsanitize=address -I/home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/../ -pthread /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp -o /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp
+ /home/b/sanitizer-aarch64-linux/build/build_default/./bin/clang --driver-mode=g++ -O0 -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -gline-tables-only -fsanitize=address -I/home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/../ -pthread /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp -o /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp
RUN: at line 3: env LSAN_OPTIONS=:detect_leaks=1:"report_objects=1:use_stacks=0:use_registers=0" not  /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp 2>&1 | FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp
+ env LSAN_OPTIONS=:detect_leaks=1:report_objects=1:use_stacks=0:use_registers=0 not /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp
+ FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/lsan/TestCases/use_registers.cpp
RUN: at line 4: env LSAN_OPTIONS=:detect_leaks=1:"report_objects=1:use_stacks=0:use_registers=1"  /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp 2>&1
+ env LSAN_OPTIONS=:detect_leaks=1:report_objects=1:use_stacks=0:use_registers=1 /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/lsan/AARCH64AsanConfig/TestCases/Output/use_registers.cpp.tmp

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

3 warning(s) in tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants