Skip to content

DynamicLoaderDarwin load images in parallel with preload #110646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

DmT021
Copy link
Contributor

@DmT021 DmT021 commented Oct 1, 2024

This change enables DynamicLoaderDarwin to load modules in parallel using the thread pool. This new behavior is controlled by a new setting plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load, which is enabled by default. When disabled, DynamicLoaderDarwin will load modules sequentially as before.

@DmT021 DmT021 changed the title Wp/dynamic loader darwin parallel image load with preload main DynamicLoaderDarwin load images in parallel with preload Oct 1, 2024
Copy link

github-actions bot commented Oct 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jasonmolenda jasonmolenda self-requested a review October 6, 2024 03:12
@DmT021 DmT021 force-pushed the wp/DynamicLoaderDarwin_parallel_image_load_with_preload_main branch 2 times, most recently from 0c4780e to f6cffb4 Compare October 11, 2024 01:33
Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the review and discussion happened in the "non-preload" version of the PR ( #110439 ), but I agree with Dmitrii that this is the best approach to merge, this is approved. Very nice job spotting a prime opportunity for multithreading, thanks for doing the work to implement it. I think for native macOS debugging (with all binaries in the same shared cache as lldb's) we're seeing some less than optimal results so we've got a bottleneck that I'll try to look into, I suspect maybe related to how we find binaries in our shared cache. This is still a speedup or not-slower result in every use case I tried.

@DmT021 DmT021 marked this pull request as ready for review October 14, 2024 16:35
@DmT021 DmT021 requested a review from JDevlieghere as a code owner October 14, 2024 16:35
@llvmbot llvmbot added the lldb label Oct 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-lldb

Author: Dmitrii Galimzianov (DmT021)

Changes

This is another approach to the goal described in this PR. This PR is provided just for convenience.


Patch is 23.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110646.diff

9 Files Affected:

  • (modified) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt (+13)
  • (modified) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp (+100-30)
  • (modified) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h (+17-6)
  • (added) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp (+53)
  • (added) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h (+34)
  • (added) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td (+8)
  • (modified) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderMacOS.cpp (+6-4)
  • (modified) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderMacOSXDYLD.cpp (+10-3)
  • (modified) lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderMacOSXDYLD.h (+2)
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt
index 7308374c8bfba6..77a560541fcb1f 100644
--- a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt
@@ -1,7 +1,16 @@
+lldb_tablegen(DynamicLoaderDarwinProperties.inc -gen-lldb-property-defs
+  SOURCE DynamicLoaderDarwinProperties.td
+  TARGET LLDBPluginDynamicLoaderDarwinPropertiesGen)
+
+lldb_tablegen(DynamicLoaderDarwinPropertiesEnum.inc -gen-lldb-property-enum-defs
+  SOURCE DynamicLoaderDarwinProperties.td
+  TARGET LLDBPluginDynamicLoaderDarwinPropertiesEnumGen)
+
 add_lldb_library(lldbPluginDynamicLoaderMacOSXDYLD PLUGIN
   DynamicLoaderMacOSXDYLD.cpp
   DynamicLoaderMacOS.cpp
   DynamicLoaderDarwin.cpp
+  DynamicLoaderDarwinProperties.cpp
 
   LINK_LIBS
     lldbBreakpoint
@@ -16,3 +25,7 @@ add_lldb_library(lldbPluginDynamicLoaderMacOSXDYLD PLUGIN
     Support
     TargetParser
   )
+
+add_dependencies(lldbPluginDynamicLoaderMacOSXDYLD
+  LLDBPluginDynamicLoaderDarwinPropertiesGen
+  LLDBPluginDynamicLoaderDarwinPropertiesEnumGen)
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
index 30242038a5f66f..c6a59d9e87a46b 100644
--- a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
@@ -8,6 +8,7 @@
 
 #include "DynamicLoaderDarwin.h"
 
+#include "DynamicLoaderDarwinProperties.h"
 #include "lldb/Breakpoint/StoppointCallbackContext.h"
 #include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
@@ -31,6 +32,7 @@
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/State.h"
+#include "llvm/Support/ThreadPool.h"
 
 #include "Plugins/LanguageRuntime/ObjC/ObjCLanguageRuntime.h"
 #include "Plugins/TypeSystem/Clang/TypeSystemClang.h"
@@ -77,6 +79,17 @@ void DynamicLoaderDarwin::DidLaunch() {
   SetNotificationBreakpoint();
 }
 
+void DynamicLoaderDarwin::CreateSettings(lldb_private::Debugger &debugger) {
+  if (!PluginManager::GetSettingForDynamicLoaderPlugin(
+          debugger, DynamicLoaderDarwinProperties::GetSettingName())) {
+    const bool is_global_setting = true;
+    PluginManager::CreateSettingForDynamicLoaderPlugin(
+        debugger,
+        DynamicLoaderDarwinProperties::GetGlobal().GetValueProperties(),
+        "Properties for the DynamicLoaderDarwin plug-in.", is_global_setting);
+  }
+}
+
 // Clear out the state of this class.
 void DynamicLoaderDarwin::Clear(bool clear_process) {
   std::lock_guard<std::recursive_mutex> guard(m_mutex);
@@ -88,7 +101,7 @@ void DynamicLoaderDarwin::Clear(bool clear_process) {
 }
 
 ModuleSP DynamicLoaderDarwin::FindTargetModuleForImageInfo(
-    ImageInfo &image_info, bool can_create, bool *did_create_ptr) {
+    const ImageInfo &image_info, bool can_create, bool *did_create_ptr) {
   if (did_create_ptr)
     *did_create_ptr = false;
 
@@ -517,8 +530,8 @@ bool DynamicLoaderDarwin::JSONImageInformationIntoImageInfo(
   return true;
 }
 
-void DynamicLoaderDarwin::UpdateSpecialBinariesFromNewImageInfos(
-    ImageInfo::collection &image_infos) {
+void DynamicLoaderDarwin::UpdateSpecialBinariesFromPreloadedModules(
+    std::vector<std::pair<ImageInfo, ModuleSP>> &images) {
   uint32_t exe_idx = UINT32_MAX;
   uint32_t dyld_idx = UINT32_MAX;
   Target &target = m_process->GetTarget();
@@ -526,35 +539,34 @@ void DynamicLoaderDarwin::UpdateSpecialBinariesFromNewImageInfos(
   ConstString g_dyld_sim_filename("dyld_sim");
 
   ArchSpec target_arch = target.GetArchitecture();
-  const size_t image_infos_size = image_infos.size();
-  for (size_t i = 0; i < image_infos_size; i++) {
-    if (image_infos[i].header.filetype == llvm::MachO::MH_DYLINKER) {
+  const size_t images_size = images.size();
+  for (size_t i = 0; i < images_size; i++) {
+    const auto &image_info = images[i].first;
+    if (image_info.header.filetype == llvm::MachO::MH_DYLINKER) {
       // In a "simulator" process we will have two dyld modules --
       // a "dyld" that we want to keep track of, and a "dyld_sim" which
       // we don't need to keep track of here.  dyld_sim will have a non-macosx
       // OS.
       if (target_arch.GetTriple().getEnvironment() == llvm::Triple::Simulator &&
-          image_infos[i].os_type != llvm::Triple::OSType::MacOSX) {
+          image_info.os_type != llvm::Triple::OSType::MacOSX) {
         continue;
       }
 
       dyld_idx = i;
     }
-    if (image_infos[i].header.filetype == llvm::MachO::MH_EXECUTE) {
+    if (image_info.header.filetype == llvm::MachO::MH_EXECUTE) {
       exe_idx = i;
     }
   }
 
   // Set the target executable if we haven't found one so far.
   if (exe_idx != UINT32_MAX && !target.GetExecutableModule()) {
-    const bool can_create = true;
-    ModuleSP exe_module_sp(FindTargetModuleForImageInfo(image_infos[exe_idx],
-                                                        can_create, nullptr));
+    ModuleSP exe_module_sp = images[exe_idx].second;
     if (exe_module_sp) {
       LLDB_LOGF(log, "Found executable module: %s",
                 exe_module_sp->GetFileSpec().GetPath().c_str());
       target.GetImages().AppendIfNeeded(exe_module_sp);
-      UpdateImageLoadAddress(exe_module_sp.get(), image_infos[exe_idx]);
+      UpdateImageLoadAddress(exe_module_sp.get(), images[exe_idx].first);
       if (exe_module_sp.get() != target.GetExecutableModulePointer())
         target.SetExecutableModule(exe_module_sp, eLoadDependentsNo);
 
@@ -581,14 +593,12 @@ void DynamicLoaderDarwin::UpdateSpecialBinariesFromNewImageInfos(
   }
 
   if (dyld_idx != UINT32_MAX) {
-    const bool can_create = true;
-    ModuleSP dyld_sp = FindTargetModuleForImageInfo(image_infos[dyld_idx],
-                                                    can_create, nullptr);
+    ModuleSP dyld_sp = images[dyld_idx].second;
     if (dyld_sp.get()) {
       LLDB_LOGF(log, "Found dyld module: %s",
                 dyld_sp->GetFileSpec().GetPath().c_str());
       target.GetImages().AppendIfNeeded(dyld_sp);
-      UpdateImageLoadAddress(dyld_sp.get(), image_infos[dyld_idx]);
+      UpdateImageLoadAddress(dyld_sp.get(), images[dyld_idx].first);
       SetDYLDModule(dyld_sp);
     }
   }
@@ -642,26 +652,86 @@ ModuleSP DynamicLoaderDarwin::GetDYLDModule() {
 
 void DynamicLoaderDarwin::ClearDYLDModule() { m_dyld_module_wp.reset(); }
 
+template <typename InputIterator, typename ResultType>
+static std::vector<ResultType> parallel_map(
+    llvm::ThreadPoolInterface &threadPool, InputIterator first,
+    InputIterator last,
+    llvm::function_ref<ResultType(
+        const typename std::iterator_traits<InputIterator>::value_type &)>
+        transform) {
+  const auto size = std::distance(first, last);
+  std::vector<ResultType> results(size);
+  if (size > 0) {
+    llvm::ThreadPoolTaskGroup taskGroup(threadPool);
+    auto it = first;
+    for (ssize_t i = 0; i < size; ++i, ++it) {
+      taskGroup.async([&, i, it]() { results[i] = transform(*it); });
+    }
+    taskGroup.wait();
+  }
+  return results;
+}
+
+template <typename InputIterator, typename ResultType>
+static std::vector<ResultType>
+map(InputIterator first, InputIterator last,
+    llvm::function_ref<ResultType(
+        const typename std::iterator_traits<InputIterator>::value_type &)>
+        transform) {
+  const auto size = std::distance(first, last);
+  std::vector<ResultType> results(size);
+  auto it = first;
+  for (ssize_t i = 0; i < size; ++i, ++it) {
+    results[i] = transform(*it);
+  }
+  return results;
+}
+
+std::vector<std::pair<DynamicLoaderDarwin::ImageInfo, ModuleSP>>
+DynamicLoaderDarwin::PreloadModulesFromImageInfos(
+    const ImageInfo::collection &image_infos) {
+  auto ImageLoad = [this](const ImageInfo &image_info) {
+    return std::make_pair(
+        image_info, FindTargetModuleForImageInfo(image_info, true, nullptr));
+  };
+  bool is_parallel_load =
+      DynamicLoaderDarwinProperties::GetGlobal().GetEnableParallelImageLoad();
+  auto images = is_parallel_load
+                    ? parallel_map<ImageInfo::collection::const_iterator,
+                                   std::pair<ImageInfo, ModuleSP>>(
+                          Debugger::GetThreadPool(), image_infos.begin(),
+                          image_infos.end(), ImageLoad)
+                    : map<ImageInfo::collection::const_iterator,
+                          std::pair<ImageInfo, ModuleSP>>(
+                          image_infos.begin(), image_infos.end(), ImageLoad);
+  return images;
+}
+
 bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
     ImageInfo::collection &image_infos) {
   std::lock_guard<std::recursive_mutex> guard(m_mutex);
+  auto images = PreloadModulesFromImageInfos(image_infos);
+  return AddModulesUsingPreloadedModules(images);
+}
+
+bool DynamicLoaderDarwin::AddModulesUsingPreloadedModules(
+    std::vector<std::pair<ImageInfo, ModuleSP>> &images) {
+  std::lock_guard<std::recursive_mutex> guard(m_mutex);
   // Now add these images to the main list.
   ModuleList loaded_module_list;
   Log *log = GetLog(LLDBLog::DynamicLoader);
   Target &target = m_process->GetTarget();
   ModuleList &target_images = target.GetImages();
 
-  for (uint32_t idx = 0; idx < image_infos.size(); ++idx) {
+  for (uint32_t idx = 0; idx < images.size(); ++idx) {
+    auto &image_info = images[idx].first;
+    const auto &image_module_sp = images[idx].second;
     if (log) {
       LLDB_LOGF(log, "Adding new image at address=0x%16.16" PRIx64 ".",
-                image_infos[idx].address);
-      image_infos[idx].PutToLog(log);
+                image_info.address);
+      image_info.PutToLog(log);
     }
-
-    m_dyld_image_infos.push_back(image_infos[idx]);
-
-    ModuleSP image_module_sp(
-        FindTargetModuleForImageInfo(image_infos[idx], true, nullptr));
+    m_dyld_image_infos.push_back(image_info);
 
     if (image_module_sp) {
       ObjectFile *objfile = image_module_sp->GetObjectFile();
@@ -673,7 +743,7 @@ bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
               sections->FindSectionByName(commpage_dbstr).get();
           if (commpage_section) {
             ModuleSpec module_spec(objfile->GetFileSpec(),
-                                   image_infos[idx].GetArchitecture());
+                                   image_info.GetArchitecture());
             module_spec.GetObjectName() = commpage_dbstr;
             ModuleSP commpage_image_module_sp(
                 target_images.FindFirstModule(module_spec));
@@ -686,17 +756,17 @@ bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
               if (!commpage_image_module_sp ||
                   commpage_image_module_sp->GetObjectFile() == nullptr) {
                 commpage_image_module_sp = m_process->ReadModuleFromMemory(
-                    image_infos[idx].file_spec, image_infos[idx].address);
+                    image_info.file_spec, image_info.address);
                 // Always load a memory image right away in the target in case
                 // we end up trying to read the symbol table from memory... The
                 // __LINKEDIT will need to be mapped so we can figure out where
                 // the symbol table bits are...
                 bool changed = false;
                 UpdateImageLoadAddress(commpage_image_module_sp.get(),
-                                       image_infos[idx]);
+                                       image_info);
                 target.GetImages().Append(commpage_image_module_sp);
                 if (changed) {
-                  image_infos[idx].load_stop_id = m_process->GetStopID();
+                  image_info.load_stop_id = m_process->GetStopID();
                   loaded_module_list.AppendIfNeeded(commpage_image_module_sp);
                 }
               }
@@ -709,14 +779,14 @@ bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
       // address. We need to check this so we don't mention that all loaded
       // shared libraries are newly loaded each time we hit out dyld breakpoint
       // since dyld will list all shared libraries each time.
-      if (UpdateImageLoadAddress(image_module_sp.get(), image_infos[idx])) {
+      if (UpdateImageLoadAddress(image_module_sp.get(), image_info)) {
         target_images.AppendIfNeeded(image_module_sp);
         loaded_module_list.AppendIfNeeded(image_module_sp);
       }
 
       // To support macCatalyst and legacy iOS simulator,
       // update the module's platform with the DYLD info.
-      ArchSpec dyld_spec = image_infos[idx].GetArchitecture();
+      ArchSpec dyld_spec = image_info.GetArchitecture();
       auto &dyld_triple = dyld_spec.GetTriple();
       if ((dyld_triple.getEnvironment() == llvm::Triple::MacABI &&
            dyld_triple.getOS() == llvm::Triple::IOS) ||
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h
index 45c693163f8105..bc5464d76b9503 100644
--- a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h
@@ -58,6 +58,8 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
 
   std::optional<lldb_private::Address> GetStartAddress() override;
 
+  static void CreateSettings(lldb_private::Debugger &debugger);
+
 protected:
   void PrivateInitialize(lldb_private::Process *process);
 
@@ -174,7 +176,7 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
 
   bool UnloadModuleSections(lldb_private::Module *module, ImageInfo &info);
 
-  lldb::ModuleSP FindTargetModuleForImageInfo(ImageInfo &image_info,
+  lldb::ModuleSP FindTargetModuleForImageInfo(const ImageInfo &image_info,
                                               bool can_create,
                                               bool *did_create_ptr);
 
@@ -201,11 +203,18 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
       lldb_private::StructuredData::ObjectSP image_details,
       ImageInfo::collection &image_infos);
 
-  // If image_infos contains / may contain dyld or executable image, call this
-  // method
-  // to keep our internal record keeping of the special binaries up-to-date.
-  void
-  UpdateSpecialBinariesFromNewImageInfos(ImageInfo::collection &image_infos);
+  // Finds/loads modules for a given `image_infos` and returns pairs
+  // (ImageInfo, ModuleSP).
+  // Prefer using this method rather than calling `FindTargetModuleForImageInfo`
+  // directly as this method may load the modules in parallel.
+  std::vector<std::pair<ImageInfo, lldb::ModuleSP>>
+  PreloadModulesFromImageInfos(const ImageInfo::collection &image_infos);
+
+  // If `images` contains / may contain dyld or executable image, call this
+  // method to keep our internal record keeping of the special binaries
+  // up-to-date.
+  void UpdateSpecialBinariesFromPreloadedModules(
+      std::vector<std::pair<ImageInfo, lldb::ModuleSP>> &images);
 
   // if image_info is a dyld binary, call this method
   bool UpdateDYLDImageInfoFromNewImageInfo(ImageInfo &image_info);
@@ -215,6 +224,8 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
   void AddExecutableModuleIfInImageInfos(ImageInfo::collection &image_infos);
 
   bool AddModulesUsingImageInfos(ImageInfo::collection &image_infos);
+  bool AddModulesUsingPreloadedModules(
+      std::vector<std::pair<ImageInfo, lldb::ModuleSP>> &images);
 
   // Whether we should use the new dyld SPI to get shared library information,
   // or read
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp
new file mode 100644
index 00000000000000..f4d8a071e6d5de
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp
@@ -0,0 +1,53 @@
+//===-- DynamicLoaderDarwinProperties.cpp ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "DynamicLoaderDarwinProperties.h"
+
+using namespace lldb_private;
+
+#define LLDB_PROPERTIES_dynamicloaderdarwin_experimental
+#include "DynamicLoaderDarwinProperties.inc"
+
+enum {
+#define LLDB_PROPERTIES_dynamicloaderdarwin_experimental
+#include "DynamicLoaderDarwinPropertiesEnum.inc"
+};
+
+llvm::StringRef DynamicLoaderDarwinProperties::GetSettingName() {
+  static constexpr llvm::StringLiteral g_setting_name("darwin");
+  return g_setting_name;
+}
+
+DynamicLoaderDarwinProperties::ExperimentalProperties::ExperimentalProperties()
+    : Properties(std::make_shared<OptionValueProperties>(
+          GetExperimentalSettingsName())) {
+  m_collection_sp->Initialize(g_dynamicloaderdarwin_experimental_properties);
+}
+
+DynamicLoaderDarwinProperties::DynamicLoaderDarwinProperties()
+    : Properties(std::make_shared<OptionValueProperties>(GetSettingName())),
+      m_experimental_properties(std::make_unique<ExperimentalProperties>()) {
+  m_collection_sp->AppendProperty(
+      Properties::GetExperimentalSettingsName(),
+      "Experimental settings - setting these won't produce errors if the "
+      "setting is not present.",
+      true, m_experimental_properties->GetValueProperties());
+}
+
+bool DynamicLoaderDarwinProperties::GetEnableParallelImageLoad() const {
+  return m_experimental_properties->GetPropertyAtIndexAs<bool>(
+      ePropertyEnableParallelImageLoad,
+      g_dynamicloaderdarwin_experimental_properties
+              [ePropertyEnableParallelImageLoad]
+                  .default_uint_value != 0);
+}
+
+DynamicLoaderDarwinProperties &DynamicLoaderDarwinProperties::GetGlobal() {
+  static DynamicLoaderDarwinProperties g_settings;
+  return g_settings;
+}
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h
new file mode 100644
index 00000000000000..4c5e800c4f3e4c
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h
@@ -0,0 +1,34 @@
+//===-- DynamicLoaderDarwinProperties.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_SOURCE_PLUGINS_DYNAMICLOADER_MACOSX_DYLD_DYNAMICLOADERDARWINPROPERTIES_H
+#define LLDB_SOURCE_PLUGINS_DYNAMICLOADER_MACOSX_DYLD_DYNAMICLOADERDARWINPROPERTIES_H
+
+#include "lldb/Core/UserSettingsController.h"
+
+namespace lldb_private {
+
+class DynamicLoaderDarwinProperties : public Properties {
+public:
+  class ExperimentalProperties : public Properties {
+  public:
+    ExperimentalProperties();
+  };
+  static llvm::StringRef GetSettingName();
+  static DynamicLoaderDarwinProperties &GetGlobal();
+  DynamicLoaderDarwinProperties();
+  ~DynamicLoaderDarwinProperties() override = default;
+  bool GetEnableParallelImageLoad() const;
+
+private:
+  std::unique_ptr<ExperimentalProperties> m_experimental_properties;
+};
+
+} // namespace lldb_private
+
+#endif // LLDB_SOURCE_PLUGINS_DYNAMICLOADER_MACOSX_DYLD_DYNAMICLOADERDARWINPROPERTIES_H
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td
new file mode 100644
index 00000000000000..c54580ce347296
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td
@@ -0,0 +1,8 @@
+include "../../../../include/lldb/Core/PropertiesBase.td"
+
+let Definition = "dynamicloaderdarwin_experimental" in {
+  def EnableParallelImageLoad: Property<"enable-parallel-image-load", "Boolean">,
+    Global,
+    DefaultTrue,
+    Desc<"Load images in parallel.">;
+}
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderMacOS.cpp b/lldb/source/Plugins/Dy...
[truncated]

@DmT021 DmT021 force-pushed the wp/DynamicLoaderDarwin_parallel_image_load_with_preload_main branch from f6cffb4 to c8f133b Compare October 14, 2024 16:35
@DmT021
Copy link
Contributor Author

DmT021 commented Oct 14, 2024

@jasonmolenda All the tests are green but I don't have the merge button. Can you merge it, please?

@jasonmolenda
Copy link
Collaborator

@DmT021 I will, can you edit the Description in this PR for the commit message that will be recorded?

Comment on lines 655 to 673
template <typename InputIterator, typename ResultType>
static std::vector<ResultType> parallel_map(
llvm::ThreadPoolInterface &threadPool, InputIterator first,
InputIterator last,
llvm::function_ref<ResultType(
const typename std::iterator_traits<InputIterator>::value_type &)>
transform) {
const auto size = std::distance(first, last);
std::vector<ResultType> results(size);
if (size > 0) {
llvm::ThreadPoolTaskGroup taskGroup(threadPool);
auto it = first;
for (ssize_t i = 0; i < size; ++i, ++it) {
taskGroup.async([&, i, it]() { results[i] = transform(*it); });
}
taskGroup.wait();
}
return results;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need these map helpers? I would imagine you could achieve the same thing with something like this:

TaskGroup task_group(...);
for (...)
  if (is_parallel)
    task_group.async(lambda, args);
  else 
    lambda(args);
task_group.wait();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can but it may not be better. The implementations of map and parallel_map aren't important from the business logic perspective. The only important thing about PreloadModulesFromImageInfos is checking the setting.
Also, the code will be a bit more complicated than shown in the snippet you provided:

  • we are creating a task group and waiting for it even when we are going to load the modules sequentially.
  • we are checking the setting on each iteration of the loop.

This is probably almost free (I'm not sure if that's the case for task_group.wait() though) but it's still a bit of an eyesore.

But a version of this without these two disadvantages will have a repeating for loop. Something like

std::vector<std::pair<DynamicLoaderDarwin::ImageInfo, ModuleSP>>
DynamicLoaderDarwin::PreloadModulesFromImageInfos(
    const ImageInfo::collection &image_infos) {
  const auto size = image_infos.size();
  std::vector<std::pair<DynamicLoaderDarwin::ImageInfo, ModuleSP>> images(
      size);
  auto lambda = [&](size_t i, ImageInfo::collection::const_iterator it) {
    const auto &image_info = *it;
    images[i] = std::make_pair(
        image_info, FindTargetModuleForImageInfo(image_info, true, nullptr));
  };
  auto it = image_infos.begin();
  bool is_parallel_load =
      DynamicLoaderDarwinProperties::GetGlobal().GetEnableParallelImageLoad();
  if (is_parallel_load) {
    if (size > 0) {
      llvm::ThreadPoolTaskGroup taskGroup(Debugger::GetThreadPool());
      for (size_t i = 0; i < size; ++i, ++it) {
        taskGroup.async(lambda, i, it);
      }
      taskGroup.wait();
    }
  } else {
    for (size_t i = 0; i < size; ++i, ++it) {
      lambda(i, it);
    }
  }
  return images;
}

So now we have a bigger room for error in future refactoring here and the important part (checking the setting) is less visible in the code as well.

Also, consider how would this code look like if had std::transform with ExecutionPolicy available. I think it'd look about the same as the current implementation with map and parallel_map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think the new code, even with the little bit of code duplication, reads a lot easier than the templated signature of the (parallel_)map functions. I personally wouldn't bother with the size > 0 check (I expect that creating an empty task group and waiting on an empty task group is essentially free). Anyway, I don't feel super strongly about it, so go with whatever you think is best.

FWIW I don't expect us to ever migrate this code to std::transform because we definitely want to limit the parallelism by using the debugger's thread pool.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DmT021 I didn't want to merge the PR if you were still considering Jonas' opinion on this approach. Did you want to change the PR or are you fine with it as-is. Jonas says the choice is yours.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasonmolenda @JDevlieghere I decided to inline the helpers. I still think we are mixing important and utilitary code here and it's less than ideal. But at the same time, I fully understand the readability issue. What's also important here the signatures of the map functions aren't ideal to be considered "general purpose helpers". Namely, they expect std::distance to be O(1) which isn't always the case with the current template constraints. So I think if we are going to use the same technique in other places (e.g. Target::SetExecutableModule and other dynamic loader plugins) - we may want to write better (more general) implementations for these helpers in some common place.

PS Sorry for the long answer, it took me a while to come up with the decision here.

@jasonmolenda jasonmolenda merged commit 5f2cf99 into llvm:main Oct 15, 2024
7 checks passed
@jasonmolenda
Copy link
Collaborator

Very nice! And now we'll see what all the CI bots think of this change. :)

DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
This change enables `DynamicLoaderDarwin` to load modules in parallel
using the thread pool. This new behavior is controlled by a new setting
`plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load`,
which is enabled by default. When disabled, DynamicLoaderDarwin will
load modules sequentially as before.
JDevlieghere added a commit to JDevlieghere/llvm-project that referenced this pull request Nov 1, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

  Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
    Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
    Range (min … max):    1.220 s …  1.229 s    10 runs

  Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
    Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
    Range (min … max):    3.211 s …  3.310 s    10 runs

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit that referenced this pull request Nov 2, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by #110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit that referenced this pull request Nov 4, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by #110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Nov 4, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.

(cherry picked from commit a57296a)
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit to JDevlieghere/llvm-project that referenced this pull request Nov 19, 2024
JDevlieghere added a commit that referenced this pull request Nov 20, 2024
royitaqi pushed a commit that referenced this pull request Mar 31, 2025
This patch improves LLDB launch time on Linux machines for **preload
scenarios**, particularly for executables with a lot of shared library
dependencies (or modules). Specifically:
* Launching a binary with `target.preload-symbols = true` 
* Attaching to a process with `target.preload-symbols = true`.
It's completely controlled by a new flag added in the first commit
`plugin.dynamic-loader.posix-dyld.parallel-module-load`, which *defaults
to false*. This was inspired by similar work on Darwin #110646.

Some rough numbers to showcase perf improvement, run on a very beefy
machine:
* Executable with ~5600 modules: baseline 45s, improvement 15s
* Executable with ~3800 modules: baseline 25s,  improvement 10s
* Executable with ~6650 modules: baseline 67s, improvement 20s
* Executable with ~12500 modules: baseline 185s, improvement 85s
* Executable with ~14700 modules: baseline 235s, improvement 120s
A lot of targets we deal with have a *ton* of modules, and unfortunately
we're unable to convince other folks to reduce the number of modules, so
performance improvements like this can be very impactful for user
experience.

This patch achieves the performance improvement by parallelizing
`DynamicLoaderPOSIXDYLD::RefreshModules` for the launch scenario, and
`DynamicLoaderPOSIXDYLD::LoadAllCurrentModules` for the attach scenario.
The commits have some context on their specific changes as well --
hopefully this helps the review.

# More context on implementation

We discovered the bottlenecks by via `perf record -g -p <lldb's pid>` on
a Linux machine. With an executable known to have 1000s of shared
library dependencies, I ran
```
(lldb) b main
(lldb) r
# taking a while
```
and showed the resulting perf trace (snippet shown)
```
Samples: 85K of event 'cycles:P', Event count (approx.): 54615855812
  Children      Self  Command          Shared Object              Symbol
-   93.54%     0.00%  intern-state     libc.so.6                  [.] clone3
     clone3
     start_thread
     lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*)                                                                           r
     std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&)
     lldb_private::Process::RunPrivateStateThread(bool)                                                                                          n
   - lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&)
      - 93.54% lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*)
         - 93.54% lldb_private::ThreadList::ShouldStop(lldb_private::Event*)
            - lldb_private::Thread::ShouldStop(lldb_private::Event*)                                                                             *
               - 93.53% lldb_private::StopInfoBreakpoint::ShouldStopSynchronous(lldb_private::Event*)                                            t
                  - 93.52% lldb_private::BreakpointSite::ShouldStop(lldb_private::StoppointCallbackContext*)                                     i
                       lldb_private::BreakpointLocationCollection::ShouldStop(lldb_private::StoppointCallbackContext*)                           k
                       lldb_private::BreakpointLocation::ShouldStop(lldb_private::StoppointCallbackContext*)                                     b
                       lldb_private::BreakpointOptions::InvokeCallback(lldb_private::StoppointCallbackContext*, unsigned long, unsigned long)    i
                       DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long, unsigned lo
                     - DynamicLoaderPOSIXDYLD::RefreshModules()                                                                                  O
                        - 93.42% DynamicLoaderPOSIXDYLD::RefreshModules()::$_0::operator()(DYLDRendezvous::SOEntry const&) const                 u
                           - 93.40% DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, bools
                              - lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, boos
                                 - 83.90% lldb_private::DynamicLoader::FindModuleViaTarget(lldb_private::FileSpec const&)                        o
                                    - 83.01% lldb_private::Target::GetOrCreateModule(lldb_private::ModuleSpec const&, bool, lldb_private::Status*
                                       - 77.89% lldb_private::Module::PreloadSymbols()
                                          - 44.06% lldb_private::Symtab::PreloadSymbols()
                                             - 43.66% lldb_private::Symtab::InitNameIndexes()
...
```
We saw that majority of time was spent in `RefreshModules`, with the
main culprit within it `LoadModuleAtAddress` which eventually calls
`PreloadSymbols`.

At first, `DynamicLoaderPOSIXDYLD::LoadModuleAtAddress` appears fairly
independent -- most of it deals with different files and then getting or
creating Modules from these files. The portions that aren't independent
seem to deal with ModuleLists, which appear concurrency safe. There were
members of `DynamicLoaderPOSIXDYLD` I had to synchronize though: namely
`m_loaded_modules` which `DynamicLoaderPOSIXDYLD` maintains to map its
loaded modules to their link addresses. Without synchronizing this, I
ran into SEGFAULTS and other issues when running `check-lldb`. I also
locked the assignment and comparison of `m_interpreter_module`, which
may be unnecessary.

# Alternate implementations

When creating this patch, another implementation I considered was
directly background-ing the call to `Module::PreloadSymbol` in
`Target::GetOrCreateModule`. It would have the added benefit of working
across platforms generically, and appeared to be concurrency safe. It
was done via `Debugger::GetThreadPool().async` directly. However, there
were a ton of concurrency issues, so I abandoned that approach for now.

# Testing

With the feature active, I tested via `ninja check-lldb` on both Debug
and Release builds several times (~5 or 6 altogether?), and didn't spot
additional failing or flaky tests.

I also tested manually on several different binaries, some with around
14000 modules, but just basic operations: launching, reaching main,
setting breakpoint, stepping, showing some backtraces.

I've also tested with the flag off just to make sure things behave
properly synchronously.
royitaqi pushed a commit that referenced this pull request Apr 7, 2025
#134437)

A requested follow-up from
#130912 by @JDevlieghere to
control Darwin parallel image loading with the same
`target.parallel-module-load` that controls the POSIX dyld parallel
image loading. Darwin parallel image loading was introduced by
#110646.

This small change:
* removes
`plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load`
and associated code.
* changes setting call site in
`DynamicLoaderDarwin::PreloadModulesFromImageInfos` to use the new
setting.

Tested by running `ninja check-lldb` and loading some targets.

Co-authored-by: Tom Yang <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 7, 2025
…-module-load (#134437)

A requested follow-up from
llvm/llvm-project#130912 by @JDevlieghere to
control Darwin parallel image loading with the same
`target.parallel-module-load` that controls the POSIX dyld parallel
image loading. Darwin parallel image loading was introduced by
llvm/llvm-project#110646.

This small change:
* removes
`plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load`
and associated code.
* changes setting call site in
`DynamicLoaderDarwin::PreloadModulesFromImageInfos` to use the new
setting.

Tested by running `ninja check-lldb` and loading some targets.

Co-authored-by: Tom Yang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants