-
Notifications
You must be signed in to change notification settings - Fork 13.3k
DynamicLoaderDarwin load images in parallel with preload #110646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DynamicLoaderDarwin load images in parallel with preload #110646
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
0c4780e
to
f6cffb4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the review and discussion happened in the "non-preload" version of the PR ( #110439 ), but I agree with Dmitrii that this is the best approach to merge, this is approved. Very nice job spotting a prime opportunity for multithreading, thanks for doing the work to implement it. I think for native macOS debugging (with all binaries in the same shared cache as lldb's) we're seeing some less than optimal results so we've got a bottleneck that I'll try to look into, I suspect maybe related to how we find binaries in our shared cache. This is still a speedup or not-slower result in every use case I tried.
@llvm/pr-subscribers-lldb Author: Dmitrii Galimzianov (DmT021) ChangesThis is another approach to the goal described in this PR. This PR is provided just for convenience. Patch is 23.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110646.diff 9 Files Affected:
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt
index 7308374c8bfba6..77a560541fcb1f 100644
--- a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/CMakeLists.txt
@@ -1,7 +1,16 @@
+lldb_tablegen(DynamicLoaderDarwinProperties.inc -gen-lldb-property-defs
+ SOURCE DynamicLoaderDarwinProperties.td
+ TARGET LLDBPluginDynamicLoaderDarwinPropertiesGen)
+
+lldb_tablegen(DynamicLoaderDarwinPropertiesEnum.inc -gen-lldb-property-enum-defs
+ SOURCE DynamicLoaderDarwinProperties.td
+ TARGET LLDBPluginDynamicLoaderDarwinPropertiesEnumGen)
+
add_lldb_library(lldbPluginDynamicLoaderMacOSXDYLD PLUGIN
DynamicLoaderMacOSXDYLD.cpp
DynamicLoaderMacOS.cpp
DynamicLoaderDarwin.cpp
+ DynamicLoaderDarwinProperties.cpp
LINK_LIBS
lldbBreakpoint
@@ -16,3 +25,7 @@ add_lldb_library(lldbPluginDynamicLoaderMacOSXDYLD PLUGIN
Support
TargetParser
)
+
+add_dependencies(lldbPluginDynamicLoaderMacOSXDYLD
+ LLDBPluginDynamicLoaderDarwinPropertiesGen
+ LLDBPluginDynamicLoaderDarwinPropertiesEnumGen)
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
index 30242038a5f66f..c6a59d9e87a46b 100644
--- a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
@@ -8,6 +8,7 @@
#include "DynamicLoaderDarwin.h"
+#include "DynamicLoaderDarwinProperties.h"
#include "lldb/Breakpoint/StoppointCallbackContext.h"
#include "lldb/Core/Debugger.h"
#include "lldb/Core/Module.h"
@@ -31,6 +32,7 @@
#include "lldb/Utility/LLDBLog.h"
#include "lldb/Utility/Log.h"
#include "lldb/Utility/State.h"
+#include "llvm/Support/ThreadPool.h"
#include "Plugins/LanguageRuntime/ObjC/ObjCLanguageRuntime.h"
#include "Plugins/TypeSystem/Clang/TypeSystemClang.h"
@@ -77,6 +79,17 @@ void DynamicLoaderDarwin::DidLaunch() {
SetNotificationBreakpoint();
}
+void DynamicLoaderDarwin::CreateSettings(lldb_private::Debugger &debugger) {
+ if (!PluginManager::GetSettingForDynamicLoaderPlugin(
+ debugger, DynamicLoaderDarwinProperties::GetSettingName())) {
+ const bool is_global_setting = true;
+ PluginManager::CreateSettingForDynamicLoaderPlugin(
+ debugger,
+ DynamicLoaderDarwinProperties::GetGlobal().GetValueProperties(),
+ "Properties for the DynamicLoaderDarwin plug-in.", is_global_setting);
+ }
+}
+
// Clear out the state of this class.
void DynamicLoaderDarwin::Clear(bool clear_process) {
std::lock_guard<std::recursive_mutex> guard(m_mutex);
@@ -88,7 +101,7 @@ void DynamicLoaderDarwin::Clear(bool clear_process) {
}
ModuleSP DynamicLoaderDarwin::FindTargetModuleForImageInfo(
- ImageInfo &image_info, bool can_create, bool *did_create_ptr) {
+ const ImageInfo &image_info, bool can_create, bool *did_create_ptr) {
if (did_create_ptr)
*did_create_ptr = false;
@@ -517,8 +530,8 @@ bool DynamicLoaderDarwin::JSONImageInformationIntoImageInfo(
return true;
}
-void DynamicLoaderDarwin::UpdateSpecialBinariesFromNewImageInfos(
- ImageInfo::collection &image_infos) {
+void DynamicLoaderDarwin::UpdateSpecialBinariesFromPreloadedModules(
+ std::vector<std::pair<ImageInfo, ModuleSP>> &images) {
uint32_t exe_idx = UINT32_MAX;
uint32_t dyld_idx = UINT32_MAX;
Target &target = m_process->GetTarget();
@@ -526,35 +539,34 @@ void DynamicLoaderDarwin::UpdateSpecialBinariesFromNewImageInfos(
ConstString g_dyld_sim_filename("dyld_sim");
ArchSpec target_arch = target.GetArchitecture();
- const size_t image_infos_size = image_infos.size();
- for (size_t i = 0; i < image_infos_size; i++) {
- if (image_infos[i].header.filetype == llvm::MachO::MH_DYLINKER) {
+ const size_t images_size = images.size();
+ for (size_t i = 0; i < images_size; i++) {
+ const auto &image_info = images[i].first;
+ if (image_info.header.filetype == llvm::MachO::MH_DYLINKER) {
// In a "simulator" process we will have two dyld modules --
// a "dyld" that we want to keep track of, and a "dyld_sim" which
// we don't need to keep track of here. dyld_sim will have a non-macosx
// OS.
if (target_arch.GetTriple().getEnvironment() == llvm::Triple::Simulator &&
- image_infos[i].os_type != llvm::Triple::OSType::MacOSX) {
+ image_info.os_type != llvm::Triple::OSType::MacOSX) {
continue;
}
dyld_idx = i;
}
- if (image_infos[i].header.filetype == llvm::MachO::MH_EXECUTE) {
+ if (image_info.header.filetype == llvm::MachO::MH_EXECUTE) {
exe_idx = i;
}
}
// Set the target executable if we haven't found one so far.
if (exe_idx != UINT32_MAX && !target.GetExecutableModule()) {
- const bool can_create = true;
- ModuleSP exe_module_sp(FindTargetModuleForImageInfo(image_infos[exe_idx],
- can_create, nullptr));
+ ModuleSP exe_module_sp = images[exe_idx].second;
if (exe_module_sp) {
LLDB_LOGF(log, "Found executable module: %s",
exe_module_sp->GetFileSpec().GetPath().c_str());
target.GetImages().AppendIfNeeded(exe_module_sp);
- UpdateImageLoadAddress(exe_module_sp.get(), image_infos[exe_idx]);
+ UpdateImageLoadAddress(exe_module_sp.get(), images[exe_idx].first);
if (exe_module_sp.get() != target.GetExecutableModulePointer())
target.SetExecutableModule(exe_module_sp, eLoadDependentsNo);
@@ -581,14 +593,12 @@ void DynamicLoaderDarwin::UpdateSpecialBinariesFromNewImageInfos(
}
if (dyld_idx != UINT32_MAX) {
- const bool can_create = true;
- ModuleSP dyld_sp = FindTargetModuleForImageInfo(image_infos[dyld_idx],
- can_create, nullptr);
+ ModuleSP dyld_sp = images[dyld_idx].second;
if (dyld_sp.get()) {
LLDB_LOGF(log, "Found dyld module: %s",
dyld_sp->GetFileSpec().GetPath().c_str());
target.GetImages().AppendIfNeeded(dyld_sp);
- UpdateImageLoadAddress(dyld_sp.get(), image_infos[dyld_idx]);
+ UpdateImageLoadAddress(dyld_sp.get(), images[dyld_idx].first);
SetDYLDModule(dyld_sp);
}
}
@@ -642,26 +652,86 @@ ModuleSP DynamicLoaderDarwin::GetDYLDModule() {
void DynamicLoaderDarwin::ClearDYLDModule() { m_dyld_module_wp.reset(); }
+template <typename InputIterator, typename ResultType>
+static std::vector<ResultType> parallel_map(
+ llvm::ThreadPoolInterface &threadPool, InputIterator first,
+ InputIterator last,
+ llvm::function_ref<ResultType(
+ const typename std::iterator_traits<InputIterator>::value_type &)>
+ transform) {
+ const auto size = std::distance(first, last);
+ std::vector<ResultType> results(size);
+ if (size > 0) {
+ llvm::ThreadPoolTaskGroup taskGroup(threadPool);
+ auto it = first;
+ for (ssize_t i = 0; i < size; ++i, ++it) {
+ taskGroup.async([&, i, it]() { results[i] = transform(*it); });
+ }
+ taskGroup.wait();
+ }
+ return results;
+}
+
+template <typename InputIterator, typename ResultType>
+static std::vector<ResultType>
+map(InputIterator first, InputIterator last,
+ llvm::function_ref<ResultType(
+ const typename std::iterator_traits<InputIterator>::value_type &)>
+ transform) {
+ const auto size = std::distance(first, last);
+ std::vector<ResultType> results(size);
+ auto it = first;
+ for (ssize_t i = 0; i < size; ++i, ++it) {
+ results[i] = transform(*it);
+ }
+ return results;
+}
+
+std::vector<std::pair<DynamicLoaderDarwin::ImageInfo, ModuleSP>>
+DynamicLoaderDarwin::PreloadModulesFromImageInfos(
+ const ImageInfo::collection &image_infos) {
+ auto ImageLoad = [this](const ImageInfo &image_info) {
+ return std::make_pair(
+ image_info, FindTargetModuleForImageInfo(image_info, true, nullptr));
+ };
+ bool is_parallel_load =
+ DynamicLoaderDarwinProperties::GetGlobal().GetEnableParallelImageLoad();
+ auto images = is_parallel_load
+ ? parallel_map<ImageInfo::collection::const_iterator,
+ std::pair<ImageInfo, ModuleSP>>(
+ Debugger::GetThreadPool(), image_infos.begin(),
+ image_infos.end(), ImageLoad)
+ : map<ImageInfo::collection::const_iterator,
+ std::pair<ImageInfo, ModuleSP>>(
+ image_infos.begin(), image_infos.end(), ImageLoad);
+ return images;
+}
+
bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
ImageInfo::collection &image_infos) {
std::lock_guard<std::recursive_mutex> guard(m_mutex);
+ auto images = PreloadModulesFromImageInfos(image_infos);
+ return AddModulesUsingPreloadedModules(images);
+}
+
+bool DynamicLoaderDarwin::AddModulesUsingPreloadedModules(
+ std::vector<std::pair<ImageInfo, ModuleSP>> &images) {
+ std::lock_guard<std::recursive_mutex> guard(m_mutex);
// Now add these images to the main list.
ModuleList loaded_module_list;
Log *log = GetLog(LLDBLog::DynamicLoader);
Target &target = m_process->GetTarget();
ModuleList &target_images = target.GetImages();
- for (uint32_t idx = 0; idx < image_infos.size(); ++idx) {
+ for (uint32_t idx = 0; idx < images.size(); ++idx) {
+ auto &image_info = images[idx].first;
+ const auto &image_module_sp = images[idx].second;
if (log) {
LLDB_LOGF(log, "Adding new image at address=0x%16.16" PRIx64 ".",
- image_infos[idx].address);
- image_infos[idx].PutToLog(log);
+ image_info.address);
+ image_info.PutToLog(log);
}
-
- m_dyld_image_infos.push_back(image_infos[idx]);
-
- ModuleSP image_module_sp(
- FindTargetModuleForImageInfo(image_infos[idx], true, nullptr));
+ m_dyld_image_infos.push_back(image_info);
if (image_module_sp) {
ObjectFile *objfile = image_module_sp->GetObjectFile();
@@ -673,7 +743,7 @@ bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
sections->FindSectionByName(commpage_dbstr).get();
if (commpage_section) {
ModuleSpec module_spec(objfile->GetFileSpec(),
- image_infos[idx].GetArchitecture());
+ image_info.GetArchitecture());
module_spec.GetObjectName() = commpage_dbstr;
ModuleSP commpage_image_module_sp(
target_images.FindFirstModule(module_spec));
@@ -686,17 +756,17 @@ bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
if (!commpage_image_module_sp ||
commpage_image_module_sp->GetObjectFile() == nullptr) {
commpage_image_module_sp = m_process->ReadModuleFromMemory(
- image_infos[idx].file_spec, image_infos[idx].address);
+ image_info.file_spec, image_info.address);
// Always load a memory image right away in the target in case
// we end up trying to read the symbol table from memory... The
// __LINKEDIT will need to be mapped so we can figure out where
// the symbol table bits are...
bool changed = false;
UpdateImageLoadAddress(commpage_image_module_sp.get(),
- image_infos[idx]);
+ image_info);
target.GetImages().Append(commpage_image_module_sp);
if (changed) {
- image_infos[idx].load_stop_id = m_process->GetStopID();
+ image_info.load_stop_id = m_process->GetStopID();
loaded_module_list.AppendIfNeeded(commpage_image_module_sp);
}
}
@@ -709,14 +779,14 @@ bool DynamicLoaderDarwin::AddModulesUsingImageInfos(
// address. We need to check this so we don't mention that all loaded
// shared libraries are newly loaded each time we hit out dyld breakpoint
// since dyld will list all shared libraries each time.
- if (UpdateImageLoadAddress(image_module_sp.get(), image_infos[idx])) {
+ if (UpdateImageLoadAddress(image_module_sp.get(), image_info)) {
target_images.AppendIfNeeded(image_module_sp);
loaded_module_list.AppendIfNeeded(image_module_sp);
}
// To support macCatalyst and legacy iOS simulator,
// update the module's platform with the DYLD info.
- ArchSpec dyld_spec = image_infos[idx].GetArchitecture();
+ ArchSpec dyld_spec = image_info.GetArchitecture();
auto &dyld_triple = dyld_spec.GetTriple();
if ((dyld_triple.getEnvironment() == llvm::Triple::MacABI &&
dyld_triple.getOS() == llvm::Triple::IOS) ||
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h
index 45c693163f8105..bc5464d76b9503 100644
--- a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.h
@@ -58,6 +58,8 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
std::optional<lldb_private::Address> GetStartAddress() override;
+ static void CreateSettings(lldb_private::Debugger &debugger);
+
protected:
void PrivateInitialize(lldb_private::Process *process);
@@ -174,7 +176,7 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
bool UnloadModuleSections(lldb_private::Module *module, ImageInfo &info);
- lldb::ModuleSP FindTargetModuleForImageInfo(ImageInfo &image_info,
+ lldb::ModuleSP FindTargetModuleForImageInfo(const ImageInfo &image_info,
bool can_create,
bool *did_create_ptr);
@@ -201,11 +203,18 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
lldb_private::StructuredData::ObjectSP image_details,
ImageInfo::collection &image_infos);
- // If image_infos contains / may contain dyld or executable image, call this
- // method
- // to keep our internal record keeping of the special binaries up-to-date.
- void
- UpdateSpecialBinariesFromNewImageInfos(ImageInfo::collection &image_infos);
+ // Finds/loads modules for a given `image_infos` and returns pairs
+ // (ImageInfo, ModuleSP).
+ // Prefer using this method rather than calling `FindTargetModuleForImageInfo`
+ // directly as this method may load the modules in parallel.
+ std::vector<std::pair<ImageInfo, lldb::ModuleSP>>
+ PreloadModulesFromImageInfos(const ImageInfo::collection &image_infos);
+
+ // If `images` contains / may contain dyld or executable image, call this
+ // method to keep our internal record keeping of the special binaries
+ // up-to-date.
+ void UpdateSpecialBinariesFromPreloadedModules(
+ std::vector<std::pair<ImageInfo, lldb::ModuleSP>> &images);
// if image_info is a dyld binary, call this method
bool UpdateDYLDImageInfoFromNewImageInfo(ImageInfo &image_info);
@@ -215,6 +224,8 @@ class DynamicLoaderDarwin : public lldb_private::DynamicLoader {
void AddExecutableModuleIfInImageInfos(ImageInfo::collection &image_infos);
bool AddModulesUsingImageInfos(ImageInfo::collection &image_infos);
+ bool AddModulesUsingPreloadedModules(
+ std::vector<std::pair<ImageInfo, lldb::ModuleSP>> &images);
// Whether we should use the new dyld SPI to get shared library information,
// or read
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp
new file mode 100644
index 00000000000000..f4d8a071e6d5de
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.cpp
@@ -0,0 +1,53 @@
+//===-- DynamicLoaderDarwinProperties.cpp ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "DynamicLoaderDarwinProperties.h"
+
+using namespace lldb_private;
+
+#define LLDB_PROPERTIES_dynamicloaderdarwin_experimental
+#include "DynamicLoaderDarwinProperties.inc"
+
+enum {
+#define LLDB_PROPERTIES_dynamicloaderdarwin_experimental
+#include "DynamicLoaderDarwinPropertiesEnum.inc"
+};
+
+llvm::StringRef DynamicLoaderDarwinProperties::GetSettingName() {
+ static constexpr llvm::StringLiteral g_setting_name("darwin");
+ return g_setting_name;
+}
+
+DynamicLoaderDarwinProperties::ExperimentalProperties::ExperimentalProperties()
+ : Properties(std::make_shared<OptionValueProperties>(
+ GetExperimentalSettingsName())) {
+ m_collection_sp->Initialize(g_dynamicloaderdarwin_experimental_properties);
+}
+
+DynamicLoaderDarwinProperties::DynamicLoaderDarwinProperties()
+ : Properties(std::make_shared<OptionValueProperties>(GetSettingName())),
+ m_experimental_properties(std::make_unique<ExperimentalProperties>()) {
+ m_collection_sp->AppendProperty(
+ Properties::GetExperimentalSettingsName(),
+ "Experimental settings - setting these won't produce errors if the "
+ "setting is not present.",
+ true, m_experimental_properties->GetValueProperties());
+}
+
+bool DynamicLoaderDarwinProperties::GetEnableParallelImageLoad() const {
+ return m_experimental_properties->GetPropertyAtIndexAs<bool>(
+ ePropertyEnableParallelImageLoad,
+ g_dynamicloaderdarwin_experimental_properties
+ [ePropertyEnableParallelImageLoad]
+ .default_uint_value != 0);
+}
+
+DynamicLoaderDarwinProperties &DynamicLoaderDarwinProperties::GetGlobal() {
+ static DynamicLoaderDarwinProperties g_settings;
+ return g_settings;
+}
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h
new file mode 100644
index 00000000000000..4c5e800c4f3e4c
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.h
@@ -0,0 +1,34 @@
+//===-- DynamicLoaderDarwinProperties.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_SOURCE_PLUGINS_DYNAMICLOADER_MACOSX_DYLD_DYNAMICLOADERDARWINPROPERTIES_H
+#define LLDB_SOURCE_PLUGINS_DYNAMICLOADER_MACOSX_DYLD_DYNAMICLOADERDARWINPROPERTIES_H
+
+#include "lldb/Core/UserSettingsController.h"
+
+namespace lldb_private {
+
+class DynamicLoaderDarwinProperties : public Properties {
+public:
+ class ExperimentalProperties : public Properties {
+ public:
+ ExperimentalProperties();
+ };
+ static llvm::StringRef GetSettingName();
+ static DynamicLoaderDarwinProperties &GetGlobal();
+ DynamicLoaderDarwinProperties();
+ ~DynamicLoaderDarwinProperties() override = default;
+ bool GetEnableParallelImageLoad() const;
+
+private:
+ std::unique_ptr<ExperimentalProperties> m_experimental_properties;
+};
+
+} // namespace lldb_private
+
+#endif // LLDB_SOURCE_PLUGINS_DYNAMICLOADER_MACOSX_DYLD_DYNAMICLOADERDARWINPROPERTIES_H
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td
new file mode 100644
index 00000000000000..c54580ce347296
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwinProperties.td
@@ -0,0 +1,8 @@
+include "../../../../include/lldb/Core/PropertiesBase.td"
+
+let Definition = "dynamicloaderdarwin_experimental" in {
+ def EnableParallelImageLoad: Property<"enable-parallel-image-load", "Boolean">,
+ Global,
+ DefaultTrue,
+ Desc<"Load images in parallel.">;
+}
diff --git a/lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderMacOS.cpp b/lldb/source/Plugins/Dy...
[truncated]
|
f6cffb4
to
c8f133b
Compare
@jasonmolenda All the tests are green but I don't have the merge button. Can you merge it, please? |
@DmT021 I will, can you edit the Description in this PR for the commit message that will be recorded? |
template <typename InputIterator, typename ResultType> | ||
static std::vector<ResultType> parallel_map( | ||
llvm::ThreadPoolInterface &threadPool, InputIterator first, | ||
InputIterator last, | ||
llvm::function_ref<ResultType( | ||
const typename std::iterator_traits<InputIterator>::value_type &)> | ||
transform) { | ||
const auto size = std::distance(first, last); | ||
std::vector<ResultType> results(size); | ||
if (size > 0) { | ||
llvm::ThreadPoolTaskGroup taskGroup(threadPool); | ||
auto it = first; | ||
for (ssize_t i = 0; i < size; ++i, ++it) { | ||
taskGroup.async([&, i, it]() { results[i] = transform(*it); }); | ||
} | ||
taskGroup.wait(); | ||
} | ||
return results; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need these map
helpers? I would imagine you could achieve the same thing with something like this:
TaskGroup task_group(...);
for (...)
if (is_parallel)
task_group.async(lambda, args);
else
lambda(args);
task_group.wait();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can but it may not be better. The implementations of map and parallel_map aren't important from the business logic perspective. The only important thing about PreloadModulesFromImageInfos is checking the setting.
Also, the code will be a bit more complicated than shown in the snippet you provided:
- we are creating a task group and waiting for it even when we are going to load the modules sequentially.
- we are checking the setting on each iteration of the loop.
This is probably almost free (I'm not sure if that's the case for task_group.wait()
though) but it's still a bit of an eyesore.
But a version of this without these two disadvantages will have a repeating for loop. Something like
std::vector<std::pair<DynamicLoaderDarwin::ImageInfo, ModuleSP>>
DynamicLoaderDarwin::PreloadModulesFromImageInfos(
const ImageInfo::collection &image_infos) {
const auto size = image_infos.size();
std::vector<std::pair<DynamicLoaderDarwin::ImageInfo, ModuleSP>> images(
size);
auto lambda = [&](size_t i, ImageInfo::collection::const_iterator it) {
const auto &image_info = *it;
images[i] = std::make_pair(
image_info, FindTargetModuleForImageInfo(image_info, true, nullptr));
};
auto it = image_infos.begin();
bool is_parallel_load =
DynamicLoaderDarwinProperties::GetGlobal().GetEnableParallelImageLoad();
if (is_parallel_load) {
if (size > 0) {
llvm::ThreadPoolTaskGroup taskGroup(Debugger::GetThreadPool());
for (size_t i = 0; i < size; ++i, ++it) {
taskGroup.async(lambda, i, it);
}
taskGroup.wait();
}
} else {
for (size_t i = 0; i < size; ++i, ++it) {
lambda(i, it);
}
}
return images;
}
So now we have a bigger room for error in future refactoring here and the important part (checking the setting) is less visible in the code as well.
Also, consider how would this code look like if had std::transform
with ExecutionPolicy
available. I think it'd look about the same as the current implementation with map
and parallel_map
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think the new code, even with the little bit of code duplication, reads a lot easier than the templated signature of the (parallel_)map
functions. I personally wouldn't bother with the size > 0
check (I expect that creating an empty task group and waiting on an empty task group is essentially free). Anyway, I don't feel super strongly about it, so go with whatever you think is best.
FWIW I don't expect us to ever migrate this code to std::transform
because we definitely want to limit the parallelism by using the debugger's thread pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DmT021 I didn't want to merge the PR if you were still considering Jonas' opinion on this approach. Did you want to change the PR or are you fine with it as-is. Jonas says the choice is yours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonmolenda @JDevlieghere I decided to inline the helpers. I still think we are mixing important and utilitary code here and it's less than ideal. But at the same time, I fully understand the readability issue. What's also important here the signatures of the map functions aren't ideal to be considered "general purpose helpers". Namely, they expect std::distance
to be O(1) which isn't always the case with the current template constraints. So I think if we are going to use the same technique in other places (e.g. Target::SetExecutableModule
and other dynamic loader plugins) - we may want to write better (more general) implementations for these helpers in some common place.
PS Sorry for the long answer, it took me a while to come up with the decision here.
Very nice! And now we'll see what all the CI bots think of this change. :) |
This change enables `DynamicLoaderDarwin` to load modules in parallel using the thread pool. This new behavior is controlled by a new setting `plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load`, which is enabled by default. When disabled, DynamicLoaderDarwin will load modules sequentially as before.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by #110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by #110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks. (cherry picked from commit a57296a)
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
Release note llvm#110646 and llvm#114507.
This patch improves LLDB launch time on Linux machines for **preload scenarios**, particularly for executables with a lot of shared library dependencies (or modules). Specifically: * Launching a binary with `target.preload-symbols = true` * Attaching to a process with `target.preload-symbols = true`. It's completely controlled by a new flag added in the first commit `plugin.dynamic-loader.posix-dyld.parallel-module-load`, which *defaults to false*. This was inspired by similar work on Darwin #110646. Some rough numbers to showcase perf improvement, run on a very beefy machine: * Executable with ~5600 modules: baseline 45s, improvement 15s * Executable with ~3800 modules: baseline 25s, improvement 10s * Executable with ~6650 modules: baseline 67s, improvement 20s * Executable with ~12500 modules: baseline 185s, improvement 85s * Executable with ~14700 modules: baseline 235s, improvement 120s A lot of targets we deal with have a *ton* of modules, and unfortunately we're unable to convince other folks to reduce the number of modules, so performance improvements like this can be very impactful for user experience. This patch achieves the performance improvement by parallelizing `DynamicLoaderPOSIXDYLD::RefreshModules` for the launch scenario, and `DynamicLoaderPOSIXDYLD::LoadAllCurrentModules` for the attach scenario. The commits have some context on their specific changes as well -- hopefully this helps the review. # More context on implementation We discovered the bottlenecks by via `perf record -g -p <lldb's pid>` on a Linux machine. With an executable known to have 1000s of shared library dependencies, I ran ``` (lldb) b main (lldb) r # taking a while ``` and showed the resulting perf trace (snippet shown) ``` Samples: 85K of event 'cycles:P', Event count (approx.): 54615855812 Children Self Command Shared Object Symbol - 93.54% 0.00% intern-state libc.so.6 [.] clone3 clone3 start_thread lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) r std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&) lldb_private::Process::RunPrivateStateThread(bool) n - lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&) - 93.54% lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*) - 93.54% lldb_private::ThreadList::ShouldStop(lldb_private::Event*) - lldb_private::Thread::ShouldStop(lldb_private::Event*) * - 93.53% lldb_private::StopInfoBreakpoint::ShouldStopSynchronous(lldb_private::Event*) t - 93.52% lldb_private::BreakpointSite::ShouldStop(lldb_private::StoppointCallbackContext*) i lldb_private::BreakpointLocationCollection::ShouldStop(lldb_private::StoppointCallbackContext*) k lldb_private::BreakpointLocation::ShouldStop(lldb_private::StoppointCallbackContext*) b lldb_private::BreakpointOptions::InvokeCallback(lldb_private::StoppointCallbackContext*, unsigned long, unsigned long) i DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long, unsigned lo - DynamicLoaderPOSIXDYLD::RefreshModules() O - 93.42% DynamicLoaderPOSIXDYLD::RefreshModules()::$_0::operator()(DYLDRendezvous::SOEntry const&) const u - 93.40% DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, bools - lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, boos - 83.90% lldb_private::DynamicLoader::FindModuleViaTarget(lldb_private::FileSpec const&) o - 83.01% lldb_private::Target::GetOrCreateModule(lldb_private::ModuleSpec const&, bool, lldb_private::Status* - 77.89% lldb_private::Module::PreloadSymbols() - 44.06% lldb_private::Symtab::PreloadSymbols() - 43.66% lldb_private::Symtab::InitNameIndexes() ... ``` We saw that majority of time was spent in `RefreshModules`, with the main culprit within it `LoadModuleAtAddress` which eventually calls `PreloadSymbols`. At first, `DynamicLoaderPOSIXDYLD::LoadModuleAtAddress` appears fairly independent -- most of it deals with different files and then getting or creating Modules from these files. The portions that aren't independent seem to deal with ModuleLists, which appear concurrency safe. There were members of `DynamicLoaderPOSIXDYLD` I had to synchronize though: namely `m_loaded_modules` which `DynamicLoaderPOSIXDYLD` maintains to map its loaded modules to their link addresses. Without synchronizing this, I ran into SEGFAULTS and other issues when running `check-lldb`. I also locked the assignment and comparison of `m_interpreter_module`, which may be unnecessary. # Alternate implementations When creating this patch, another implementation I considered was directly background-ing the call to `Module::PreloadSymbol` in `Target::GetOrCreateModule`. It would have the added benefit of working across platforms generically, and appeared to be concurrency safe. It was done via `Debugger::GetThreadPool().async` directly. However, there were a ton of concurrency issues, so I abandoned that approach for now. # Testing With the feature active, I tested via `ninja check-lldb` on both Debug and Release builds several times (~5 or 6 altogether?), and didn't spot additional failing or flaky tests. I also tested manually on several different binaries, some with around 14000 modules, but just basic operations: launching, reaching main, setting breakpoint, stepping, showing some backtraces. I've also tested with the flag off just to make sure things behave properly synchronously.
#134437) A requested follow-up from #130912 by @JDevlieghere to control Darwin parallel image loading with the same `target.parallel-module-load` that controls the POSIX dyld parallel image loading. Darwin parallel image loading was introduced by #110646. This small change: * removes `plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load` and associated code. * changes setting call site in `DynamicLoaderDarwin::PreloadModulesFromImageInfos` to use the new setting. Tested by running `ninja check-lldb` and loading some targets. Co-authored-by: Tom Yang <[email protected]>
…-module-load (#134437) A requested follow-up from llvm/llvm-project#130912 by @JDevlieghere to control Darwin parallel image loading with the same `target.parallel-module-load` that controls the POSIX dyld parallel image loading. Darwin parallel image loading was introduced by llvm/llvm-project#110646. This small change: * removes `plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load` and associated code. * changes setting call site in `DynamicLoaderDarwin::PreloadModulesFromImageInfos` to use the new setting. Tested by running `ninja check-lldb` and loading some targets. Co-authored-by: Tom Yang <[email protected]>
This change enables
DynamicLoaderDarwin
to load modules in parallel using the thread pool. This new behavior is controlled by a new settingplugin.dynamic-loader.darwin.experimental.enable-parallel-image-load
, which is enabled by default. When disabled, DynamicLoaderDarwin will load modules sequentially as before.