Skip to content

[lldb] Do not bump memory modificator ID when "internal" debugger memory is updated #129092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 1, 2025

Conversation

real-mikhail
Copy link
Contributor

@real-mikhail real-mikhail commented Feb 27, 2025

This change adds a setting target.process.track-memory-cache-changes.
Disabling this setting prevents invalidating and updating values in ValueObject::UpdateValueIfNeeded when only "internal" debugger memory is updated. Writing to "internal" debugger memory happens when, for instance, expressions are evaluated by visualizers (pretty printers).
One of the examples when cache invalidation has a particularly heavy impact is visualizations of some collections: in some collections getting collection size is an expensive operation (it requires traversal of the collection).
At the same time evaluating user expression with side effects (visible to target, not only to debugger) will still bump memory ID because:

  • If expression is evaluated via interpreter: it will cause write to "non-internal" memory
  • If expression is JIT-compiled: then to call the function LLDB will write to "non-internal" stack memory

The downside of disabled target.process.track-memory-cache-changes setting is that convenience variables won't reevaluate synthetic children automatically.

…ory is updated

This change prevents invalidating and updating values in `ValueObject::UpdateValueIfNeeded` when only "internal" debugger memory is updated.
Writing to "internal" debugger memory happens when, for instance, user expressions are evaluated by pretty printers.
For some collections getting collection size is an expensive operation (it requires traversal of the collection) and broken value caching (being invalidated after executing expressions) make things worse.
At the same time evaluating user expression with side effects (visible to target, not only to debugger) will still bump memory ID because:
1. If expression is evaluated via interpreter: it will cause write to "non-internal" memory
2. If expression is JIT-compiled: then to call the function LLDB will write to "non-internal" stack memory
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Feb 27, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 27, 2025

@llvm/pr-subscribers-lldb

Author: Mikhail Zakharov (real-mikhail)

Changes

This change prevents invalidating and updating values in ValueObject::UpdateValueIfNeeded when only "internal" debugger memory is updated. Writing to "internal" debugger memory happens when, for instance, expressions are evaluated by visualizers (pretty printers).
One of the examples when broken caching has a particularly heavy impact is visualizations of some collections: in some collections getting collection size is an expensive operation (it requires traversal of the collection). But the change fixes the general issue when modifying debugger-only-visible memory invalidates caching of all variables.
At the same time evaluating user expression with side effects (visible to target, not only to debugger) will still bump memory ID because:

  1. If expression is evaluated via interpreter: it will cause write to "non-internal" memory
  2. If expression is JIT-compiled: then to call the function LLDB will write to "non-internal" stack memory

Full diff: https://github.com/llvm/llvm-project/pull/129092.diff

4 Files Affected:

  • (modified) lldb/include/lldb/Target/Memory.h (+3-1)
  • (modified) lldb/source/Target/Memory.cpp (+8)
  • (modified) lldb/source/Target/Process.cpp (+6-1)
  • (added) lldb/test/Shell/Expr/TestExprWithSideEffect.cpp (+82)
diff --git a/lldb/include/lldb/Target/Memory.h b/lldb/include/lldb/Target/Memory.h
index 2d1489a2c96ea..864ef6ca00802 100644
--- a/lldb/include/lldb/Target/Memory.h
+++ b/lldb/include/lldb/Target/Memory.h
@@ -125,6 +125,8 @@ class AllocatedMemoryCache {
 
   bool DeallocateMemory(lldb::addr_t ptr);
 
+  bool IsInCache(lldb::addr_t addr) const;
+
 protected:
   typedef std::shared_ptr<AllocatedBlock> AllocatedBlockSP;
 
@@ -133,7 +135,7 @@ class AllocatedMemoryCache {
 
   // Classes that inherit from MemoryCache can see and modify these
   Process &m_process;
-  std::recursive_mutex m_mutex;
+  mutable std::recursive_mutex m_mutex;
   typedef std::multimap<uint32_t, AllocatedBlockSP> PermissionsToBlockMap;
   PermissionsToBlockMap m_memory_map;
 
diff --git a/lldb/source/Target/Memory.cpp b/lldb/source/Target/Memory.cpp
index 5cdd84f6640f0..9bfb8c349aa2c 100644
--- a/lldb/source/Target/Memory.cpp
+++ b/lldb/source/Target/Memory.cpp
@@ -433,3 +433,11 @@ bool AllocatedMemoryCache::DeallocateMemory(lldb::addr_t addr) {
             (uint64_t)addr, success);
   return success;
 }
+
+bool AllocatedMemoryCache::IsInCache(lldb::addr_t addr) const {
+  std::lock_guard guard(m_mutex);
+
+  return llvm::any_of(m_memory_map, [addr](const auto &block) {
+    return block.second->Contains(addr);
+  });
+}
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 6db582096155f..1150fabf2d0ed 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -2267,6 +2267,7 @@ size_t Process::WriteMemoryPrivate(addr_t addr, const void *buf, size_t size,
   return bytes_written;
 }
 
+#define USE_ALLOCATE_MEMORY_CACHE 1
 size_t Process::WriteMemory(addr_t addr, const void *buf, size_t size,
                             Status &error) {
   if (ABISP abi_sp = GetABI())
@@ -2279,7 +2280,12 @@ size_t Process::WriteMemory(addr_t addr, const void *buf, size_t size,
   if (buf == nullptr || size == 0)
     return 0;
 
+#if defined(USE_ALLOCATE_MEMORY_CACHE)
+  if (!m_allocated_memory_cache.IsInCache(addr))
+    m_mod_id.BumpMemoryID();
+#else
   m_mod_id.BumpMemoryID();
+#endif
 
   // We need to write any data that would go where any current software traps
   // (enabled software breakpoints) any software traps (breakpoints) that we
@@ -2408,7 +2414,6 @@ Status Process::WriteObjectFile(std::vector<ObjectFile::LoadableData> entries) {
   return error;
 }
 
-#define USE_ALLOCATE_MEMORY_CACHE 1
 addr_t Process::AllocateMemory(size_t size, uint32_t permissions,
                                Status &error) {
   if (GetPrivateState() != eStateStopped) {
diff --git a/lldb/test/Shell/Expr/TestExprWithSideEffect.cpp b/lldb/test/Shell/Expr/TestExprWithSideEffect.cpp
new file mode 100644
index 0000000000000..d072fe3121031
--- /dev/null
+++ b/lldb/test/Shell/Expr/TestExprWithSideEffect.cpp
@@ -0,0 +1,82 @@
+// Tests evaluating expressions with side effects.
+// Applied side effect should be visible to the debugger.
+
+// RUN: %build %s -o %t
+// RUN: %lldb %t -o run \
+// RUN:   -o "frame variable x" \
+// RUN:   -o "expr x.inc()" \
+// RUN:   -o "frame variable x" \
+// RUN:   -o "continue" \
+// RUN:   -o "frame variable x" \
+// RUN:   -o "expr x.i = 10" \
+// RUN:   -o "frame variable x" \
+// RUN:   -o "continue" \
+// RUN:   -o "frame variable x" \
+// RUN:   -o "expr int $y = 11" \
+// RUN:   -o "expr $y" \
+// RUN:   -o "expr $y = 100" \
+// RUN:   -o "expr $y" \
+// RUN:   -o "continue" \
+// RUN:   -o "expr $y" \
+// RUN:   -o exit | FileCheck %s -dump-input=fail
+
+class X
+{
+  int i = 0;
+public:
+  void inc()
+  {
+    ++i;
+  }
+};
+
+int main()
+{
+  X x;
+  x.inc();
+
+  __builtin_debugtrap();
+  __builtin_debugtrap();
+  __builtin_debugtrap();
+  __builtin_debugtrap();
+  return 0;
+}
+
+// CHECK-LABEL: frame variable x
+// CHECK: (X) x = (i = 1)
+
+// CHECK-LABEL: expr x.inc()
+// CHECK-LABEL: frame variable x
+// CHECK: (X) x = (i = 2)
+
+// CHECK-LABEL: continue
+// CHECK-LABEL: frame variable x
+// CHECK: (X) x = (i = 2)
+
+
+
+// CHECK-LABEL: expr x.i = 10
+// CHECK: (int) $0 = 10
+
+// CHECK-LABEL: frame variable x
+// CHECK: (X) x = (i = 10)
+
+// CHECK-LABEL: continue
+// CHECK-LABEL: frame variable x
+// CHECK: (X) x = (i = 10)
+
+
+
+// CHECK-LABEL: expr int $y = 11
+// CHECK-LABEL: expr $y
+// CHECK: (int) $y = 11
+
+// CHECK-LABEL: expr $y = 100
+// CHECK: (int) $1 = 100
+
+// CHECK-LABEL: expr $y
+// CHECK: (int) $y = 100
+
+// CHECK-LABEL: continue
+// CHECK-LABEL: expr $y
+// CHECK: (int) $y = 100

@jimingham
Copy link
Collaborator

When we define a variable in the expression parser, like:

(lldb) expr SomeStruct $mine = {10, 20, 30}

we allocate memory in the target (in the allocated memory cache.) When we go to update that variable in the expression parser, we need to write a new value into the allocated memory. Does that process still work with your change such that we see the correct new value for the variable?

Copy link

github-actions bot commented Mar 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@real-mikhail
Copy link
Contributor Author

real-mikhail commented Mar 14, 2025

I checked that and added a test for this case. Updating value works and new value can be displayed (in simple case):

(lldb) expr A $mine = {100, 200}
(lldb) expr $mine.a = 300
(int) $0 = 300
(lldb) expr $mine
(A) $mine = {a=300, b=200} {
  a = 300
  b = 200
}

But it is true that with my change memoryId will not be bumped. That means that lldb_private::ValueObject::EvaluationPoint::NeedsUpdating will return false. And in case if expression execution heavily modified something in convenience variable (for instance it led to changing display format or it requires reconstructing synthetic children of that variable) - then this update will not happen and stale value will be shown. I'm not sure though if this is a realistic case.

P.S. Also I see that there is an issue on Linux. Looks like some issue with $var syntax. I'll check why it doesn't work there - just need to get a Linux machine (I'm mostly work on Windows).

@jimingham
Copy link
Collaborator

jimingham commented Mar 14, 2025

If I make an instance of a collection class (most of which are comprehended by synthetic child providers) and then call:

(lldb) expr $my_vector.append(1)
(lldb) expr $my_vector

Then if I understand correctly, the second printing will not show the elements correctly. Our support for making std::vector objects in the expression parser is weak right now, but our plan is to make this work. And you certainly could have the same problem with a non-templated class that wrapped one of these collection classes, since its constructor would work around the problem of instantiating the template.

That doesn't seem to me like such a far-fetched example.

@jimingham
Copy link
Collaborator

If this is only a problem for persistent variables, maybe we can somehow remember that we delayed an update of the memory generation and if we see that's true and we're updating a persistent variable, then clear the memory cache at that point?

@real-mikhail
Copy link
Contributor Author

real-mikhail commented Mar 15, 2025

If I make an instance of a collection class (most of which are comprehended by synthetic child providers) and then call:
(lldb) expr $my_vector.append(1)
(lldb) expr $my_vector

That will still work with my changes. Because function call (.append()) is JIT compiled and (with current implementation) LLDB modifies stack memory to call such function. Approximate callstack:

[liblldb.dll] lldb_private::Process::WriteMemory(unsigned long long, const void *, unsigned long long, lldb_private::Status &) Process.cpp:2240
[liblldb.dll] lldb_private::Process::WriteScalarToMemory(unsigned long long, const lldb_private::Scalar &, unsigned long long, lldb_private::Status &) Process.cpp:2319
[liblldb.dll] lldb_private::Process::WritePointerToMemory(unsigned long long, unsigned long long, lldb_private::Status &) Process.cpp:2208
/* Writing return address on the stack here */ [liblldb.dll] ABIWindows_x86_64::PrepareTrivialCall(lldb_private::Thread &,unsigned long long,unsigned long long,unsigned long long,ArrayRef<unsigned __int64>) ABIWindows_x86_64.cpp:1163
[liblldb.dll] lldb_private::ThreadPlanCallFunction::ThreadPlanCallFunction(lldb_private::Thread &,const lldb_private::Address &,const lldb_private::CompilerType &,ArrayRef<unsigned __int64>,const lldb_private::EvaluateExpressionOptions &) ThreadPlanCallFunction.cpp:143
[liblldb.dll] lldb_private::ThreadPlanCallUserExpression::ThreadPlanCallUserExpression(lldb_private::Thread &,lldb_private::Address &,ArrayRef<unsigned __int64>,const lldb_private::EvaluateExpressionOptions &,std::shared_ptr<lldb_private::UserExpression> &) ThreadPlanCallUserExpression.cpp:38
[liblldb.dll] lldb_private::LLVMUserExpression::DoExecute(lldb_private::DiagnosticManager &, lldb_private::ExecutionContext &, lldb_private::ValueObject *, const lldb_private::EvaluateExpressionOptions &, std::shared_ptr<…> &, std::shared_ptr<…> &) LLVMUserExpression.cpp:158
[liblldb.dll] lldb_private::UserExpression::Execute(lldb_private::DiagnosticManager &, lldb_private::ExecutionContext &, lldb_private::ValueObject *, const lldb_private::EvaluateExpressionOptions &, std::shared_ptr<…> &, std::shared_ptr<…> &) UserExpression.cpp:550
[liblldb.dll] lldb_private::ExpressionCache::Execute(lldb_private::ExecutionContext &, lldb_private::ValueObject *, shared_ptr<…>, const lldb_private::EvaluateExpressionOptions &, lldb_private::SharingPtr<…> &) ExpressionCache.cpp:320
[liblldb.dll] lldb_private::ExpressionCache::EvaluateWithCache(lldb_private::ExecutionContext &, const lldb_private::EvaluateExpressionOptions &, StringRef, lldb_private::SharingPtr<…> &, lldb_private::ValueObject *, const std::shared_ptr<…> &, lldb_private::ExpressionCache::ExpressionKey &&) ExpressionCache.cpp:412
[liblldb.dll] lldb_private::ExpressionCache::Evaluate(lldb_private::ExecutionContext &, const lldb_private::EvaluateExpressionOptions &, StringRef, lldb_private::SharingPtr<…> &, std::string *, lldb_private::ValueObject *, shared_ptr<…>) ExpressionCache.cpp:208
[liblldb.dll] lldb_private::Target::EvaluateExpression(StringRef, lldb_private::ExecutionContextScope *, lldb_private::SharingPtr<…> &, const lldb_private::EvaluateExpressionOptions &, std::string *, lldb_private::ValueObject *) Target.cpp:2471

(this is from older LLDB version so line numbers might be wrong)

The case where it won't work is something like variant class. For instance, if there is a struct with 2 fields and first field decodes how to display the second field, and there is a pretty printer which creates a synthetic child for second field, and in expr user directly assigns some value to one of the fields (and not calls a property to set it) - then LLDB won't show correct values.

If this is only a problem for persistent variables, maybe we can somehow remember that we delayed an update of the memory generation and if we see that's true and we're updating a persistent variable, then clear the memory cache at that point?

In general case persistent variables can holds pointers to non-persistent ones and vice versa. I guess we can have two MemoryId variables (second one for memory cache) and if we can easily understand whether ValueObject is a persistent variable or not, then we can check whether variable needs updating by checking corresponding MemoryId. I think something like that might work.

@real-mikhail
Copy link
Contributor Author

I think something like that might work.

Clarification: that will work only for cases if there are VariableObject for each variable and they are checked. If, for instance, non-persistent variable holds reference to persistent variable and only MemoryCacheId is bumped then (pretty printer of) non-persistent variable will not know that it needs updating.

@jimingham
Copy link
Collaborator

jimingham commented Mar 19, 2025

I think something like that might work.

Clarification: that will work only for cases if there are VariableObject for each variable and they are checked. If, for instance, non-persistent variable holds reference to persistent variable and only MemoryCacheId is bumped then (pretty printer of) non-persistent variable will not know that it needs updating.

Making an object in the expression parser so that you can assign it to a member of some extant object is a useful way to test "what happens if an object made this way is put here". That's still a pretty advanced use, so it might not be too bad if you had to set a "target.process.expressions-flush-memory-cache" setting for the times we get this wrong.

It also seems like you are also solving a performance problem that most people don't have - we try hard not to use expressions in our data formatters, so I've never run into this issue.

So having a way to indicate you need the faster vrs. the more bullet-proof behavior might be an acceptable solution. You might even argue that the setting should be "bullet-proof behavior" by default. If you are shipping a data formatter that benefits from this setting, you can also set the setting in the module that installs the data formatters...

@labath
Copy link
Collaborator

labath commented Mar 20, 2025

So having a way to indicate you need the faster vrs. the more bullet-proof behavior might be an acceptable solution. You might even argue that the setting should be "bullet-proof behavior" by default. If you are shipping a data formatter that benefits from this setting, you can also set the setting in the module that installs the data formatters...

Or make it an expression flag that can be set (in SBExpressionOptions) on expressions that you know are safe.

(But generally yes, it's better (and faster) to avoid expression evaluation altogether. I don't know how many data formatters are you dealing with, but I would recommend changing them to not do that.)

@jimingham
Copy link
Collaborator

jimingham commented Mar 20, 2025 via email

@real-mikhail
Copy link
Contributor Author

Thanks folks. I agree that approach with the option is the best (will see what looks better, something like target.process.expressions-flush-memory-cache or flag in SBExpressionOptions). Since this flag will affect very low-level code maybe global flag (in target.process) will be easier to pass.
And yes, this flag will be disabled by default.

@real-mikhail
Copy link
Contributor Author

Hi,

I added the new option key target.process.track-memory-cache-changes (which defaults to true, meaning bulletproof behaviour). Setting it to false enables the optimization I initially suggested. I also added a new dump command and ensured with test that MemoryID is not changed when executing simple expressions.

@jimingham
Copy link
Collaborator

jimingham commented Apr 11, 2025

This looks pretty good to me. I had one trivial doc comment, and I think it's better not to start sprinkling internal debugging commands into the lldb command set. The debugger has enough commands already, I like to avoid adding ones that nobody but us will ever use.
If we start needing to do this a lot, then we should probably follow gdb and add a "maintenance" top-level command to gather them all up. That way normal user would just have to overlook one top-level command. But in this case, we already have process status so the debugging dump would fit naturally as an option to that command.

@real-mikhail real-mikhail requested a review from jimingham April 12, 2025 17:55
Copy link
Collaborator

@jimingham jimingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this!

@real-mikhail
Copy link
Contributor Author

Hi @JDevlieghere

May I ask you to review the changes as well? That's my first PR in LLVM but I guess I need approvals from all reviewers before merge.

@DavidSpickett
Copy link
Collaborator

Jonas is added as he's listed as the code owner for this area, but does not necessarily review all PRs.

In general, you only need one approval - https://llvm.org/docs/CodeReview.html#lgtm-how-a-patch-is-accepted

Unless you happen to know or suspect others would want to comment, but this is easy to get wrong so don't worry too much about that. Fine to ask, folks can always say they have no opinion.

I assume you don't have the ability to merge this PR yourself? I expect not given it's your first PR here.

Since you already asked @JDevlieghere, they might as well glance at it but I expect Jim's review is enough and Jonas can just merge this.

@real-mikhail
Copy link
Contributor Author

Oh, sorry about unnecessary ping then and thanks for clarification and the link!
Yes, I cannot merge changes myself since this is my first PR.

@real-mikhail
Copy link
Contributor Author

I assume that there will be no further comments here.

Since this is my first change in LLVM I cannot merge it myself. @jimingham may I kindly ask you to merge this PR?

@jimingham jimingham merged commit 6aa963f into llvm:main May 1, 2025
10 checks passed
Copy link

github-actions bot commented May 1, 2025

@real-mikhail Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@jimingham
Copy link
Collaborator

Done. Thanks for working on this.

@kazutakahirata
Copy link
Contributor

@real-mikhail @jimingham I've landed 2bff80f to fix a warning from this PR. Thanks!

DavidSpickett added a commit that referenced this pull request May 2, 2025
…gger memory is updated (#129092)"

And a follow up warning fix.

This reverts commit 6aa963f
and 2bff80f.

This is failing on Windows on Arm: https://lab.llvm.org/buildbot/#/builders/141/builds/8375

Seems to produce the line the test wants but not in the right place.
Reverting while I investigate.
@DavidSpickett
Copy link
Collaborator

I've reverted this because it's failing on Windows on Arm.

It looks like all the IDs are there but not in the right place, so if we're lucky it's just some extra line that only appears on Windows on Arm. I'm looking into it now.

DavidSpickett added a commit that referenced this pull request May 2, 2025
…gger memory is updated (#129092)"

This reverts commit daa4061.

Original PR #129092.

I have restricted the test to X86 Windows because it turns out the only
reason that `expr x.get()` would change m_memory_id is that on x86 we
have to write the return address to the stack in ABIWindows_X86_64::PrepareTrivialCall:
```
  // Save return address onto the stack
  if (!process_sp->WritePointerToMemory(sp, return_addr, error))
    return false;
```

This is not required on AArch64 so m_memory_id was not changed:
```
(lldb) expr x.get()
(int) $0 = 0
(lldb) process status -d
Process 15316 stopped
* thread #1, stop reason = Exception 0x80000003 encountered at address 0x7ff764a31034
    frame #0: 0x00007ff764a31038 TestProcessModificationIdOnExpr.cpp.tmp`main at TestProcessModificationIdOnExpr.cpp:35
   32     __builtin_debugtrap();
   33     __builtin_debugtrap();
   34     return 0;
-> 35   }
   36
   37   // CHECK-LABEL: process status -d
   38   // CHECK: m_stop_id: 2
ProcessModID:
  m_stop_id: 3
  m_last_natural_stop_id: 0
  m_resume_id: 0
  m_memory_id: 0
```

Really we should find a better way to force a memory write here, but
I can't think of one right now.
@DavidSpickett
Copy link
Collaborator

DavidSpickett commented May 2, 2025

It was an ABI difference difference between X86 and AArch64. I've relanded this PR, the warning fix, and test fix.

If you can think of a way to force a memory write without relying on the return address, it would be nice to get this test running on AArch64 again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants