[DebugInfo][InstCombine] When replacing bswap idiom, add DebugLoc to new insts #114231

SLTozer · 2024-10-30T13:41:55Z

Currently when InstCombineAndOrXor recognizes a bswap idiom and replaces it with an intrinsic and other instructions, only the last instruction gets the DebugLoc of the replaced instruction set to it. This patch applies the DebugLoc to all the generated instructions, to maintain some degree of attribution.

Found using #107279.

…new insts Currently when InstCombineAndOrXor recognizes a bswap idiom and replaces it with an intrinsic and other instructions, only the last instruction gets the DebugLoc of the replaced instruction set to it. This patch applies the DebugLoc to all the generated instructions, to maintain some degree of attribution.

llvmbot · 2024-10-30T13:42:42Z

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-transforms

Author: Stephen Tozer (SLTozer)

Changes

Currently when InstCombineAndOrXor recognizes a bswap idiom and replaces it with an intrinsic and other instructions, only the last instruction gets the DebugLoc of the replaced instruction set to it. This patch applies the DebugLoc to all the generated instructions, to maintain some degree of attribution.

Full diff: https://github.com/llvm/llvm-project/pull/114231.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+3-1)
(added) llvm/test/Transforms/InstCombine/debugloc-bswap.ll (+52)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 50d1c61c24cf47..e2eae7fb8327cc 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2879,8 +2879,10 @@ Instruction *InstCombinerImpl::matchBSwapOrBitReverse(Instruction &I,
   Instruction *LastInst = Insts.pop_back_val();
   LastInst->removeFromParent();
 
-  for (auto *Inst : Insts)
+  for (auto *Inst : Insts) {
+    Inst->setDebugLoc(I.getDebugLoc());
     Worklist.push(Inst);
+  }
   return LastInst;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/debugloc-bswap.ll b/llvm/test/Transforms/InstCombine/debugloc-bswap.ll
new file mode 100644
index 00000000000000..6ac82878f9c201
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/debugloc-bswap.ll
@@ -0,0 +1,52 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=instcombine -S %s -o - | FileCheck %s
+;; Tests that when InstCombine replaces a bswap idiom with an intrinsic, the
+;; !dbg attachment from the replaced instruction is applied to all generated
+;; instructions, not just the last.
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @inflate(ptr %strm) {
+; CHECK-LABEL: define i32 @inflate(
+; CHECK-SAME: ptr [[STRM:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr [[STRM]], align 8
+; CHECK-NEXT:    [[TRUNC:%.*]] = trunc i64 [[TMP0]] to i32, !dbg [[DBG3:![0-9]+]]
+; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TRUNC]], 65535, !dbg [[DBG3]]
+; CHECK-NEXT:    [[MASK:%.*]] = call i32 @llvm.bswap.i32(i32 [[TMP1]]), !dbg [[DBG3]]
+; CHECK-NEXT:    [[ADD665:%.*]] = zext i32 [[MASK]] to i64, !dbg [[DBG3]]
+; CHECK-NEXT:    store i64 [[ADD665]], ptr [[STRM]], align 8
+; CHECK-NEXT:    ret i32 0
+;
+entry:
+  %0 = load i64, ptr %strm, align 8
+  %and660 = and i64 %0, 65280
+  %shl661 = shl i64 %and660, 8
+  %and663 = and i64 %0, 255
+  %shl664 = shl i64 %and663, 24
+  %add665 = or i64 %shl661, %shl664, !dbg !4
+  store i64 %add665, ptr %strm, align 8
+  ret i32 0
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 20.0.0git")
+!1 = !DIFile(filename: "/tmp/zlib_inflate.c", directory: "/tmp")
+!2 = !{}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !DILocation(line: 8, column: 42, scope: !6)
+!5 = !DIFile(filename: "zlib_inflate.c", directory: "/tmp")
+!6 = distinct !DISubprogram(name: "inflate", scope: !5, file: !5, line: 622, type: !7, scopeLine: 625, unit: !0, retainedNodes: !2)
+!7 = distinct !DISubroutineType(types: !2)
+;.
+; CHECK: [[META0:![0-9]+]] = distinct !DICompileUnit(language: DW_LANG_C11, file: [[META1:![0-9]+]], producer: "{{.*}}clang version {{.*}}", isOptimized: false, runtimeVersion: 0, emissionKind: NoDebug)
+; CHECK: [[META1]] = !DIFile(filename: "/tmp/zlib_inflate.c", directory: {{.*}})
+; CHECK: [[DBG3]] = !DILocation(line: 8, column: 42, scope: [[META4:![0-9]+]])
+; CHECK: [[META4]] = distinct !DISubprogram(name: "inflate", scope: [[META5:![0-9]+]], file: [[META5]], line: 622, type: [[META6:![0-9]+]], scopeLine: 625, spFlags: DISPFlagDefinition, unit: [[META0]], retainedNodes: [[META7:![0-9]+]])
+; CHECK: [[META5]] = !DIFile(filename: "zlib_inflate.c", directory: {{.*}})
+; CHECK: [[META6]] = distinct !DISubroutineType(types: [[META7]])
+; CHECK: [[META7]] = !{}
+;.

dwblaikie · 2024-10-30T18:04:39Z

Should/does this use a merged location? (the test and patch description mention several instructions being replaced by several other instructions - so sounds like the new instructions should get the merged location of all the old instructions? But not clear if that's what's happening, or just the last of the old instructions is used as the location for the new ones?)

SLTozer · 2024-10-30T18:21:42Z

Should/does this use a merged location? (the test and patch description mention several instructions being replaced by several other instructions - so sounds like the new instructions should get the merged location of all the old instructions? But not clear if that's what's happening, or just the last of the old instructions is used as the location for the new ones?)

That's a good question, and one that applies to some other patches I've worked/am working on. I don't think we have a well-defined approach to handling locations in optimizations that rewrite multiple instructions into a set of new instructions that compute the same end result. Merging all the locations and applying the merged location to each new instruction is one option; it feels like a truthful representation, but is also slightly disappointing in that in many cases it means setting a line 0 location.

I think there's a reasonable argument for using the location of the final instruction(s) for all new generated instructions, because all the new instructions exist to compute the value previously computed by that final instruction(s), and they otherwise have no exact relationship to any single instruction in the original program. This is what I lean more towards right now, as I think it's as valid in principle as the merged location and more pragmatically useful, but that's not a strong position - very much interested in hearing other opinions on this, and possibly encoding them for posterity in the "How To Update Debug Info" doc.

dwblaikie · 2024-10-30T21:35:27Z

I don't think we have a well-defined approach to handling locations in optimizations that rewrite multiple instructions into a set of new instructions that compute the same end result. Merging all the locations and applying the merged location to each new instruction is one option; it feels like a truthful representation, but is also slightly disappointing in that in many cases it means setting a line 0 location.

Agreed it's disappointing - but I'm not sure how to argue against it - if we considered the case where multiple instructions were replaced with one instruction, I think our documented process is pretty clear that we should merge locations there, right? So why would it be so different if the result is multiple instructions?

We could add some special case to merging - if all the instructions to be merged are in the same basic block, perhaps we could say in that case we pick the last instruction if the merge would be worse in some way? (I guess at that point the rule is just "pick the last") - the loss there is we lose other source locations to break on. (though I guess that's true if we drop the location too)

SLTozer · 2024-11-04T13:22:15Z

if we considered the case where multiple instructions were replaced with one instruction, I think our documented process is pretty clear that we should merge locations there, right?

Hm, you're right that this is the documented rule - I went into this thinking this case was more ambiguous, because in InstCombine that rule isn't always strictly followed: we perform peephole optimizations that may replace multiple instructions with a single instruction that uses the location of the final replaced instruction.

For this patch I'm happy to switch back to use merged locations to keep it consistent with the existing rules, but I think the rule should be different: my short proposal would be that when replacing a sequence of instructions, where there are no side-effects and only one instruction's output is being replaced by the new instruction(s), we should use the output instruction's location for the instruction(s) inserted to replace it. That's a discussion that should happen elsewhere (either an RFC or another review), but I think it's probably worth having. In the meanwhile, I'll update this patch!

nikic · 2024-11-04T13:31:04Z

FWIW, I think that trying to use a merged location rather than the location of the last instruction in InstCombine would be completely impractical. Taking the last instruction can be done automatically, while a merged location would require each individual fold to track which instructions "contribute" towards it (a notion that is not going to be particularly well-defined for many folds anyway, esp. in multi-use scenarios where some of the old instructions are also retained).

SLTozer · 2024-11-04T14:08:19Z

FWIW, I think that trying to use a merged location rather than the location of the last instruction in InstCombine would be completely impractical.

I mostly agree, notably with this patch we will use merged locations for all the new instructions generated except for the final instruction that we RAUW the original instruction with, which will take the original instruction's location as always - so this patch will only apply to the new instructions generated in the course of creating the replacement instruction. But as we move towards avoiding all unnecessary drops of DILocations, cases like this where specific folds generate new instructions will need to manually set DILocations on those new instructions if we don't have an automatic way of doing so.

dwblaikie · 2024-11-04T19:34:19Z

In terms of a policy change - I could see an argument for "we didn't merge these instructions, we replaced the last instruction with some other instruction and optimized the rest of the instructions away" - though how do we differentiate these cases, does this apply to every "merged" instruction?

Oh, I guess there's a real difference here - merged instructions that come from things like hoisting and lowering (we took two equivalent instructions in different branches and effectively deduplicated them at some common point before or after) versus we took a series of instructions and collapsed them into one.

Perhaps we could say in the latter case, we always use the location of the final instruction in the original sequence? (do we define fallbacks? if that final instruction doesn't have a location we should use one of the others? If they're in the same basic block? (though sounds like you folks would be pushing back on implementing such a fallback here as tracking the set of locations that flow into this transformation is too expensive?))

SLTozer · 2024-11-04T23:56:25Z

Right, I think a slightly more detailed form of the policy I'd suggest would be that every original instruction whose outputs/side-effects are produced by the new instructions is interesting, and the location for the new instructions should be the merge of all interesting instructions. So for example,

In most peephole optimizations, we would only use the final instruction's location, because that is the only instruction that the new instructions are reproducing.
When we hoist and merge equivalent instructions from both sides of a branch, the new instruction is reproducing both of the original instructions, so we'd use a merged location.
If we merged two adds into a single vector add, the vector add would use a merged location because it's defining the combined value of the original adds.

I'm not sure about using other instructions as fallbacks if the sole interesting instruction has a missing location; it seems like it might help salvage some information where we would otherwise have none, I'd just want to be sure there aren't cases where it would be misleading.

dwblaikie · 2024-11-05T19:20:31Z

Right, I think a slightly more detailed form of the policy I'd suggest would be that every original instruction whose outputs/side-effects are produced by the new instructions is interesting, and the location for the new instructions should be the merge of all interesting instructions. So for example,

In most peephole optimizations, we would only use the final instruction's location, because that is the only instruction that the new instructions are reproducing.

When we hoist and merge equivalent instructions from both sides of a branch, the new instruction is reproducing both of the original instructions, so we'd use a merged location.

If we merged two adds into a single vector add, the vector add would use a merged location because it's defining the combined value of the original adds.

I'm not sure about using other instructions as fallbacks if the sole interesting instruction has a missing location; it seems like it might help salvage some information where we would otherwise have none, I'd just want to be sure there aren't cases where it would be misleading.

Sounds good to me. (re: merging two adds - for a sec I was confused, I thought you were describing (a + (b + c)) - in which case it sounded more like the peephole description, but you meant merging ((a1 + b1), (a2 + b2)) into ((a1, a2) + (b1, b2)) ? Yeah, that looks like the function call merging, the instruction is producing the results of both the original instructions)

adrian-prantl · 2024-11-07T16:30:16Z

@SLTozer Can you propose a patch to https://llvm.org/docs/HowToUpdateDebugInfo.html#when-to-merge-instruction-locations?

Following discussions on PR llvm#114231 this patch changes the policy on merging locations, making the rule that new instructions should use a merge of the locations of all the instructions whose output is produced by the new instructions; in the case where only one instruction's output is produced, as in most InstCombine optimizations, we use only that instruction's location.

dwblaikie · 2024-11-07T17:54:05Z

Perhaps the simplest way to think of it is that replacing, say "add(a, add(b, c))" with "add(a, b, c)" isn't /replacing/ the second add, so it's not merging two adds, it's only replacing the outer add - which has the effect of adding a, b, and c (initially by relying on the result of the add(b, c), but then without relying on that anymore - the add(b, c) might still continue to exist, be used by other things, etc.

So it might be more consistent with our existing guidance than it first apperas - still good to write it up, but that might be a way to describe it that makes the unifying principles simpler/clearer as to why this conclusion applies.

SLTozer · 2024-11-07T18:59:20Z

That's how I'm thinking about it, I just think the term "replace" is a bit tricky since by most appearances, the intermediate instructions are being replaced by the folded instruction. Also, the existing policy explicitly states that mul+add=>fma should use a merged location, so changing that at least is a change in policy.

SLTozer · 2024-11-08T19:28:31Z

The policy change patch #115349 hasn't landed yet, but the principle seems to be approved of - in which case, is this patch approved to land?

dwblaikie · 2024-11-08T22:20:03Z

My main concern is addressed.

Looking at the code though, I'm a little confused - one instruction, I is replaced with some sequence of instructions, by the looks of it - and /all/ those new instructions are given the old instruction's debug location? What are cases where the replacement is more than one instruction? Is it right to give all the new instructions the same location? I guess it's probably fine, but does feel a bit weird & not entirely consistent with the new policy we've discussed, where only the instruction that produces the final result that matches the original instruction should have the new location, I guess, maybe?

SLTozer · 2024-11-08T23:38:44Z

I guess it's probably fine, but does feel a bit weird & not entirely consistent with the new policy we've discussed, where only the instruction that produces the final result that matches the original instruction should have the new location, I guess, maybe?

There might have been some ambiguity in the explanation; the new example on the policy update I gave covers this a bit more directly, but it could be made more explicit. As I see it, the set of new instructions that, in sequence, produce the final value of the old instruction(s) that we're replacing, should all be assigned the location of the replaced instruction (or the merge of those locations if there were multiple). The reason for this is the same as in cases where we lower an IR instruction into multiple MIR/assembly instructions and apply its DebugLoc to all of them: even if we create multiple instructions, they're all working towards the single output value.

dwblaikie · 2024-11-11T19:19:09Z

I guess it's probably fine, but does feel a bit weird & not entirely consistent with the new policy we've discussed, where only the instruction that produces the final result that matches the original instruction should have the new location, I guess, maybe?

There might have been some ambiguity in the explanation; the new example on the policy update I gave covers this a bit more directly, but it could be made more explicit. As I see it, the set of new instructions that, in sequence, produce the final value of the old instruction(s) that we're replacing, should all be assigned the location of the replaced instruction (or the merge of those locations if there were multiple). The reason for this is the same as in cases where we lower an IR instruction into multiple MIR/assembly instructions and apply its DebugLoc to all of them: even if we create multiple instructions, they're all working towards the single output value.

Fair enough - I can go with that. Sorry if I missed that in the policy wording update.

Following discussions on PR #114231 this patch changes the policy on merging locations, making the rule that new instructions should use a merge of the locations of all the instructions whose output is produced by the new instructions; in the case where only one instruction's output is produced, as in most InstCombine optimizations, we use only that instruction's location.

Following discussions on PR llvm#114231 this patch changes the policy on merging locations, making the rule that new instructions should use a merge of the locations of all the instructions whose output is produced by the new instructions; in the case where only one instruction's output is produced, as in most InstCombine optimizations, we use only that instruction's location.

SLTozer requested review from jmorse, adrian-prantl, OCHyams, dwblaikie and CarlosAlbertoEnciso October 30, 2024 13:41

SLTozer self-assigned this Oct 30, 2024

SLTozer requested a review from nikic as a code owner October 30, 2024 13:41

llvmbot added the llvm:transforms label Oct 30, 2024

adrian-prantl approved these changes Oct 30, 2024

View reviewed changes

SLTozer mentioned this pull request Nov 7, 2024

[DebugInfo] Update policy for when to merge locations #115349

Merged

jryans added the debuginfo label Nov 8, 2024

dwblaikie approved these changes Nov 11, 2024

View reviewed changes

SLTozer merged commit d686e5c into llvm:main Nov 14, 2024
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DebugInfo][InstCombine] When replacing bswap idiom, add DebugLoc to new insts #114231

[DebugInfo][InstCombine] When replacing bswap idiom, add DebugLoc to new insts #114231

SLTozer commented Oct 30, 2024 •

edited

Loading

llvmbot commented Oct 30, 2024 •

edited

Loading

dwblaikie commented Oct 30, 2024

SLTozer commented Oct 30, 2024

dwblaikie commented Oct 30, 2024

SLTozer commented Nov 4, 2024

nikic commented Nov 4, 2024 •

edited

Loading

SLTozer commented Nov 4, 2024 •

edited

Loading

dwblaikie commented Nov 4, 2024

SLTozer commented Nov 4, 2024

dwblaikie commented Nov 5, 2024

adrian-prantl commented Nov 7, 2024

dwblaikie commented Nov 7, 2024

SLTozer commented Nov 7, 2024

SLTozer commented Nov 8, 2024

dwblaikie commented Nov 8, 2024

SLTozer commented Nov 8, 2024

dwblaikie commented Nov 11, 2024

[DebugInfo][InstCombine] When replacing bswap idiom, add DebugLoc to new insts #114231

[DebugInfo][InstCombine] When replacing bswap idiom, add DebugLoc to new insts #114231

Conversation

SLTozer commented Oct 30, 2024 • edited Loading

llvmbot commented Oct 30, 2024 • edited Loading

dwblaikie commented Oct 30, 2024

SLTozer commented Oct 30, 2024

dwblaikie commented Oct 30, 2024

SLTozer commented Nov 4, 2024

nikic commented Nov 4, 2024 • edited Loading

SLTozer commented Nov 4, 2024 • edited Loading

dwblaikie commented Nov 4, 2024

SLTozer commented Nov 4, 2024

dwblaikie commented Nov 5, 2024

adrian-prantl commented Nov 7, 2024

dwblaikie commented Nov 7, 2024

SLTozer commented Nov 7, 2024

SLTozer commented Nov 8, 2024

dwblaikie commented Nov 8, 2024

SLTozer commented Nov 8, 2024

dwblaikie commented Nov 11, 2024

SLTozer commented Oct 30, 2024 •

edited

Loading

llvmbot commented Oct 30, 2024 •

edited

Loading

nikic commented Nov 4, 2024 •

edited

Loading

SLTozer commented Nov 4, 2024 •

edited

Loading