-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[DebugInfo][InstCombine] When replacing bswap idiom, add DebugLoc to new insts #114231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…new insts Currently when InstCombineAndOrXor recognizes a bswap idiom and replaces it with an intrinsic and other instructions, only the last instruction gets the DebugLoc of the replaced instruction set to it. This patch applies the DebugLoc to all the generated instructions, to maintain some degree of attribution.
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-llvm-transforms Author: Stephen Tozer (SLTozer) ChangesCurrently when InstCombineAndOrXor recognizes a bswap idiom and replaces it with an intrinsic and other instructions, only the last instruction gets the DebugLoc of the replaced instruction set to it. This patch applies the DebugLoc to all the generated instructions, to maintain some degree of attribution. Full diff: https://github.com/llvm/llvm-project/pull/114231.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 50d1c61c24cf47..e2eae7fb8327cc 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2879,8 +2879,10 @@ Instruction *InstCombinerImpl::matchBSwapOrBitReverse(Instruction &I,
Instruction *LastInst = Insts.pop_back_val();
LastInst->removeFromParent();
- for (auto *Inst : Insts)
+ for (auto *Inst : Insts) {
+ Inst->setDebugLoc(I.getDebugLoc());
Worklist.push(Inst);
+ }
return LastInst;
}
diff --git a/llvm/test/Transforms/InstCombine/debugloc-bswap.ll b/llvm/test/Transforms/InstCombine/debugloc-bswap.ll
new file mode 100644
index 00000000000000..6ac82878f9c201
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/debugloc-bswap.ll
@@ -0,0 +1,52 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=instcombine -S %s -o - | FileCheck %s
+;; Tests that when InstCombine replaces a bswap idiom with an intrinsic, the
+;; !dbg attachment from the replaced instruction is applied to all generated
+;; instructions, not just the last.
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @inflate(ptr %strm) {
+; CHECK-LABEL: define i32 @inflate(
+; CHECK-SAME: ptr [[STRM:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[STRM]], align 8
+; CHECK-NEXT: [[TRUNC:%.*]] = trunc i64 [[TMP0]] to i32, !dbg [[DBG3:![0-9]+]]
+; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TRUNC]], 65535, !dbg [[DBG3]]
+; CHECK-NEXT: [[MASK:%.*]] = call i32 @llvm.bswap.i32(i32 [[TMP1]]), !dbg [[DBG3]]
+; CHECK-NEXT: [[ADD665:%.*]] = zext i32 [[MASK]] to i64, !dbg [[DBG3]]
+; CHECK-NEXT: store i64 [[ADD665]], ptr [[STRM]], align 8
+; CHECK-NEXT: ret i32 0
+;
+entry:
+ %0 = load i64, ptr %strm, align 8
+ %and660 = and i64 %0, 65280
+ %shl661 = shl i64 %and660, 8
+ %and663 = and i64 %0, 255
+ %shl664 = shl i64 %and663, 24
+ %add665 = or i64 %shl661, %shl664, !dbg !4
+ store i64 %add665, ptr %strm, align 8
+ ret i32 0
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 20.0.0git")
+!1 = !DIFile(filename: "/tmp/zlib_inflate.c", directory: "/tmp")
+!2 = !{}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !DILocation(line: 8, column: 42, scope: !6)
+!5 = !DIFile(filename: "zlib_inflate.c", directory: "/tmp")
+!6 = distinct !DISubprogram(name: "inflate", scope: !5, file: !5, line: 622, type: !7, scopeLine: 625, unit: !0, retainedNodes: !2)
+!7 = distinct !DISubroutineType(types: !2)
+;.
+; CHECK: [[META0:![0-9]+]] = distinct !DICompileUnit(language: DW_LANG_C11, file: [[META1:![0-9]+]], producer: "{{.*}}clang version {{.*}}", isOptimized: false, runtimeVersion: 0, emissionKind: NoDebug)
+; CHECK: [[META1]] = !DIFile(filename: "/tmp/zlib_inflate.c", directory: {{.*}})
+; CHECK: [[DBG3]] = !DILocation(line: 8, column: 42, scope: [[META4:![0-9]+]])
+; CHECK: [[META4]] = distinct !DISubprogram(name: "inflate", scope: [[META5:![0-9]+]], file: [[META5]], line: 622, type: [[META6:![0-9]+]], scopeLine: 625, spFlags: DISPFlagDefinition, unit: [[META0]], retainedNodes: [[META7:![0-9]+]])
+; CHECK: [[META5]] = !DIFile(filename: "zlib_inflate.c", directory: {{.*}})
+; CHECK: [[META6]] = distinct !DISubroutineType(types: [[META7]])
+; CHECK: [[META7]] = !{}
+;.
|
Should/does this use a merged location? (the test and patch description mention several instructions being replaced by several other instructions - so sounds like the new instructions should get the merged location of all the old instructions? But not clear if that's what's happening, or just the last of the old instructions is used as the location for the new ones?) |
That's a good question, and one that applies to some other patches I've worked/am working on. I don't think we have a well-defined approach to handling locations in optimizations that rewrite multiple instructions into a set of new instructions that compute the same end result. Merging all the locations and applying the merged location to each new instruction is one option; it feels like a truthful representation, but is also slightly disappointing in that in many cases it means setting a line 0 location. I think there's a reasonable argument for using the location of the final instruction(s) for all new generated instructions, because all the new instructions exist to compute the value previously computed by that final instruction(s), and they otherwise have no exact relationship to any single instruction in the original program. This is what I lean more towards right now, as I think it's as valid in principle as the merged location and more pragmatically useful, but that's not a strong position - very much interested in hearing other opinions on this, and possibly encoding them for posterity in the "How To Update Debug Info" doc. |
Agreed it's disappointing - but I'm not sure how to argue against it - if we considered the case where multiple instructions were replaced with one instruction, I think our documented process is pretty clear that we should merge locations there, right? So why would it be so different if the result is multiple instructions? We could add some special case to merging - if all the instructions to be merged are in the same basic block, perhaps we could say in that case we pick the last instruction if the merge would be worse in some way? (I guess at that point the rule is just "pick the last") - the loss there is we lose other source locations to break on. (though I guess that's true if we drop the location too) |
Hm, you're right that this is the documented rule - I went into this thinking this case was more ambiguous, because in InstCombine that rule isn't always strictly followed: we perform peephole optimizations that may replace multiple instructions with a single instruction that uses the location of the final replaced instruction. For this patch I'm happy to switch back to use merged locations to keep it consistent with the existing rules, but I think the rule should be different: my short proposal would be that when replacing a sequence of instructions, where there are no side-effects and only one instruction's output is being replaced by the new instruction(s), we should use the output instruction's location for the instruction(s) inserted to replace it. That's a discussion that should happen elsewhere (either an RFC or another review), but I think it's probably worth having. In the meanwhile, I'll update this patch! |
FWIW, I think that trying to use a merged location rather than the location of the last instruction in InstCombine would be completely impractical. Taking the last instruction can be done automatically, while a merged location would require each individual fold to track which instructions "contribute" towards it (a notion that is not going to be particularly well-defined for many folds anyway, esp. in multi-use scenarios where some of the old instructions are also retained). |
I mostly agree, notably with this patch we will use merged locations for all the new instructions generated except for the final instruction that we RAUW the original instruction with, which will take the original instruction's location as always - so this patch will only apply to the new instructions generated in the course of creating the replacement instruction. But as we move towards avoiding all unnecessary drops of DILocations, cases like this where specific folds generate new instructions will need to manually set DILocations on those new instructions if we don't have an automatic way of doing so. |
In terms of a policy change - I could see an argument for "we didn't merge these instructions, we replaced the last instruction with some other instruction and optimized the rest of the instructions away" - though how do we differentiate these cases, does this apply to every "merged" instruction? Oh, I guess there's a real difference here - merged instructions that come from things like hoisting and lowering (we took two equivalent instructions in different branches and effectively deduplicated them at some common point before or after) versus we took a series of instructions and collapsed them into one. Perhaps we could say in the latter case, we always use the location of the final instruction in the original sequence? (do we define fallbacks? if that final instruction doesn't have a location we should use one of the others? If they're in the same basic block? (though sounds like you folks would be pushing back on implementing such a fallback here as tracking the set of locations that flow into this transformation is too expensive?)) |
Right, I think a slightly more detailed form of the policy I'd suggest would be that every original instruction whose outputs/side-effects are produced by the new instructions is interesting, and the location for the new instructions should be the merge of all interesting instructions. So for example,
I'm not sure about using other instructions as fallbacks if the sole interesting instruction has a missing location; it seems like it might help salvage some information where we would otherwise have none, I'd just want to be sure there aren't cases where it would be misleading. |
Sounds good to me. (re: merging two adds - for a sec I was confused, I thought you were describing (a + (b + c)) - in which case it sounded more like the peephole description, but you meant merging ((a1 + b1), (a2 + b2)) into ((a1, a2) + (b1, b2)) ? Yeah, that looks like the function call merging, the instruction is producing the results of both the original instructions) |
@SLTozer Can you propose a patch to https://llvm.org/docs/HowToUpdateDebugInfo.html#when-to-merge-instruction-locations? |
Following discussions on PR llvm#114231 this patch changes the policy on merging locations, making the rule that new instructions should use a merge of the locations of all the instructions whose output is produced by the new instructions; in the case where only one instruction's output is produced, as in most InstCombine optimizations, we use only that instruction's location.
Perhaps the simplest way to think of it is that replacing, say "add(a, add(b, c))" with "add(a, b, c)" isn't /replacing/ the second add, so it's not merging two adds, it's only replacing the outer add - which has the effect of adding a, b, and c (initially by relying on the result of the add(b, c), but then without relying on that anymore - the add(b, c) might still continue to exist, be used by other things, etc. So it might be more consistent with our existing guidance than it first apperas - still good to write it up, but that might be a way to describe it that makes the unifying principles simpler/clearer as to why this conclusion applies. |
That's how I'm thinking about it, I just think the term "replace" is a bit tricky since by most appearances, the intermediate instructions are being replaced by the folded instruction. Also, the existing policy explicitly states that mul+add=>fma should use a merged location, so changing that at least is a change in policy. |
The policy change patch #115349 hasn't landed yet, but the principle seems to be approved of - in which case, is this patch approved to land? |
My main concern is addressed. Looking at the code though, I'm a little confused - one instruction, |
There might have been some ambiguity in the explanation; the new example on the policy update I gave covers this a bit more directly, but it could be made more explicit. As I see it, the set of new instructions that, in sequence, produce the final value of the old instruction(s) that we're replacing, should all be assigned the location of the replaced instruction (or the merge of those locations if there were multiple). The reason for this is the same as in cases where we lower an IR instruction into multiple MIR/assembly instructions and apply its DebugLoc to all of them: even if we create multiple instructions, they're all working towards the single output value. |
Fair enough - I can go with that. Sorry if I missed that in the policy wording update. |
Following discussions on PR #114231 this patch changes the policy on merging locations, making the rule that new instructions should use a merge of the locations of all the instructions whose output is produced by the new instructions; in the case where only one instruction's output is produced, as in most InstCombine optimizations, we use only that instruction's location.
Following discussions on PR llvm#114231 this patch changes the policy on merging locations, making the rule that new instructions should use a merge of the locations of all the instructions whose output is produced by the new instructions; in the case where only one instruction's output is produced, as in most InstCombine optimizations, we use only that instruction's location.
Currently when InstCombineAndOrXor recognizes a bswap idiom and replaces it with an intrinsic and other instructions, only the last instruction gets the DebugLoc of the replaced instruction set to it. This patch applies the DebugLoc to all the generated instructions, to maintain some degree of attribution.
Found using #107279.