cmd/compile: slowdown in location list generation, possible remedies #52975
Labels
compiler/runtime
Issues related to the Go compiler and/or runtime.
NeedsDecision
Feedback is required from experts, contributors, and/or the community before a change can be made.
ToolSpeed
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
CL 397318 fixed a quadratic space consumption problem in location list generation (#51543) with a fancy data structure that shares storage for small changes to sets (and produced a 95% reduction in heap size for the problem case). Unfortunately, it is slower: it adds about 2% to build user time overall, and 35% in the worst case.
So that's "the bug"; here's a discussion of causes and possible remedies.
The root cause is that the location list generation algorithm is currently performing a quadratic amount of work. For each block, set operations linear in the number of live slots (intersection, difference) are performed. In some cases the number of live slots is linear in program size, the number of blocks is linear in program size, and we get quadratic time. (A "slot" is a variable or a piece of an aggregate-typed variable).
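To make the cost argument concrete, here is a minimal sketch (not the compiler's code) of per-block set work that is linear in the number of live slots. `SlotSet`, `intersect`, and `totalWork` are hypothetical names invented for illustration, and the model assumes the extreme case where every slot stays live across every block.

```go
package main

import "fmt"

// SlotSet is a deliberately simple set of live slot IDs (hypothetical type).
type SlotSet map[int]bool

// intersect returns the slots present in both a and b, together with the
// number of membership checks performed, which is linear in len(a).
func intersect(a, b SlotSet) (SlotSet, int) {
	out := make(SlotSet)
	work := 0
	for s := range a {
		work++
		if b[s] {
			out[s] = true
		}
	}
	return out, work
}

// totalWork models nBlocks blocks with nSlots slots live throughout.
// Each block performs one intersection over the full live set.
func totalWork(nBlocks, nSlots int) int {
	live := make(SlotSet)
	for s := 0; s < nSlots; s++ {
		live[s] = true
	}
	total := 0
	for b := 0; b < nBlocks; b++ {
		var w int
		live, w = intersect(live, live)
		total += w
	}
	return total
}

func main() {
	// When both blocks and live slots scale with program size n,
	// doubling n quadruples the work.
	for _, n := range []int{100, 200, 400} {
		fmt.Println(n, totalWork(n, n))
	}
}
```

In this model the total is blocks times slots, so a program where both grow with size n does n² membership checks.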
This is not necessarily required; clever preprocessing might allow us to notice that a block B's flow predecessors P and Q were both descendants of a common block R, therefore their intersection might be computed more efficiently by only considering flow from R to P and R to Q, which might be smaller. This is handwavy, potentially complicated, and likely also involves operations with a noticeable constant factor, so pursuing this route would take a little work, and might not pay off.
A more certain plan for improving performance, though not the asymptotic cost, is to reduce the number of conversions between set representations. The clever structures are currently used to record long-lived set data: the slots live at entry to and exit from each block. The operations within a block are applied to a simple set representation, which is created and consumed each time a block's effects on live slots are modeled. This is slightly trickier than just "skip the data conversions and use the shared sets everywhere" because the live data comes in two parts: one maps slots to where they are found, and the other maps each register to the slots currently bound to it (this is necessary to know which associations are undone by an assignment to a register).
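The two-part live data can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the compiler's actual representation: `SlotID`, `Reg`, `liveState`, and `assign` are hypothetical names, and each slot is assumed to live in at most one register (stack locations are ignored).

```go
package main

import "fmt"

// Hypothetical types for illustration; not the compiler's own.
type SlotID int
type Reg int

// liveState keeps the two maps that must stay consistent with each other.
type liveState struct {
	slotReg  map[SlotID]Reg   // slot -> register where it is found
	regSlots map[Reg][]SlotID // register -> slots currently bound to it
}

func newLiveState() *liveState {
	return &liveState{
		slotReg:  make(map[SlotID]Reg),
		regSlots: make(map[Reg][]SlotID),
	}
}

// remove filters s out of list in place.
func remove(list []SlotID, s SlotID) []SlotID {
	out := list[:0]
	for _, x := range list {
		if x != s {
			out = append(out, x)
		}
	}
	return out
}

// assign models a write to register r that leaves the given slots in it.
// The register -> slots map is what lets us find and undo the
// associations held by whatever r contained before.
func (st *liveState) assign(r Reg, slots ...SlotID) {
	for _, old := range st.regSlots[r] {
		delete(st.slotReg, old)
	}
	delete(st.regSlots, r)
	for _, s := range slots {
		// If s was bound to another register, unbind it there first.
		if prev, ok := st.slotReg[s]; ok {
			st.regSlots[prev] = remove(st.regSlots[prev], s)
		}
		st.slotReg[s] = r
		st.regSlots[r] = append(st.regSlots[r], s)
	}
}

func main() {
	st := newLiveState()
	st.assign(0, 1, 2) // slots 1 and 2 both live in register 0
	st.assign(1, 3)    // slot 3 lives in register 1
	st.assign(0, 4)    // overwriting register 0 unbinds slots 1 and 2
	_, stillBound := st.slotReg[1]
	fmt.Println(stillBound, st.slotReg[3], st.regSlots[0])
}
```

Any shared-storage replacement for the simple representation would need to preserve this invariant across both maps, which is why a drop-in swap is not entirely straightforward.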