cmd/compile: slow escape analysis in large package in the typescript compiler #72815
I've tried to bisect it, but 1.22 is the last version I could easily test (this is a modern codebase), and it is just as bad.
I said this on the gopher slack, but I'm not entirely sure whether this is a regression in Go or just our package scaling poorly as it grew during the port; I plan to gather some data over all of the commits in the repo to see what that looks like. Of course, a bug in the compiler would certainly be "good news" from the PoV that we wouldn't have to figure out how to break apart the monster.
For context, this package has 2652 functions; 1669 are …
cc @golang/compiler
If there's a copy-paste recipe for building the new typescript compiler to show this problem, that would help. I could probably figure it out, but the bug author already knows the answer, and that way we'll know that we're looking at the same problem.
The repo doesn't require any special tooling to build any of the Go code there at all, so it's just:

```sh
$ git clone https://github.com/microsoft/typescript-go.git
$ cd typescript-go
$ go build ./internal/checker
```
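(A simple way to reproduce rough numbers like those discussed below, assuming a cold build cache; the thread doesn't prescribe a specific measurement:)

```sh
# Time a cold build; compare wall-clock vs. user CPU time.
go clean -cache
time go build ./internal/checker
```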
How am I gonna claim the fix if I help you investigate it?
FWIW, after …
Correct.
I include runtime for reference. It seems to be bigger and builds much faster, but the profiles are fairly similar. The most obvious difference is more time in …
I experimentally turned off inlining, and 44-user-second builds turned into 20-user-second builds.
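For anyone wanting to repeat that experiment, inlining can be disabled with the compiler's -l flag (the comment doesn't give the exact command, so this is a guess at it):

```sh
# Rebuild everything with inlining off in all packages.
go build -a -gcflags=all=-l ./internal/checker
```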
So, hmmm. The -d=fmahash parameter is just an irrelevant difference in flags that will guarantee everything gets recompiled.
Change https://go.dev/cl/657295 mentions this issue:
Change https://go.dev/cl/657315 mentions this issue:
Change https://go.dev/cl/657179 mentions this issue:
FWIW, I took a quick stab at trying to speed things up in escape analysis for large packages such as typescript-go/internal/checker, and sent two CLs: https://go.dev/cl/657295 and https://go.dev/cl/657179. The build times reported via the action graph show a reasonable improvement for typescript-go/internal/checker:

…
with timing via:
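(One plausible way to collect per-action compile timings, an assumption on my part since the original command isn't shown, is the go command's undocumented -debug-actiongraph flag:)

```sh
# Dump the build's action graph, including per-action timings, to JSON.
go clean -cache
go build -debug-actiongraph=compile.json ./internal/checker
```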
The CLs pass the trybots, but definitely consider these tentative results (they're still WIP; I want to look at more results, look at the performance of some other large packages, double-check things, and in general step back & think a bit more about correctness, etc.). Depending on how you count, it's effectively ~3-4 changes between the two CLs, and I haven't teased apart whether one or more of those changes might not be useful. That said, I have some cautious hope things can be sped up for the escape analysis piece, perhaps via something like these CLs, or perhaps via something else.
Change https://go.dev/cl/657077 mentions this issue:
Amazing job on those speedups! 5x would be so good. I let my machine go through the git history and collect data on how long … So, it does seem somewhat organic. (Not that "organic" growth means there's nothing to improve, obviously.) However, that big cliff near the center comes from microsoft/typescript-go@bcce040.
This shows that this one commit increased escape analysis time in the checker package by nearly 20 seconds. That seems unusually large for the change made in that commit.
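The harness used to gather that per-commit data isn't shown in the thread; a rough sketch of one way to do it (assumes GNU time and a checkout of typescript-go):

```sh
#!/usr/bin/env bash
# Time a cold build of the checker package at every commit that touched it.
for sha in $(git rev-list --reverse HEAD -- internal/checker); do
  git checkout -q "$sha"
  /usr/bin/time -f "$sha wall=%es user=%Us" \
    go build -a ./internal/checker 2>>build-times.log || true
done
```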
Building this repository revealed some inefficiencies in the compiler. This change adds the main command from this repository (tsgo) as a benchmark of `go build -a` (a cold build) so we can track improvements and hopefully catch any future regressions.

For golang/go#72815.

Change-Id: I8e01850b7956970000211cce50f200c3e38e54af
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/657077
Reviewed-by: Carlos Amedee <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
I was looking into whether it would make sense to run escape analysis in parallel, and in the process found that there is a strongly connected component of 1305 functions in the call graph. I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping, so with some work we could break that up into 1305 individual functions, which might save time and would also reduce wall time, if not user time. Here's the beginning of the list:

…
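For context on how such a component is found: strongly connected components are typically computed with Tarjan's algorithm. Below is a minimal, self-contained sketch over a toy call graph; the function names are invented, and this is not the compiler's actual representation:

```go
package main

import "fmt"

// tarjan computes strongly connected components of a call graph
// given as adjacency lists (caller -> callees).
type tarjan struct {
	graph      map[string][]string
	index, low map[string]int
	onStack    map[string]bool
	stack      []string
	counter    int
	sccs       [][]string
}

func (t *tarjan) strongConnect(v string) {
	t.index[v] = t.counter
	t.low[v] = t.counter
	t.counter++
	t.stack = append(t.stack, v)
	t.onStack[v] = true
	for _, w := range t.graph[v] {
		if _, seen := t.index[w]; !seen {
			t.strongConnect(w)
			if t.low[w] < t.low[v] {
				t.low[v] = t.low[w]
			}
		} else if t.onStack[w] && t.index[w] < t.low[v] {
			t.low[v] = t.index[w] // back edge to a node still on the stack
		}
	}
	if t.low[v] == t.index[v] { // v is the root of an SCC; pop it off
		var scc []string
		for {
			w := t.stack[len(t.stack)-1]
			t.stack = t.stack[:len(t.stack)-1]
			t.onStack[w] = false
			scc = append(scc, w)
			if w == v {
				break
			}
		}
		t.sccs = append(t.sccs, scc)
	}
}

func main() {
	// Toy call graph with a 3-function cycle, standing in for the
	// 1305-function component described above.
	g := map[string][]string{
		"checkExpr": {"checkCall"},
		"checkCall": {"checkType"},
		"checkType": {"checkExpr", "format"},
		"format":    {},
	}
	t := &tarjan{graph: g, index: map[string]int{}, low: map[string]int{}, onStack: map[string]bool{}}
	for v := range g {
		if _, seen := t.index[v]; !seen {
			t.strongConnect(v)
		}
	}
	for _, scc := range t.sccs {
		fmt.Println(len(scc), scc)
	}
}
```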
If my English is not failing me, I think the following sentence is wrong:

…
```go
package a

type node struct {
	next *node
}

func stackAllocatedLinkedList(prev *node, budget uint) {
	if budget == 0 {
		return
	}
	someOtherFunction(&node{prev}, budget-1)
}

//go:noinline
func someOtherFunction(prev *node, budget uint) {
	if budget == 0 { // guard added so budget-1 can't wrap around to MaxUint
		return
	}
	stackAllocatedLinkedList(&node{prev}, budget-1)
}
```

See how it creates a linked list across the stack frames:

…
I don't know how much real-world code this optimization helps.
A lot of high-throughput, compiler-flavored code depends on this. The example here is contrived; a better example would be a recursive function that uses some kind of cursor type to walk levels of a tree (e.g. imagine walking a btree). If each step needs to create a new cursor and pass it to the callee, each cursor will now wind up on the heap. This will also trigger if the callee needs to be passed e.g. a mere int out-parameter. Any graph-walking algorithm that was previously allocation-free will now allocate in every frame, generating a lot of surprise garbage. I wouldn't be surprised if this made the Go compiler itself slower on average, due to missed optimizations when compiling itself...
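A small illustration of that pattern (hypothetical types; it assumes, as described above, that calls inside a recursive cycle are modeled as escaping):

```go
package tree

type node struct {
	children []*node
}

// cursor tracks a position while descending the tree.
type cursor struct {
	n     *node
	depth int
}

// visit creates a fresh cursor for each level and passes its address
// to the recursive call. If calls inside a recursive cycle are modeled
// as escaping, every cursor here moves to the heap, and a walk that
// used to be allocation-free now allocates once per frame.
func visit(c *cursor, f func(*cursor)) {
	f(c)
	for _, child := range c.n.children {
		visit(&cursor{n: child, depth: c.depth + 1}, f)
	}
}
```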
FWIW, I suspect/hope we can run most of the work of escape analysis in parallel. I now have a WIP parallel version of escape analysis where it runs the inner loop concurrently. (It does not partition the data-flow graph, which might be tricky to do without affecting the quality of the results. It instead sets things up to be able to run multiple walkOne walks in parallel over the original data-flow graph and merges the results.) Here are some tentative numbers for the typescript compiler's checker package:

…
The first row is Go 1.24, the second row is the two earlier CLs I sent, and the third row is giving 8 logical cores to the WIP parallel version. (This is a different test VM than the one for my numbers above.) Running it with more cores would probably make it a bit faster, but there are likely diminishing returns. The timing here is via …
There are definitely some caveats, especially for the parallel version. It passes my local tests, but I have not tried the trybots. Also, I took a couple of shortcuts, including temporarily commenting out the code for the detailed debug logging of path-based diagnostic reports (available via …).

Also, note the times here are for compiling the typescript compiler's very large checker package.

At this point, I plan to try to get the parallel work and the other CLs into reviewable shape, and we can see whether any of this is worthwhile vs. maybe there's a better direction, or maybe this is all flawed. 😅
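Very roughly, the shape of that approach might look like the sketch below; walkOne, mergeResults, and location are stand-ins, and the real CL's bookkeeping is certainly more involved:

```go
package escape

import "sync"

type location struct{ /* node in the data-flow graph */ }

// walkOne would traverse the graph from root, recording which
// locations the root's address flows to (stand-in for the real pass).
func walkOne(root *location) []*location { return nil }

// mergeResults would fold one walk's findings into shared state.
func mergeResults(found []*location) {}

// solveParallel runs independent root walks concurrently over a
// shared, read-only graph, merging each walk's results under a lock.
func solveParallel(roots []*location, workers int) {
	var mu sync.Mutex
	sem := make(chan struct{}, workers) // limits concurrent walks
	var wg sync.WaitGroup
	for _, root := range roots {
		wg.Add(1)
		sem <- struct{}{}
		go func(root *location) {
			defer wg.Done()
			defer func() { <-sem }()
			found := walkOne(root) // reads the graph, writes only local state
			mu.Lock()
			mergeResults(found)
			mu.Unlock()
		}(root)
	}
	wg.Wait()
}
```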
Getting in even what's already sent would be amazing; I've so far given people instructions on how to …

I was going to say "it's a shame we didn't announce the port earlier so we could report these", but I don't think this was a problem back before the freeze, so it wouldn't have mattered 😅
Thanks for the test case.
@thepudds It's been a bit, but is there a chance your existing CLs can be unmarked as WIP so they could make it into tip? I don't think you have sent the concurrent change yet, but having what's already sent would be pretty helpful. Some of us are using your CL for local dev, but since it's out of date relative to tip, it's missing DWARF5 fixes (due to when it was sent) and other stuff, which makes it less fun to use outside plain go build/test.
Hi @jakebailey, I rebased the first two CLs on master. The first CL will probably pop out of review soon. I might simplify the second CL a bit more, but hopefully that won't be too long either. But now if you do …
…of walkOne

Broadly speaking, escape analysis has two main phases. First, it traverses the AST while building a data-flow graph of locations and edges. Second, during "solve", it repeatedly walks the data-flow graph while carefully propagating information about each location, including whether a location's address reaches the heap.

Once escape analysis is in the solve phase and repeatedly walking the data-flow graph, almost all the information it needs is within the location graph, with a notable exception being the ir.Class of an ir.Name, which currently must be checked by following a pointer from the location to its ir.Node. For typical graphs, that does not matter much, but if the graph becomes large enough, cache misses in the inner solve loop start to matter more, and the class is checked many times in the inner loop.

We therefore store the class information on the location in the graph to reduce how much memory we need to load in the inner loop.

The package github.com/microsoft/typescript-go/internal/checker has many locations, and compilation currently spends most of its time in escape analysis. This CL gives roughly a 30% speedup for wall clock compilation time for the checker package:

go1.24.0: 91.79s
this CL:  64.98s

Linux perf shows a healthy reduction for example in l2_request.miss and dTLB-load-misses on an amd64 test VM.

We could tweak things a bit more, though initial review feedback has suggested it would be good to get this in as it stands. Subsequent CLs in this stack give larger improvements.

Updates #72815

Change-Id: I3117430dff684c99e6da1e0d7763869873379238
Reviewed-on: https://go-review.googlesource.com/c/go/+/657295
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Jake Bailey <[email protected]>
Reviewed-by: David Chase <[email protected]>
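The core idea of the CL, sketched with hypothetical struct shapes rather than the compiler's real types:

```go
package escape

// Hypothetical shapes only; not the compiler's actual representation.

type irName struct {
	class uint8 // e.g. parameter, local, global
	// ... many other fields, far from the location in memory
}

const classParam uint8 = 1

// Before: each query in the hot solve loop chased a pointer to a
// node that usually isn't in cache when the graph is large.
type locationBefore struct {
	n *irName
}

func (l *locationBefore) isParam() bool { return l.n.class == classParam }

// After: the class is copied onto the location when the graph is
// built, so the inner loop reads only memory it already touched.
type locationAfter struct {
	n     *irName
	class uint8 // cached copy of n.class, set once at graph build time
}

func (l *locationAfter) isParam() bool { return l.class == classParam }
```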
Go version

go version go1.24.1 linux/amd64

Output of go env in your module/workspace:

…

What did you do?

I've tried compiling the new typescript compiler and it took 70s:

…

What tipped me off to an issue is the poor multi-core utilization (160 ÷ 70 ≈ 2.3). The biggest outlier is github.com/microsoft/typescript-go/internal/checker:

…

A CPU profile is very suspicious; almost all of the time is spent here:



I've added a couple of debug statements in these loops:

…

There is a suspicious:

…

36466 is the length of the queue. This steadily goes down slowly; walkOne roughly does ~5000 iterations for each iteration of walkAll.
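For anyone reproducing this, the compiler can write its own CPU profile; the exact invocation behind the screenshot above isn't shown, but something like this works (the package pattern in -gcflags and the output path are illustrative):

```sh
# Ask the compiler (not the go command) for a CPU profile of just the
# checker package's compilation, then summarize the hot functions.
go build -gcflags='github.com/microsoft/typescript-go/internal/checker=-cpuprofile=/tmp/checker.prof' ./internal/checker
go tool pprof -top /tmp/checker.prof
```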