
cmd/compile: slow escape analysis in large package in the typescript compiler #72815


Open
Jorropo opened this issue Mar 12, 2025 · 27 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Other None of the above. Performance
Milestone

Comments

@Jorropo
Member

Jorropo commented Mar 12, 2025

Go version

go version go1.24.1 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v3'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/tmp/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/hugo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1913709825=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/hugo/k/go/src/go.mod'
GOMODCACHE='/home/hugo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/hugo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/hugo/k/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/hugo/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/hugo/k/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.1'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I've tried compiling the new typescript compiler and it took 70s:

________________________________________________________
Executed in   70.51 secs    fish           external
   usr time  159.17 secs  543.00 micros  159.17 secs
   sys time    5.49 secs  249.00 micros    5.49 secs

What tipped me off that something was wrong is the poor multi-core utilization: 160 ÷ 70 ≈ 2.3.

The biggest outlier is github.com/microsoft/typescript-go/internal/checker:

github.com/microsoft/typescript-go/internal/checker

________________________________________________________
Executed in   44.97 secs    fish           external
   usr time   50.51 secs  413.00 micros   50.51 secs
   sys time    0.32 secs  142.00 micros    0.32 secs

A CPU profile is very suspicious, almost all of the time is spent here:
[image: CPU profile flame graph]

I've added a couple of debug statements in these loops.
There is a suspicious:

walkAll 36466 <nil> <nil>

36466 is the length of the queue.
The queue length goes down slowly but steadily; walkOne does roughly ~5000 iterations for each iteration of walkAll.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 12, 2025
@gabyhelp gabyhelp added the Other None of the above. label Mar 12, 2025
@Jorropo
Member Author

Jorropo commented Mar 12, 2025

I've tried to bisect it, but Go 1.22 is the oldest version I could easily test (this is a modern codebase), and it is just as bad there.

@jakebailey
Contributor

jakebailey commented Mar 12, 2025

I said this on the gopher slack, but I'm not entirely sure whether this is a regression in Go or just our package scaling poorly as it grew during the port; I plan to gather some data over all of the commits in the repo to see what that looks like.

Of course, a bug in the compiler would certainly be "good news" from the PoV that we wouldn't have to figure out how to break apart the monster.

@Jorropo
Member Author

Jorropo commented Mar 12, 2025

For context this package has 2652 functions, 1669 are *checker.Checker methods.

@prattmic prattmic added this to the Backlog milestone Mar 12, 2025
@prattmic prattmic changed the title cmd/compile: compiling the typescript compiler is slow cmd/compile: slow escape analysis in large package in the typescript compiler Mar 12, 2025
@prattmic
Member

cc @golang/compiler

@Jorropo Jorropo added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 12, 2025
@dr2chase
Contributor

If there's a copy-paste recipe for building the new typescript compiler to show this problem, that would help. I could probably figure it out, but the bug author already knows the answer and that way we'll know that we are looking at the same problem.

@jakebailey
Contributor

The repo doesn't require any special tooling to build the Go code, so it's just:

$ git clone https://github.com/microsoft/typescript-go.git
$ cd typescript-go
$ go build ./internal/checker

@Jorropo
Member Author

Jorropo commented Mar 12, 2025

I could probably figure it out, but the bug author already knows the answer and that way we'll know that we are looking at the same problem.

How am I gonna claim the fix if I help you investigate it?
Jokes aside, my bad 😄

@prattmic
Member

FWIW, after internal/checker (60s), the next slowest package is internal/ast, which takes 16s on my machine. That appears to be a different issue. Escape analysis doesn't show up at all. Nothing in particular stands out to me in the profile.

@Jorropo

This comment has been minimized.

@prattmic
Member

Correct.

internal/checker (60 wall-s): https://pprof.host/jc4g/flamegraph
internal/ast (16 wall-s): https://pprof.host/j84g/flamegraph
runtime (5 wall-s): https://pprof.host/j44g/flamegraph

I include runtime for reference: it seems to be bigger yet builds much faster, and the profiles are fairly similar. The most obvious difference is more time in noder.MakeWrappers relative to runtime, but that is still much less than SSA.

@dr2chase
Contributor

dr2chase commented Mar 12, 2025

I experimentally turned off inlining, and 44-user-second builds turned into 20-user-second builds.

~/work/src/typescript-go$ time go build -gcflags=all=-d=fmahash=1010101010101010101010101 ./internal/checker

real	0m28.298s
user	0m43.952s
sys	0m2.218s
~/work/src/typescript-go$ time go build -gcflags=all=-l\ -d=fmahash=11010101010101010101010101 ./internal/checker

real	0m11.312s
user	0m21.030s
sys	0m1.863s

So, hmmm.

The -d=fmahash parameter is just an irrelevant difference in flags that will guarantee everything gets recompiled.

@gopherbot
Contributor

Change https://go.dev/cl/657295 mentions this issue: cmd/compile/internal/escape: targeted optimization when analyzing many locations

@gopherbot
Contributor

Change https://go.dev/cl/657315 mentions this issue: cmd/compile/escape: cache b.outlives(root, l) in walkOne

@gopherbot
Contributor

Change https://go.dev/cl/657179 mentions this issue: cmd/compile/internal/escape: improve order of work to speed up analyzing many locations

@thepudds
Contributor

thepudds commented Mar 13, 2025

FWIW, I took a quick stab at trying to speed things up in escape analysis for large packages such as typescript-go/internal/checker, and sent two CLs: https://go.dev/cl/657295 and https://go.dev/cl/657179.

The build times reported via the action graph times show a reasonable improvement for typescript-go/internal/checker:

go1.24.0:      91.792s
cl-657179-ps1: 17.578s

with timing via:

# build CL 657179 via gotip, then use it to build typescript-go/internal/checker
$ go install golang.org/dl/gotip@latest
$ gotip download 657179    # download and build CL 657179
$ gotip build -a -debug-actiongraph=/tmp/actiongraph-cl-657179-ps1 -v github.com/microsoft/typescript-go/internal/checker

# see a report on the timing from the action graph
$ go install github.com/icio/actiongraph@latest
$ actiongraph top -f /tmp/actiongraph-cl-657179-ps1

The CLs pass the trybots, but definitely consider these results tentative (they are still WIP; I want to look at more results, check the performance of some other large packages, double-check things, and in general step back and think a bit more about correctness). Depending on how you count, it's effectively ~3-4 changes between the two CLs, and I haven't teased apart whether one or more of those changes might not be useful.

That said, I have some cautious hope things can be sped up for the escape analysis piece, perhaps via something like these CLs, or perhaps via something else.

@gopherbot
Contributor

Change https://go.dev/cl/657077 mentions this issue: sweet: add typescript-go to go-build benchmark

@jakebailey
Contributor

jakebailey commented Mar 13, 2025

Amazing job on those speedups! 5x would be so good.


I let my machine go through the git history and collect data on how long go build -a ./... takes to run over time. Forgive the unreadable text, but:

[image: chart of go build -a ./... times across the repo's commit history]

So, it does seem somewhat organic. (Not that organic growth implies there's nothing to improve, obviously.)

However, that big cliff near the center comes from microsoft/typescript-go@bcce040.

Which shows that this one commit increased escape analysis time in the checker package by nearly 20 seconds. That seems unusually large for the change made in that commit.

gopherbot pushed a commit to golang/benchmarks that referenced this issue Mar 13, 2025
Building this repository revealed some inefficiencies in the compiler.
This change adds the main command from this repository (tsgo) as a
benchmark of `go build -a` (a cold build) so we can track improvements
and hopefully catch any future regressions.

For golang/go#72815.

Change-Id: I8e01850b7956970000211cce50f200c3e38e54af
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/657077
Reviewed-by: Carlos Amedee <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@dr2chase
Contributor

dr2chase commented Mar 13, 2025

I was looking into whether it would make sense to run escape analysis in parallel, and in the process found that there is a strongly connected component of 1305 functions in the call graph. I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping, so with some work we could break that up into 1305 individual functions, which might also save time and would also reduce the wall time, if not the user time.

Here's the beginning of the list:

github.com/microsoft/typescript-go/internal/checker.(*Checker).resolveEntityName,
(*Checker).resolveQualifiedName,
(*Checker).getSymbol,
(*Checker).getSymbolFlagsEx,
(*Checker).getTypeOnlyAliasDeclarationEx,
(*Checker).resolveSymbolEx,
(*Checker).resolveAlias,
(*Checker).getTargetOfAliasDeclaration,
(*Checker).getTargetOfImportEqualsDeclaration,
(*Checker).resolveExternalModuleTypeByLiteral,
and 1295 more

@Jorropo
Member Author

Jorropo commented Mar 13, 2025

If my English is not failing me, I think the following sentence is wrong:

I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping

package a

type node struct {
	next *node
}

func stackAllocatedLinkedList(prev *node, budget uint) {
	if budget == 0 {
		return
	}
	someOtherFunction(&node{prev}, budget-1)
}

//go:noinline
func someOtherFunction(prev *node, budget uint) {
	stackAllocatedLinkedList(&node{prev}, budget-1)
}

See how it creates a linked list across the stack frames:

    00010 (+11) MOVQ AX, command-line-arguments..autotmp_3-8(SP)
    00011 (11) DECQ BX
    00012 (11) LEAQ command-line-arguments..autotmp_3-8(SP), AX
    00013 (11) PCDATA $1, $1
    # live at call to someOtherFunction:
    00014 (11) CALL command-line-arguments.someOtherFunction(SB)

I don't know how much real world code this optimization helps.

@mcy

mcy commented Mar 14, 2025

I don't know how much real world code this optimization helps.

A lot of high-throughput compiler-flavored code depends on this. The example here is contrived. A better example would be a recursive function that uses some kind of cursor type to walk levels of a tree (e.g. imagine walking a btree). If each step needs to create a new cursor and pass it to the callee, each cursor will now wind up on the heap. This will also trigger if the callee needs to be passed e.g. a mere int out parameter.

Any graph walking algorithm that was previously allocation-free will now allocate in every frame, generating a lot of surprise garbage.

I wouldn't be surprised if this made the go compiler itself slower on average, due to missed optimizations when compiling itself...

@dr2chase
Contributor

@Jorropo thanks very much for checking that. Ten years ago escape analysis didn't do this; @mdempsky's rewrite made it better. There may still be some domain-specific hacks to make this faster, though.

@thepudds
Contributor

FWIW, I suspect/hope we can run most of the work of escape analysis in parallel.

I now have a WIP parallel version of escape analysis where it runs the inner loop concurrently. (It does not partition the data-flow graph, which might be tricky to do without affecting the quality of the results. It instead sets things up to be able to run multiple walkOne in parallel over the original data-flow graph and merges the results).

Here are some tentative numbers for the typescript compiler's checker package with the WIP parallel version:

                 escape analysis    total package compile time

go1.24.1           47.109 sec         50.60 sec
cl-657179-ps3       8.095 sec         11.57 sec
parallel-wip        2.844 sec          6.28 sec

The first row is Go 1.24, the second row is the two earlier CLs I sent, and the third row is the WIP parallel version given 8 logical cores. (This is a different test VM from the one used for my numbers above.) Running it with more cores would probably make it a bit faster, but there are likely diminishing returns.

The timing here is via -gcflags=-bench=<file>.txt, such as:

$ go build -a -v -gcflags='-bench=parallel-8-core.bench' github.com/microsoft/typescript-go/internal/checker

There are definitely some caveats, especially for the parallel version. It passes my local tests, but I have not tried the trybots. I also took a couple of shortcuts, including temporarily commenting out the code for the detailed debug logging of path-based diagnostic reports (available via -gcflags=-m=2), but I suspect I can resolve that, and the more basic debug logging (via -gcflags=-m=1) does seem to still work.

Also, note the times here are for compiling the typescript compiler's very large checker package, not an entire from-scratch build of the typescript compiler. (The checker package was the long pole overall and is the main subject of this issue.)

At this point, I plan to try to get the parallel work and the other CLs into reviewable shape, and we can see whether any of this is worthwhile vs. maybe there's a better direction, or maybe this is all flawed. 😅

@jakebailey
Contributor

Getting in even what's already sent would be amazing; I've so far given people instructions on how to gotip download your CLs to make our dev experience better and it's been working out pretty well.

I was going to say "it's a shame we didn't announce the port earlier so we could report these", but I don't think this was a problem back before the freeze so it wouldn't have mattered 😅

@mdempsky
Contributor

Thanks for the test case.

@jakebailey
Contributor

@thepudds It's been a bit, but is there a chance your existing CLs can be unmarked as WIP so they could make it into tip? I don't think you have sent the concurrent change yet, but having what's sent already would be pretty helpful.

Some of us are using your CL for local dev, but since it's out of date relative to tip, it's missing DWARF5 fixes (due to when it was sent) and other stuff that makes it less fun to use outside plain go build/test.

@thepudds
Contributor

thepudds commented Apr 7, 2025

Hi @jakebailey, I rebased the first two CLs on master. The first CL will probably pop out of review soon. I might simplify the second CL a bit more, but hopefully that won't take too long either. Now if you do gotip download 657179 again, you should get the recent commits on tip plus the escape analysis changes.

gopherbot pushed a commit that referenced this issue Apr 21, 2025
…of walkOne

Broadly speaking, escape analysis has two main phases. First, it
traverses the AST while building a data-flow graph of locations and
edges. Second, during "solve", it repeatedly walks the data-flow graph
while carefully propagating information about each location, including
whether a location's address reaches the heap.

Once escape analysis is in the solve phase and repeatedly walking the
data-flow graph, almost all the information it needs is within the
location graph, with a notable exception being the ir.Class of an
ir.Name, which currently must be checked by following a pointer from
the location to its ir.Node.

For typical graphs, that does not matter much, but if the graph becomes
large enough, cache misses in the inner solve loop start to matter more,
and the class is checked many times in the inner loop.

We therefore store the class information on the location in the graph
to reduce how much memory we need to load in the inner loop.

The package github.com/microsoft/typescript-go/internal/checker
has many locations, and compilation currently spends most of its time
in escape analysis.

This CL gives roughly a 30% speedup for wall clock compilation time
for the checker package:

  go1.24.0:      91.79s
  this CL:       64.98s

Linux perf shows a healthy reduction for example in l2_request.miss and
dTLB-load-misses on an amd64 test VM.

We could tweak things a bit more, though initial review feedback
has suggested it would be good to get this in as it stands.

Subsequent CLs in this stack give larger improvements.

Updates #72815

Change-Id: I3117430dff684c99e6da1e0d7763869873379238
Reviewed-on: https://go-review.googlesource.com/c/go/+/657295
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Jake Bailey <[email protected]>
Reviewed-by: David Chase <[email protected]>
9 participants