
cmd/compile: slow escape analysis in large package in the typescript compiler #72815


Open
Jorropo opened this issue Mar 12, 2025 · 27 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Other None of the above. Performance
Milestone

Comments

@Jorropo
Member

Jorropo commented Mar 12, 2025

Go version

go version go1.24.1 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v3'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/tmp/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/hugo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1913709825=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/hugo/k/go/src/go.mod'
GOMODCACHE='/home/hugo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/hugo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/hugo/k/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/hugo/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/hugo/k/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.1'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I've tried compiling the new typescript compiler and it took 70s:

________________________________________________________
Executed in   70.51 secs    fish           external
   usr time  159.17 secs  543.00 micros  159.17 secs
   sys time    5.49 secs  249.00 micros    5.49 secs

What tipped me off that something was wrong is the poor multi-core utilization: 160 ÷ 70 ≈ 2.3.

The biggest outlier is github.com/microsoft/typescript-go/internal/checker:

github.com/microsoft/typescript-go/internal/checker

________________________________________________________
Executed in   44.97 secs    fish           external
   usr time   50.51 secs  413.00 micros   50.51 secs
   sys time    0.32 secs  142.00 micros    0.32 secs

A CPU profile is very suspicious, almost all of the time is spent here:
[image: CPU profile flame graph]

I've added a couple of debug statements in these loops.
There is a suspicious:

walkAll 36466 <nil> <nil>

36466 is the length of the queue.
The queue length goes down slowly but steadily; walkOne does roughly ~5000 iterations for each iteration of walkAll.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 12, 2025
@gabyhelp gabyhelp added the Other None of the above. label Mar 12, 2025
@Jorropo
Member Author

Jorropo commented Mar 12, 2025

I've tried to bisect it, but Go 1.22 is the oldest version I could easily test (this is a modern codebase), and it is just as bad there.

@jakebailey
Contributor

jakebailey commented Mar 12, 2025

I said this on the gopher slack, but I'm not entirely sure whether this is a regression in Go or just our package scaling poorly as it grew during the port; I plan to gather some data over all of the commits in the repo to see what that looks like.

Of course, a bug in the compiler would certainly be "good news" from the PoV that we wouldn't have to figure out how to break apart the monster.

@Jorropo
Member Author

Jorropo commented Mar 12, 2025

For context this package has 2652 functions, 1669 are *checker.Checker methods.

@prattmic prattmic added this to the Backlog milestone Mar 12, 2025
@prattmic prattmic changed the title cmd/compile: compiling the typescript compiler is slow cmd/compile: slow escape analysis in large package in the typescript compiler Mar 12, 2025
@prattmic
Member

cc @golang/compiler

@Jorropo Jorropo added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 12, 2025
@dr2chase
Contributor

If there's a copy-paste recipe for building the new typescript compiler to show this problem, that would help. I could probably figure it out, but the bug author already knows the answer and that way we'll know that we are looking at the same problem.

@jakebailey
Contributor

The repo doesn't require any special tooling to build the Go code, so it's just:

$ git clone https://github.com/microsoft/typescript-go.git
$ cd typescript-go
$ go build ./internal/checker

@Jorropo
Member Author

Jorropo commented Mar 12, 2025

I could probably figure it out, but the bug author already knows the answer and that way we'll know that we are looking at the same problem.

How am I gonna claim the fix if I help you investigate it?
Jokes aside, my bad 😄

@prattmic
Member

FWIW, after internal/checker (60s), the next slowest package is internal/ast, which takes 16s on my machine. That appears to be a different issue. Escape analysis doesn't show up at all. Nothing in particular stands out to me in the profile.

@Jorropo

This comment has been minimized.

@prattmic
Member

Correct.

internal/checker (60 wall-s): https://pprof.host/jc4g/flamegraph
internal/ast (16 wall-s): https://pprof.host/j84g/flamegraph
runtime (5 wall-s): https://pprof.host/j44g/flamegraph

I include runtime for reference: it seems to be bigger yet builds much faster, and the profiles are fairly similar. The most obvious difference is more time in noder.MakeWrappers relative to runtime, but that is still much less than SSA.

@dr2chase
Contributor

dr2chase commented Mar 12, 2025

I experimentally turned off inlining, and 44-user-second builds turned into 20-user-second builds.

~/work/src/typescript-go$ time go build -gcflags=all=-d=fmahash=1010101010101010101010101 ./internal/checker

real	0m28.298s
user	0m43.952s
sys	0m2.218s
~/work/src/typescript-go$ time go build -gcflags=all=-l\ -d=fmahash=11010101010101010101010101 ./internal/checker

real	0m11.312s
user	0m21.030s
sys	0m1.863s

So, hmmm.

The -d=fmahash parameter is just an irrelevant difference in flags that will guarantee everything gets recompiled.

@gopherbot
Contributor

Change https://go.dev/cl/657295 mentions this issue: cmd/compile/internal/escape: targeted optimization when analyzing many locations

@gopherbot
Contributor

Change https://go.dev/cl/657315 mentions this issue: cmd/compile/escape: cache b.outlives(root, l) in walkOne

@gopherbot
Contributor

Change https://go.dev/cl/657179 mentions this issue: cmd/compile/internal/escape: improve order of work to speed up analyzing many locations

@thepudds
Contributor

thepudds commented Mar 13, 2025

FWIW, I took a quick stab at trying to speed things up in escape analysis for large packages such as typescript-go/internal/checker, and sent two CLs: https://go.dev/cl/657295 and https://go.dev/cl/657179.

The build times reported via the action graph times show a reasonable improvement for typescript-go/internal/checker:

go1.24.0:      91.792s
cl-657179-ps1: 17.578s

with timing via:

# build CL 657179 via gotip, then use it to build typescript-go/internal/checker
$ go install golang.org/dl/gotip@latest
$ gotip download 657179    # download and build CL 657179
$ gotip build -a -debug-actiongraph=/tmp/actiongraph-cl-657179-ps1 -v github.com/microsoft/typescript-go/internal/checker

# see a report on the timing from the action graph
$ go install github.com/icio/actiongraph@latest
$ actiongraph top -f /tmp/actiongraph-cl-657179-ps1

The CLs pass the trybots, but definitely consider these results tentative (they are still WIP; I want to look at more results, check the performance of some other large packages, double-check things, and in general step back and think a bit more about correctness). Depending on how you count, it's effectively ~3-4 changes between the two CLs, and I haven't teased apart whether one or more of those changes might not be useful.

That said, I have some cautious hope things can be sped up for the escape analysis piece, perhaps via something like these CLs, or perhaps via something else.

@gopherbot
Contributor

Change https://go.dev/cl/657077 mentions this issue: sweet: add typescript-go to go-build benchmark

@jakebailey
Contributor

jakebailey commented Mar 13, 2025

Amazing job on those speedups! 5x would be so good.


I let my machine go through the git history and collect data on how long go build -a ./... takes to run over time. Forgive the unreadable text, but:

[image: chart of go build -a ./... times across the repo's commit history]

So, it does seem somewhat organic. (Not that organic growth implies there's nothing to improve, obviously.)

However, that big cliff near the center comes from microsoft/typescript-go@bcce040.

Which shows that this one commit increased escape analysis time in the checker package by nearly 20 seconds. That seems unusually large for the change made in that commit.

gopherbot pushed a commit to golang/benchmarks that referenced this issue Mar 13, 2025
Building this repository revealed some inefficiencies in the compiler.
This change adds the main command from this repository (tsgo) as a
benchmark of `go build -a` (a cold build) so we can track improvements
and hopefully catch any future regressions.

For golang/go#72815.

Change-Id: I8e01850b7956970000211cce50f200c3e38e54af
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/657077
Reviewed-by: Carlos Amedee <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@dr2chase
Contributor

dr2chase commented Mar 13, 2025

I was looking into whether it would make sense to run escape analysis in parallel, and in the process found that there is a strongly connected component of 1305 functions in the call graph. I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping, so with some work we could break that up into 1305 individual functions, which might also save time and would also reduce the wall time, if not the user time.

Here's the beginning of the list:

github.com/microsoft/typescript-go/internal/checker.(*Checker).resolveEntityName,
(*Checker).resolveQualifiedName,
(*Checker).getSymbol,
(*Checker).getSymbolFlagsEx,
(*Checker).getTypeOnlyAliasDeclarationEx,
(*Checker).resolveSymbolEx,
(*Checker).resolveAlias,
(*Checker).getTargetOfAliasDeclaration,
(*Checker).getTargetOfImportEqualsDeclaration,
(*Checker).resolveExternalModuleTypeByLiteral,
and 1295 more

@Jorropo
Member Author

Jorropo commented Mar 13, 2025

If my English is not failing me, I think the following sentence is wrong:

I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping

package a

type node struct {
	next *node
}

func stackAllocatedLinkedList(prev *node, budget uint) {
	if budget == 0 {
		return
	}
	someOtherFunction(&node{prev}, budget-1)
}

//go:noinline
func someOtherFunction(prev *node, budget uint) {
	stackAllocatedLinkedList(&node{prev}, budget-1)
}

See how it creates a linked list across the stack frames:

    00010 (+11) MOVQ AX, command-line-arguments..autotmp_3-8(SP)
    00011 (11) DECQ BX
    00012 (11) LEAQ command-line-arguments..autotmp_3-8(SP), AX
    00013 (11) PCDATA $1, $1
    # live at call to someOtherFunction:
    00014 (11) CALL command-line-arguments.someOtherFunction(SB)

I don't know how much real world code this optimization helps.

@mcy

mcy commented Mar 14, 2025

I don't know how much real world code this optimization helps.

A lot of high-throughput compiler-flavored code depends on this. The example here is contrived. A better example would be a recursive function that uses some kind of cursor type to walk levels of a tree (e.g. imagine walking a btree). If each step needs to create a new cursor and pass it to the callee, each cursor will now wind up on the heap. This will also trigger if the callee needs to be passed e.g. a mere int out parameter.

Any graph walking algorithm that was previously allocation-free will now allocate in every frame, generating a lot of surprise garbage.

I wouldn't be surprised if this made the go compiler itself slower on average, due to missed optimizations when compiling itself...

@dr2chase
Contributor

@Jorropo thanks very much for checking that. Ten years ago escape analysis didn't do this; @mdempsky's rewrite made it better. There may still be some domain-specific hacks to make this faster, though.

@thepudds
Contributor

FWIW, I suspect/hope we can run most of the work of escape analysis in parallel.

I now have a WIP parallel version of escape analysis where it runs the inner loop concurrently. (It does not partition the data-flow graph, which might be tricky to do without affecting the quality of the results. It instead sets things up to be able to run multiple walkOne in parallel over the original data-flow graph and merges the results).

Here are some tentative numbers for the typescript compiler's checker package with the WIP parallel version:

                 escape analysis    total package compile time

go1.24.1           47.109 sec         50.60 sec
cl-657179-ps3       8.095 sec         11.57 sec
parallel-wip        2.844 sec          6.28 sec

The first row is Go 1.24, the second row is the two earlier CLs I sent, and the third row is the WIP parallel version given 8 logical cores. (This is a different test VM from the one used for my numbers above.) Running it with more cores would probably make it a bit faster, but there are likely diminishing returns.

The timing here is via -gcflags=-bench=<file>.txt, such as:

$ go build -a -v -gcflags='-bench=parallel-8-core.bench' github.com/microsoft/typescript-go/internal/checker

There are definitely some caveats, especially for the parallel version. It passes my local tests, but I have not tried the trybots. I also took a couple of shortcuts, including temporarily commenting out the code for the detailed debug logging of path-based diagnostic reports (available via -gcflags=-m=2), but I suspect I can resolve that, and the more basic debug logging (via -gcflags=-m=1) does seem to still work.

Also, note the times here are for compiling the typescript compiler's very large checker package, not an entire from-scratch build of the typescript compiler. (The checker package was the long pole overall and is the main subject of this issue.)

At this point, I plan to try to get the parallel work and the other CLs into reviewable shape, and we can see whether any of this is worthwhile vs. maybe there's a better direction, or maybe this is all flawed. 😅

@jakebailey
Contributor

Getting in even what's already sent would be amazing; I've so far given people instructions on how to gotip download your CLs to make our dev experience better and it's been working out pretty well.

I was going to say "it's a shame we didn't announce the port earlier so we could report these", but I don't think this was a problem back before the freeze so it wouldn't have mattered 😅

@mdempsky
Contributor

Thanks for the test case.

@jakebailey
Contributor

@thepudds It's been a bit, but is there a chance your existing CLs can be unmarked as WIP so they could make it into tip? I don't think you have sent the concurrent change yet, but having what's sent already would be pretty helpful.

Some of us are using your CL for local dev, but since it's out of date relative to tip, it's missing DWARF5 fixes (due to when it was sent) and other stuff that makes it less fun to use outside plain go build/test.

@thepudds
Contributor

thepudds commented Apr 7, 2025

Hi @jakebailey, I rebased the first two CLs on master. The first CL will probably pop out of review soon. I might simplify the second CL a bit more, but hopefully that won't take too long either. Now if you do gotip download 657179 again, you should get the recent commits on tip plus the escape analysis changes.

gopherbot pushed a commit that referenced this issue Apr 21, 2025
…of walkOne

Broadly speaking, escape analysis has two main phases. First, it
traverses the AST while building a data-flow graph of locations and
edges. Second, during "solve", it repeatedly walks the data-flow graph
while carefully propagating information about each location, including
whether a location's address reaches the heap.

Once escape analysis is in the solve phase and repeatedly walking the
data-flow graph, almost all the information it needs is within the
location graph, with a notable exception being the ir.Class of an
ir.Name, which currently must be checked by following a pointer from
the location to its ir.Node.

For typical graphs, that does not matter much, but if the graph becomes
large enough, cache misses in the inner solve loop start to matter more,
and the class is checked many times in the inner loop.

We therefore store the class information on the location in the graph
to reduce how much memory we need to load in the inner loop.

The package github.com/microsoft/typescript-go/internal/checker
has many locations, and compilation currently spends most of its time
in escape analysis.

This CL gives roughly a 30% speedup for wall clock compilation time
for the checker package:

  go1.24.0:      91.79s
  this CL:       64.98s

Linux perf shows a healthy reduction for example in l2_request.miss and
dTLB-load-misses on an amd64 test VM.

We could tweak things a bit more, though initial review feedback
has suggested it would be good to get this in as it stands.

Subsequent CLs in this stack give larger improvements.

Updates #72815

Change-Id: I3117430dff684c99e6da1e0d7763869873379238
Reviewed-on: https://go-review.googlesource.com/c/go/+/657295
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Jake Bailey <[email protected]>
Reviewed-by: David Chase <[email protected]>
9 participants