
Commit 060897b

cmd/compile: add initial backend concurrency support

This CL adds initial support for concurrent backend compilation.

BACKGROUND

The compiler currently consists (very roughly) of the following phases:

1. Initialization.
2. Lexing and parsing into the cmd/compile/internal/syntax AST.
3. Translation into the cmd/compile/internal/gc AST.
4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go).
5. Translation into the cmd/compile/internal/ssa SSA form.
6. Optimization and lowering of SSA form.
7. Translation from SSA form to assembler instructions.
8. Translation from assembler instructions to machine code.
9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data.

Phase 2 was already concurrent as of Go 1.8.

Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA.

Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups.

Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions.

Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and hash routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile.
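A minimal sketch of that two-stage strategy, written as a standalone Go program rather than compiler code. frontEnd, backEnd, and compileAll are hypothetical stand-ins for the serial order/walk passes, the per-function SSA phases, and the queue-and-drain logic; they are not the compiler's functions. With one worker the sketch runs everything inline and starts no goroutines, mirroring the c=1 behavior described under USAGE.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// frontEnd stands in for the serial, non-concurrency-safe passes
// (order, walk). It must finish for every function before any
// backend work starts.
func frontEnd(fn string) string { return strings.ToLower(fn) }

// backEnd stands in for the per-function SSA and code generation phases.
func backEnd(fn string) string { return fn + ".o" }

// compileAll runs frontEnd over every function serially, then runs
// backEnd using the given number of workers. With workers == 1 no
// goroutines are started at all.
func compileAll(fns []string, workers int) []string {
	lowered := make([]string, len(fns))
	for i, fn := range fns {
		lowered[i] = frontEnd(fn)
	}

	out := make([]string, len(lowered))
	if workers == 1 {
		for i, fn := range lowered {
			out[i] = backEnd(fn)
		}
		return out
	}

	var wg sync.WaitGroup
	idx := make(chan int)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				// Each index is received by exactly one goroutine,
				// so writes to distinct elements of out do not race.
				out[i] = backEnd(lowered[i])
			}
		}()
	}
	for i := range lowered {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return out
}

func main() {
	fns := []string{"Main", "Init", "Helper"}
	serial := compileAll(fns, 1)
	parallel := compileAll(fns, 4)
	fmt.Println(serial)
	fmt.Println(strings.Join(serial, ",") == strings.Join(parallel, ","))
}
```

Keeping the serial stage entirely separate from the concurrent one is what lets the backend stage assume its inputs are no longer being mutated.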
USAGE

Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c.

Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set -c=1 always, and revert to the original compiler behavior.

MUTEXES

Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work.

In no particular order:

* gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority for removal: it adds a single global, it is in an infrequently used code path, and it has low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds.

* gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. This is fixed with a low-priority mutex and sorting.

* obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is easily the most heavily contended mutex. I hope that the syncmap proposed in golang#18177 may provide some free speed-ups here. Some lookups may also be removable.

* gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal, aside from some mild awkwardness to avoid deadlocks due to recursive calls.

* types.typepkgmu. Looking up symbols in the fake "type" package (typepkg) happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention.

* types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex.

* types.Sym.Lsymmu. Syms keep a cache of their associated LSym, to reduce lookups in ctxt.Hash. This cache itself needs concurrency protection. This mutex adds to the size of a moderately important data structure, but the alloc benchmarks below show that this doesn't hurt much in practice. It is moderately contended, mostly because when lookups fail, the lock is held while vying for the contended ctxt.Hash mutex. The fact that it keeps load off the ctxt.Hash mutex, though, makes this mutex worth keeping. Also, a fair number of calls to Linksym could be avoided by judicious addition of a local variable.

TESTING

I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed golang#19962 for this.

REPRODUCIBLE BUILDS

This version of the compiler generates reproducible builds.
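Because backend workers now append to shared slices such as funcsyms and largeStackFrames in scheduling-dependent order, reproducibility relies on sorting those slices before anything is emitted. The following is a minimal sketch of that pattern in isolation, assuming string keys; the recorder type is hypothetical, whereas the CL itself sorts with obj.SortSlice using LinksymName and XPos.Before.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// recorder collects names from many goroutines. Arrival order depends
// on scheduling, so it is not reproducible from run to run.
type recorder struct {
	mu    sync.Mutex // protects names
	names []string
}

// add appends name under the lock; safe to call concurrently.
func (r *recorder) add(name string) {
	r.mu.Lock()
	r.names = append(r.names, name)
	r.mu.Unlock()
}

// sorted returns the names in a deterministic order. It is meant to be
// called once, after all writers have finished (here, after wg.Wait),
// so it does not take r.mu.
func (r *recorder) sorted() []string {
	sort.Slice(r.names, func(i, j int) bool { return r.names[i] < r.names[j] })
	return r.names
}

func main() {
	var r recorder
	var wg sync.WaitGroup
	for _, n := range []string{"main.f", "main.a", "main.m", "main.c"} {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			r.add(n)
		}(n)
	}
	wg.Wait()
	fmt.Println(r.sorted()) // [main.a main.c main.f main.m] on every run
}
```

The mutex only makes concurrent appends safe; it is the sort that makes the output order independent of scheduling.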
Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is golang#19961.

Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem.

NEXT STEPS

* Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some.
* Improve testing.
* Improve performance, for many values of c.
* Integrate with cmd/go and fine-tune.
* Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work.
* Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL.

PERFORMANCE

Here's the buried lede, at last. :)

All benchmarks are from my 8-core 2.9 GHz Intel Core i7 darwin/amd64 laptop.

First, going from tip to this CL with c=1 has almost no impact.
name            old time/op       new time/op     delta
Template         195ms ± 3%        194ms ± 5%         ~  (p=0.370 n=30+29)
Unicode         86.6ms ± 3%       87.0ms ± 7%         ~  (p=0.958 n=29+30)
GoTypes          548ms ± 3%        555ms ± 4%    +1.35%  (p=0.001 n=30+28)
Compiler         2.51s ± 2%        2.54s ± 2%    +1.17%  (p=0.000 n=28+30)
SSA              5.16s ± 3%        5.16s ± 2%         ~  (p=0.910 n=30+29)
Flate            124ms ± 5%        124ms ± 4%         ~  (p=0.947 n=30+30)
GoParser         146ms ± 3%        146ms ± 3%         ~  (p=0.150 n=29+28)
Reflect          354ms ± 3%        352ms ± 4%         ~  (p=0.096 n=29+29)
Tar              107ms ± 5%        106ms ± 3%         ~  (p=0.370 n=30+29)
XML              200ms ± 4%        201ms ± 4%         ~  (p=0.313 n=29+28)
[Geo mean]            332ms             333ms    +0.10%

name       old user-time/op  new user-time/op     delta
Template         227ms ± 5%        225ms ± 5%         ~  (p=0.457 n=28+27)
Unicode          109ms ± 4%        109ms ± 5%         ~  (p=0.758 n=29+29)
GoTypes          713ms ± 4%        721ms ± 5%         ~  (p=0.051 n=30+29)
Compiler         3.36s ± 2%        3.38s ± 3%         ~  (p=0.146 n=30+30)
SSA              7.46s ± 3%        7.47s ± 3%         ~  (p=0.804 n=30+29)
Flate            146ms ± 7%        147ms ± 3%         ~  (p=0.833 n=29+27)
GoParser         179ms ± 5%        179ms ± 5%         ~  (p=0.866 n=30+30)
Reflect          431ms ± 4%        429ms ± 4%         ~  (p=0.593 n=29+30)
Tar              124ms ± 5%        123ms ± 5%         ~  (p=0.140 n=29+29)
XML              243ms ± 4%        242ms ± 7%         ~  (p=0.404 n=29+29)
[Geo mean]            415ms             415ms    +0.02%

name          old obj-bytes     new obj-bytes     delta
Template          382k ± 0%         382k ± 0%         ~  (all equal)
Unicode           203k ± 0%         203k ± 0%         ~  (all equal)
GoTypes          1.18M ± 0%        1.18M ± 0%         ~  (all equal)
Compiler         3.98M ± 0%        3.98M ± 0%         ~  (all equal)
SSA              8.28M ± 0%        8.28M ± 0%         ~  (all equal)
Flate             230k ± 0%         230k ± 0%         ~  (all equal)
GoParser          287k ± 0%         287k ± 0%         ~  (all equal)
Reflect          1.00M ± 0%        1.00M ± 0%         ~  (all equal)
Tar               190k ± 0%         190k ± 0%         ~  (all equal)
XML               416k ± 0%         416k ± 0%         ~  (all equal)
[Geo mean]             660k              660k    +0.00%

Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches.
name            old time/op       new time/op     delta
Template         202ms ± 3%        149ms ± 3%   -26.15%  (p=0.000 n=49+49)
Unicode         87.4ms ± 4%       84.2ms ± 3%    -3.68%  (p=0.000 n=48+48)
GoTypes          560ms ± 2%        398ms ± 2%   -28.96%  (p=0.000 n=49+49)
Compiler         2.46s ± 3%        1.76s ± 2%   -28.61%  (p=0.000 n=48+46)
SSA              6.17s ± 2%        4.04s ± 1%   -34.52%  (p=0.000 n=49+49)
Flate            126ms ± 3%         92ms ± 2%   -26.81%  (p=0.000 n=49+48)
GoParser         148ms ± 4%        107ms ± 2%   -27.78%  (p=0.000 n=49+48)
Reflect          361ms ± 3%        281ms ± 3%   -22.10%  (p=0.000 n=49+49)
Tar              109ms ± 4%         86ms ± 3%   -20.81%  (p=0.000 n=49+47)
XML              204ms ± 3%        144ms ± 2%   -29.53%  (p=0.000 n=48+45)

name       old user-time/op  new user-time/op     delta
Template         246ms ± 9%        246ms ± 4%         ~  (p=0.401 n=50+48)
Unicode          109ms ± 4%        111ms ± 4%    +1.47%  (p=0.000 n=44+50)
GoTypes          728ms ± 3%        765ms ± 3%    +5.04%  (p=0.000 n=46+50)
Compiler         3.33s ± 3%        3.41s ± 2%    +2.31%  (p=0.000 n=49+48)
SSA              8.52s ± 2%        9.11s ± 2%    +6.93%  (p=0.000 n=49+47)
Flate            149ms ± 4%        161ms ± 3%    +8.13%  (p=0.000 n=50+47)
GoParser         181ms ± 5%        192ms ± 2%    +6.40%  (p=0.000 n=49+46)
Reflect          452ms ± 9%        474ms ± 2%    +4.99%  (p=0.000 n=50+48)
Tar              126ms ± 6%        136ms ± 4%    +7.95%  (p=0.000 n=50+49)
XML              247ms ± 5%        264ms ± 3%    +6.94%  (p=0.000 n=48+50)

name           old alloc/op      new alloc/op     delta
Template        38.8MB ± 0%       39.3MB ± 0%    +1.48%  (p=0.008 n=5+5)
Unicode         29.8MB ± 0%       30.2MB ± 0%    +1.19%  (p=0.008 n=5+5)
GoTypes          113MB ± 0%        114MB ± 0%    +0.69%  (p=0.008 n=5+5)
Compiler         443MB ± 0%        447MB ± 0%    +0.95%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.26GB ± 0%    +0.89%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       25.9MB ± 1%    +2.35%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       32.2MB ± 0%    +1.59%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       78.9MB ± 0%    +0.91%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       27.0MB ± 0%    +1.80%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       43.4MB ± 0%    +2.35%  (p=0.008 n=5+5)

name          old allocs/op     new allocs/op     delta
Template          379k ± 0%         378k ± 0%         ~  (p=0.421 n=5+5)
Unicode           322k ± 0%         321k ± 0%         ~  (p=0.222 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%         ~  (p=0.548 n=5+5)
Compiler         4.12M ± 0%        4.11M ± 0%    -0.14%  (p=0.032 n=5+5)
SSA              9.72M ± 0%        9.72M ± 0%         ~  (p=0.421 n=5+5)
Flate             234k ± 1%         234k ± 0%         ~  (p=0.421 n=5+5)
GoParser          316k ± 1%         315k ± 0%         ~  (p=0.222 n=5+5)
Reflect           980k ± 0%         979k ± 0%         ~  (p=0.095 n=5+5)
Tar               249k ± 1%         249k ± 1%         ~  (p=0.841 n=5+5)
XML               392k ± 0%         391k ± 0%         ~  (p=0.095 n=5+5)

From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:

name            old time/op       new time/op     delta
Template         203ms ± 3%        131ms ± 5%   -35.45%  (p=0.000 n=50+50)
Unicode         87.2ms ± 4%       84.1ms ± 2%    -3.61%  (p=0.000 n=48+47)
GoTypes          560ms ± 4%        310ms ± 2%   -44.65%  (p=0.000 n=50+49)
Compiler         2.47s ± 3%        1.41s ± 2%   -43.10%  (p=0.000 n=50+46)
SSA              6.17s ± 2%        3.20s ± 2%   -48.06%  (p=0.000 n=49+49)
Flate            126ms ± 4%         74ms ± 2%   -41.06%  (p=0.000 n=49+48)
GoParser         148ms ± 4%         89ms ± 3%   -39.97%  (p=0.000 n=49+50)
Reflect          360ms ± 3%        242ms ± 3%   -32.81%  (p=0.000 n=49+49)
Tar              108ms ± 4%         73ms ± 4%   -32.48%  (p=0.000 n=50+49)
XML              203ms ± 3%        119ms ± 3%   -41.56%  (p=0.000 n=49+48)

name       old user-time/op  new user-time/op     delta
Template         246ms ± 9%        287ms ± 9%   +16.98%  (p=0.000 n=50+50)
Unicode          109ms ± 4%        118ms ± 5%    +7.56%  (p=0.000 n=46+50)
GoTypes          735ms ± 4%        806ms ± 2%    +9.62%  (p=0.000 n=50+50)
Compiler         3.34s ± 4%        3.56s ± 2%    +6.78%  (p=0.000 n=49+49)
SSA              8.54s ± 3%       10.04s ± 3%   +17.55%  (p=0.000 n=50+50)
Flate            149ms ± 6%        176ms ± 3%   +17.82%  (p=0.000 n=50+48)
GoParser         181ms ± 5%        213ms ± 3%   +17.47%  (p=0.000 n=50+50)
Reflect          453ms ± 6%        499ms ± 2%   +10.11%  (p=0.000 n=50+48)
Tar              126ms ± 5%        149ms ±11%   +18.76%  (p=0.000 n=50+50)
XML              246ms ± 5%        287ms ± 4%   +16.53%  (p=0.000 n=49+50)

name           old alloc/op      new alloc/op     delta
Template        38.8MB ± 0%       40.4MB ± 0%    +4.21%  (p=0.008 n=5+5)
Unicode         29.8MB ± 0%       30.9MB ± 0%    +3.68%  (p=0.008 n=5+5)
GoTypes          113MB ± 0%        116MB ± 0%    +2.71%  (p=0.008 n=5+5)
Compiler         443MB ± 0%        455MB ± 0%    +2.75%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.27GB ± 0%    +1.84%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       26.9MB ± 1%    +6.31%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       33.2MB ± 0%    +4.61%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       80.2MB ± 0%    +2.53%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       27.9MB ± 0%    +5.19%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       44.6MB ± 0%    +5.20%  (p=0.008 n=5+5)

name          old allocs/op     new allocs/op     delta
Template          380k ± 0%         379k ± 0%    -0.39%  (p=0.032 n=5+5)
Unicode           321k ± 0%         321k ± 0%         ~  (p=0.841 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%         ~  (p=0.421 n=5+5)
Compiler         4.12M ± 0%        4.14M ± 0%    +0.52%  (p=0.008 n=5+5)
SSA              9.72M ± 0%        9.76M ± 0%    +0.37%  (p=0.008 n=5+5)
Flate             234k ± 1%         234k ± 1%         ~  (p=0.690 n=5+5)
GoParser          316k ± 0%         317k ± 1%         ~  (p=0.841 n=5+5)
Reflect           981k ± 0%         981k ± 0%         ~  (p=1.000 n=5+5)
Tar               250k ± 0%         249k ± 1%         ~  (p=0.151 n=5+5)
XML               393k ± 0%         392k ± 0%         ~  (p=0.056 n=5+5)

Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time.

The CPU time numbers matter because, when many compilation processes run concurrently, extra CPU time reduces overall throughput. The numbers above are in many ways the best-case scenario, since a single compilation can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle.

Updates golang#15756

Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
1 parent 5280dfb commit 060897b


14 files changed, +184 -21 lines changed

src/cmd/compile/internal/gc/dcl.go

Lines changed: 11 additions & 0 deletions

@@ -1063,6 +1063,16 @@ func funcsymname(s *types.Sym) string {
 
 // funcsym returns s·f.
 func funcsym(s *types.Sym) *types.Sym {
+	// funcsymsmu here serves to protect not just mutations of funcsyms (below),
+	// but also the package lookup of the func sym name,
+	// since this function gets called concurrently from the backend.
+	// There are no other concurrent package lookups in the backend,
+	// except for the types package, which is protected separately.
+	// Reusing funcsymsmu to also cover this package lookup
+	// avoids a general, broader, expensive package lookup mutex.
+	// Note makefuncsym also does package look-up of func sym names,
+	// but that it is only called serially, from the front end.
+	funcsymsmu.Lock()
 	sf, existed := s.Pkg.LookupOK(funcsymname(s))
 	// Don't export s·f when compiling for dynamic linking.
 	// When dynamically linking, the necessary function
@@ -1071,6 +1081,7 @@ func funcsym(s *types.Sym) *types.Sym {
 	if !Ctxt.Flag_dynlink && !existed {
 		funcsyms = append(funcsyms, s)
 	}
+	funcsymsmu.Unlock()
 	return sf
 }
 
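funcsym holds funcsymsmu across both the package lookup and the conditional append. If the two were guarded separately (or the lookup not at all), two workers could both observe existed == false and append the same symbol twice, and the unguarded map access would itself be a data race. A minimal sketch of that check-then-act pattern; table, lookupOK, funcSym, and pending are hypothetical stand-ins for types.Pkg, LookupOK, funcsym, and funcsyms.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical stand-in for a package symbol table.
// It is not safe for concurrent use on its own.
type table struct{ syms map[string]*string }

// lookupOK returns the entry for name, creating it if needed, and
// reports whether it already existed.
func (t *table) lookupOK(name string) (*string, bool) {
	if s, ok := t.syms[name]; ok {
		return s, true
	}
	s := new(string)
	*s = name
	t.syms[name] = s
	return s, false
}

var (
	mu      sync.Mutex // protects pkg lookups and pending, like funcsymsmu
	pkg     = &table{syms: map[string]*string{}}
	pending []string
)

// funcSym looks up name+"·f" and, on first creation only, records name
// in pending. One lock spans both steps, so the "did it exist?" answer
// and the append cannot be interleaved with another goroutine's.
func funcSym(name string) *string {
	mu.Lock()
	defer mu.Unlock()
	s, existed := pkg.lookupOK(name + "·f")
	if !existed {
		pending = append(pending, name)
	}
	return s
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			funcSym("closure1")
		}()
	}
	wg.Wait()
	fmt.Println(len(pending)) // 1: recorded exactly once
}
```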
src/cmd/compile/internal/gc/go.go

Lines changed: 5 additions & 1 deletion

@@ -10,6 +10,7 @@ import (
 	"cmd/internal/bio"
 	"cmd/internal/obj"
 	"cmd/internal/src"
+	"sync"
 )
 
 const (
@@ -169,7 +170,10 @@ var exportlist []*Node
 
 var importlist []*Node // imported functions and methods with inlinable bodies
 
-var funcsyms []*types.Sym
+var (
+	funcsymsmu sync.Mutex // protects funcsyms and associated package lookups (see func funcsym)
+	funcsyms   []*types.Sym
+)
 
 var dclcontext Class // PEXTERN/PAUTO
 
src/cmd/compile/internal/gc/gsubr.go

Lines changed: 4 additions & 2 deletions

@@ -51,10 +51,12 @@ type Progs struct {
 }
 
 // newProgs returns a new Progs for fn.
-func newProgs(fn *Node) *Progs {
+// worker indicates which of the backend workers will use the Progs.
+func newProgs(fn *Node, worker int) *Progs {
 	pp := new(Progs)
 	if Ctxt.CanReuseProgs() {
-		pp.progcache = sharedProgArray[:]
+		sz := len(sharedProgArray) / nBackendWorkers
+		pp.progcache = sharedProgArray[sz*worker : sz*(worker+1)]
 	}
 	pp.curfn = fn
 
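newProgs gives each backend worker its own non-overlapping window of the shared Prog array, so workers can reuse preallocated storage without a lock. A minimal sketch of that arithmetic, assuming a small int array in place of sharedProgArray; sharedArray, bounds, and window are hypothetical names.

```go
package main

import "fmt"

// sharedArray is a hypothetical stand-in for sharedProgArray:
// one preallocated buffer that all workers draw from.
var sharedArray [10]int

// bounds returns the half-open range [lo, hi) owned by worker when n
// elements are split among workers, using the same arithmetic as
// newProgs. Integer division means the last n%workers elements are
// never handed out.
func bounds(worker, workers, n int) (lo, hi int) {
	sz := n / workers
	return sz * worker, sz * (worker + 1)
}

// window returns worker's private slice of sharedArray. Windows for
// distinct workers do not overlap, so indexing into them needs no lock.
func window(worker, workers int) []int {
	lo, hi := bounds(worker, workers, len(sharedArray))
	return sharedArray[lo:hi]
}

func main() {
	for w := 0; w < 3; w++ {
		lo, hi := bounds(w, 3, len(sharedArray))
		fmt.Printf("worker %d: [%d, %d) len=%d\n", w, lo, hi, len(window(w, 3)))
	}
	// Prints [0, 3), [3, 6), [6, 9); element 9 is unused.
}
```

One caveat of plain slice expressions: each window's capacity still extends to the end of the array, so the scheme is only safe for indexed use. Appending to a window could overwrite a neighbor's elements; a full slice expression such as sharedArray[lo:hi:hi] would cap the capacity if append were needed.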
src/cmd/compile/internal/gc/main.go

Lines changed: 59 additions & 0 deletions

@@ -184,6 +184,7 @@ func Main(archInit func(*Arch)) {
 	objabi.Flagcount("W", "debug parse tree after type checking", &Debug['W'])
 	flag.StringVar(&asmhdr, "asmhdr", "", "write assembly header to `file`")
 	flag.StringVar(&buildid, "buildid", "", "record `id` as the build id in the export metadata")
+	flag.IntVar(&nBackendWorkers, "c", 1, "concurrency during compilation, 1 means no concurrency")
 	flag.BoolVar(&pure_go, "complete", false, "compiling complete package (no C or assembly)")
 	flag.StringVar(&debugstr, "d", "", "print debug information about items in `list`; try -d help")
 	flag.BoolVar(&flagDWARF, "dwarf", true, "generate DWARF symbols")
@@ -279,6 +280,12 @@ func Main(archInit func(*Arch)) {
 	if compiling_runtime && Debug['N'] != 0 {
 		log.Fatal("cannot disable optimizations while compiling runtime")
 	}
+	if nBackendWorkers < 1 {
+		log.Fatalf("-c must be at least 1, got %d", nBackendWorkers)
+	}
+	if nBackendWorkers > 1 && !concurrentBackendAllowed() {
+		log.Fatalf("cannot use concurrent backend compilation with provided flags; invoked as %v", os.Args)
+	}
 
 	// parse -d argument
 	if debugstr != "" {
@@ -571,9 +578,23 @@ func Main(archInit func(*Arch)) {
 		fninit(xtop)
 	}
 
+	compileFunctions()
+
+	// We autogenerate and compile some small functions
+	// such as method wrappers and equality/hash routines
+	// while exporting code.
+	// Disable concurrent compilation from here on,
+	// at least until this convoluted structure has been unwound.
+	nBackendWorkers = 1
+
 	if compiling_runtime {
 		checknowritebarrierrec()
 	}
+
+	// Check whether any of the functions we have compiled have gigantic stack frames.
+	obj.SortSlice(largeStackFrames, func(i, j int) bool {
+		return largeStackFrames[i].Before(largeStackFrames[j])
+	})
 	for _, largePos := range largeStackFrames {
 		yyerrorl(largePos, "stack frame too large (>2GB)")
 	}
@@ -598,6 +619,10 @@ func Main(archInit func(*Arch)) {
 		dumpasmhdr()
 	}
 
+	if len(compilequeue) != 0 {
+		Fatalf("%d uncompiled functions", len(compilequeue))
+	}
+
 	if nerrors+nsavederrors != 0 {
 		errorexit()
 	}
@@ -1046,3 +1071,37 @@ func clearImports() {
 func IsAlias(sym *types.Sym) bool {
 	return sym.Def != nil && asNode(sym.Def).Sym != sym
 }
+
+// By default, assume any debug flags are incompatible with concurrent compilation.
+// A few are safe and potentially in common use for normal compiles, though; mark them as such here.
+var concurrentFlagOK = [256]bool{
+	'B': true, // disabled bounds checking
+	'C': true, // disable printing of columns in error messages
+	'I': true, // add `directory` to import search path
+	'N': true, // disable optimizations
+	'l': true, // disable inlining
+}
+
+func concurrentBackendAllowed() bool {
+	for i, x := range Debug {
+		if x != 0 && !concurrentFlagOK[i] {
+			return false
+		}
+	}
+	// Debug_asm by itself is ok, because all printing occurs
+	// while writing the object file, and that is non-concurrent.
+	// Adding Debug_vlog, however, causes Debug_asm to also print
+	// while flushing the plist, which happens concurrently.
+	if Debug_vlog || debugstr != "" || debuglive > 0 {
+		return false
+	}
+	// TODO: test and add builders for GOEXPERIMENT values, and enable
+	if os.Getenv("GOEXPERIMENT") != "" {
+		return false
+	}
+	// TODO: fix races and enable the following flags
+	if Ctxt.Flag_shared || Ctxt.Flag_dynlink || flag_race {
+		return false
+	}
+	return true
+}
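concurrentFlagOK is a default-deny allowlist: a [256]bool indexed by the flag character, where keyed composite-literal elements mark the few safe flags and every other entry stays false. A minimal sketch of the same check in isolation; debug, concurrentOK, and allowed are hypothetical stand-ins for Debug, concurrentFlagOK, and concurrentBackendAllowed, and only a subset of the CL's flags is listed.

```go
package main

import "fmt"

// debug is a hypothetical stand-in for the compiler's Debug array:
// one counter per single-character debug flag, indexed by the flag byte.
var debug [256]int

// concurrentOK marks the flags assumed safe with a concurrent backend.
// Keyed elements let character constants index the array; every
// unlisted flag defaults to false, i.e. unsafe.
var concurrentOK = [256]bool{
	'B': true,
	'N': true,
	'l': true,
}

// allowed reports whether every flag that is set is on the allowlist.
func allowed() bool {
	for i, x := range debug {
		if x != 0 && !concurrentOK[i] {
			return false
		}
	}
	return true
}

func main() {
	debug['N'] = 1
	fmt.Println(allowed()) // true
	debug['W'] = 1
	fmt.Println(allowed()) // false
}
```

Defaulting to "unsafe" means a newly added debug flag cannot silently run under the concurrent backend until someone audits it and adds it to the list.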

src/cmd/compile/internal/gc/obj.go

Lines changed: 3 additions & 0 deletions

@@ -220,6 +220,9 @@ func dumpglobls() {
 		ggloblnod(n)
 	}
 
+	obj.SortSlice(funcsyms, func(i, j int) bool {
+		return funcsyms[i].LinksymName() < funcsyms[j].LinksymName()
+	})
 	for _, s := range funcsyms {
 		sf := s.Pkg.Lookup(funcsymname(s)).Linksym()
 		dsymptr(sf, 0, s.Linksym(), 0)

src/cmd/compile/internal/gc/pgen.go

Lines changed: 55 additions & 3 deletions

@@ -14,10 +14,16 @@ import (
 	"cmd/internal/sys"
 	"fmt"
 	"sort"
+	"sync"
 )
 
 // "Portable" code generation.
 
+var (
+	nBackendWorkers int     // number of concurrent backend workers, set by a compiler flag
+	compilequeue    []*Node // functions waiting to be compiled
+)
+
 func emitptrargsmap() {
 	if Curfn.Func.Nname.Sym.Name == "_" {
 		return
@@ -207,20 +213,66 @@ func compile(fn *Node) {
 	// Set up the function's LSym early to avoid data races with the assemblers.
 	fn.Func.initLSym()
 
-	// Build an SSA backend function.
-	ssafn := buildssa(fn)
-	pp := newProgs(fn)
+	if compilenow() {
+		compileSSA(fn, 0)
+	} else {
+		compilequeue = append(compilequeue, fn)
+	}
+}
+
+// compilenow reports whether to compile immediately.
+// If functions are not compiled immediately,
+// they are enqueued in compilequeue,
+// which is drained by compileFunctions.
+func compilenow() bool {
+	return nBackendWorkers == 1
+}
+
+// compileSSA builds an SSA backend function,
+// uses it to generate a plist,
+// and flushes that plist to machine code.
+// worker indicates which of the backend workers is doing the processing.
+func compileSSA(fn *Node, worker int) {
+	ssafn := buildssa(fn, worker)
+	pp := newProgs(fn, worker)
 	genssa(ssafn, pp)
 	if pp.Text.To.Offset < 1<<31 {
 		pp.Flush()
 	} else {
+		largeStackFramesMu.Lock()
 		largeStackFrames = append(largeStackFrames, fn.Pos)
+		largeStackFramesMu.Unlock()
 	}
 	// fieldtrack must be called after pp.Flush. See issue 20014.
 	fieldtrack(pp.Text.From.Sym, fn.Func.FieldTrack)
 	pp.Free()
 }
 
+// compileFunctions compiles all functions in compilequeue.
+// It fans out nBackendWorkers to do the work
+// and waits for them to complete.
+func compileFunctions() {
+	if len(compilequeue) != 0 {
+		var wg sync.WaitGroup
+		c := make(chan *Node)
+		for i := 0; i < nBackendWorkers; i++ {
+			wg.Add(1)
+			go func(worker int) {
+				for fn := range c {
+					compileSSA(fn, worker)
+				}
+				wg.Done()
+			}(i)
+		}
+		for _, fn := range compilequeue {
+			c <- fn
+		}
+		close(c)
+		compilequeue = nil
+		wg.Wait()
+	}
+}
+
 func debuginfo(fnsym *obj.LSym, curfn interface{}) []*dwarf.Var {
 	fn := curfn.(*Node)
 	if expect := fn.Func.Nname.Sym.Linksym(); fnsym != expect {

src/cmd/compile/internal/gc/reflect.go

Lines changed: 10 additions & 3 deletions

@@ -14,6 +14,7 @@ import (
 	"os"
 	"sort"
 	"strings"
+	"sync"
 )
 
 type itabEntry struct {
@@ -36,9 +37,13 @@ type ptabEntry struct {
 }
 
 // runtime interface and reflection data structures
-var signatlist = make(map[*types.Type]bool)
-var itabs []itabEntry
-var ptabs []ptabEntry
+var (
+	signatlistmu sync.Mutex // protects signatlist
+	signatlist   = make(map[*types.Type]bool)
+
+	itabs []itabEntry
+	ptabs []ptabEntry
+)
 
 type Sig struct {
 	name string
@@ -929,7 +934,9 @@ func typenamesym(t *types.Type) *types.Sym {
 		Fatalf("typenamesym %v", t)
 	}
 	s := typesym(t)
+	signatlistmu.Lock()
 	addsignat(t)
+	signatlistmu.Unlock()
 	return s
 }
 
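typenamesym locks signatlistmu only around addsignat, not around typesym, which keeps the critical section small. More generally, because sync.Mutex is not reentrant, code that can recurse must take the lock once at the outer entry point and recurse through an unlocked helper; locking inside the recursive function would deadlock. A minimal sketch of that discipline; typ, addSignatLocked, and typeNameSym are hypothetical and do not reproduce the compiler's actual call graph.

```go
package main

import (
	"fmt"
	"sync"
)

// typ is a hypothetical type node that may refer to an element type.
type typ struct {
	name string
	elem *typ
}

var (
	signatMu sync.Mutex // protects signat
	signat   = map[*typ]bool{}
)

// addSignatLocked records t and, recursively, its element types.
// The caller must hold signatMu. It never locks itself: sync.Mutex is
// not reentrant, so locking here would deadlock on the recursive call.
func addSignatLocked(t *typ) {
	if t == nil || signat[t] {
		return
	}
	signat[t] = true
	addSignatLocked(t.elem)
}

// typeNameSym is the entry point. Only the map update runs inside the
// critical section, and the lock is taken exactly once per call.
func typeNameSym(t *typ) string {
	signatMu.Lock()
	addSignatLocked(t)
	signatMu.Unlock()
	return "type." + t.name
}

func main() {
	intT := &typ{name: "int"}
	sliceT := &typ{name: "[]int", elem: intT}
	fmt.Println(typeNameSym(sliceT), len(signat)) // type.[]int 2
}
```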
src/cmd/compile/internal/gc/ssa.go

Lines changed: 6 additions & 5 deletions

@@ -21,7 +21,7 @@ import (
 )
 
 var ssaConfig *ssa.Config
-var ssaCache *ssa.Cache
+var ssaCaches []ssa.Cache
 
 func initssaconfig() {
 	types_ := ssa.Types{
@@ -67,7 +67,7 @@ func initssaconfig() {
 	if thearch.LinkArch.Name == "386" {
 		ssaConfig.Set387(thearch.Use387)
 	}
-	ssaCache = new(ssa.Cache)
+	ssaCaches = make([]ssa.Cache, nBackendWorkers)
 
 	// Set up some runtime functions we'll need to call.
 	Newproc = Sysfunc("newproc")
@@ -94,8 +94,9 @@ func initssaconfig() {
 	Udiv = Sysfunc("udiv")
 }
 
-// buildssa builds an SSA function.
-func buildssa(fn *Node) *ssa.Func {
+// buildssa builds an SSA function for fn.
+// worker indicates which of the backend workers is doing the processing.
+func buildssa(fn *Node, worker int) *ssa.Func {
 	name := fn.Func.Nname.Sym.Name
 	printssa := name == os.Getenv("GOSSAFUNC")
 	if printssa {
@@ -123,7 +124,7 @@ func buildssa(fn *Node) *ssa.Func {
 	s.f = ssa.NewFunc(&fe)
 	s.config = ssaConfig
 	s.f.Config = ssaConfig
-	s.f.Cache = ssaCache
+	s.f.Cache = &ssaCaches[worker]
 	s.f.Cache.Reset()
 	s.f.DebugTest = s.f.DebugHashMatch("GOSSAHASH", name)
 	s.f.Name = name

src/cmd/compile/internal/gc/subr.go

Lines changed: 5 additions & 1 deletion

@@ -16,6 +16,7 @@ import (
 	"sort"
 	"strconv"
 	"strings"
+	"sync"
 	"unicode"
 	"unicode/utf8"
 )
@@ -27,7 +28,10 @@ type Error struct {
 
 var errors []Error
 
-var largeStackFrames []src.XPos // positions of functions whose stack frames are too large (rare)
+var (
+	largeStackFramesMu sync.Mutex // protects largeStackFrames
+	largeStackFrames   []src.XPos // positions of functions whose stack frames are too large (rare)
+)
 
 func errorexit() {
 	flusherrors()

src/cmd/compile/internal/types/pkg.go

Lines changed: 15 additions & 3 deletions

@@ -9,6 +9,7 @@ import (
 	"cmd/internal/objabi"
 	"fmt"
 	"sort"
+	"sync"
 )
 
 // pkgMap maps a package path to a package.
@@ -69,10 +70,16 @@ var nopkg = &Pkg{
 }
 
 // fake package for runtime type info (headers)
-var typepkg = NewPkg("type", "type")
+var (
+	typepkgmu sync.Mutex // protects typepkg lookups
+	typepkg   = NewPkg("type", "type")
+)
 
 func TypePkgLookup(name string) *Sym {
-	return typepkg.Lookup(name)
+	typepkgmu.Lock()
+	s := typepkg.Lookup(name)
+	typepkgmu.Unlock()
+	return s
 }
 
 func (pkg *Pkg) Lookup(name string) *Sym {
@@ -115,14 +122,19 @@ func (pkg *Pkg) LookupBytes(name []byte) *Sym {
 	return pkg.Lookup(str)
 }
 
-var internedStrings = map[string]string{}
+var (
+	internedStringsmu sync.Mutex // protects internedStrings
+	internedStrings   = map[string]string{}
+)
 
 func InternString(b []byte) string {
+	internedStringsmu.Lock()
 	s, ok := internedStrings[string(b)] // string(b) here doesn't allocate
 	if !ok {
 		s = string(b)
 		internedStrings[s] = s
 	}
+	internedStringsmu.Unlock()
 	return s
 }
 
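InternString now serializes access to internedStrings so that autotmp names created by concurrent workers share one canonical copy. A minimal sketch of a mutex-guarded interner along the same lines; intern, internMu, and interned are hypothetical names, and the ".autotmp_N" strings merely imitate autotmp naming.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	internMu sync.Mutex // protects interned
	interned = map[string]string{}
)

// intern returns a canonical string equal to b, allocating a new
// string only the first time a given value is seen. As the original
// comment notes, the gc compiler avoids allocating for string(b) when
// it is used directly as a map index.
func intern(b []byte) string {
	internMu.Lock()
	defer internMu.Unlock()
	s, ok := interned[string(b)]
	if !ok {
		s = string(b)
		interned[s] = s
	}
	return s
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				intern([]byte(fmt.Sprintf(".autotmp_%d", j%4)))
			}
		}()
	}
	wg.Wait()
	fmt.Println(len(interned)) // 4 distinct names
}
```

Since the same few names recur constantly, the critical section is a single map lookup in the common case, which is why the commit message judges this mutex cheap enough to keep.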