Commit 225ba0f

cmd/compile: add initial backend concurrency support
This CL adds initial support for concurrent backend compilation.

CAVEATS

I suspect it's going to end up on Twitter. If you're coming here from the internet:

* Don't believe the hype.
* If you're going to try it out, please also try with the race detector enabled and report an issue at golang.org/issue/new if you see a race report. Just run 'go install -race cmd/compile' and 'go build -a yourpackages'. And then run make.bash again to return to a regular compiler.

BACKGROUND

The compiler currently consists (very roughly) of the following phases:

1. Initialization.
2. Lexing and parsing into the cmd/compile/internal/syntax AST.
3. Translation into the cmd/compile/internal/gc AST.
4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go).
5. Translation into the cmd/compile/internal/ssa SSA form.
6. Optimization and lowering of SSA form.
7. Translation from SSA form to assembler instructions.
8. Translation from assembler instructions to machine code.
9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data.

Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups.

Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions.

Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and hash routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile.

USAGE

Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set -c=1 always, and revert to the original compiler behavior.

MUTEXES

Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order:

* gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup.
  This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. It requires additional sorting to preserve reproducible builds.

* gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting.

* obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is easily the most heavily contended mutex. I hope that the syncmap proposed in golang#18177 may provide some free speed-ups here. Some lookups may also be removable.

* gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal, aside from some mild awkwardness to avoid deadlocks due to recursive calls.

* gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention.

* types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex.

* types.Sym.Lsymmu. Syms keep a cache of their associated LSym, to reduce lookups in ctxt.Hash. This cache itself needs concurrency protection. This mutex adds to the size of a moderately important data structure, but the alloc benchmarks below show that this doesn't hurt much in practice. It is moderately contended, mostly because when lookups fail, the lock is held while vying for the contended ctxt.Hash mutex. The fact that it keeps load off the ctxt.Hash mutex, though, makes this mutex worth keeping. Also, a fair number of calls to Linksym could be avoided by judicious addition of a local variable.

TESTING

I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed golang#19962 for this.

REPRODUCIBLE BUILDS

This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is golang#19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem.

NEXT STEPS

* Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some.
* Improve testing.
* Improve performance, for many values of c.
* Integrate with cmd/go and fine tune.
* Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work.
* Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL.

PERFORMANCE

Here's the buried lede, at last. :)

All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.

First, going from tip to this CL with c=1 costs about 1-2% CPU and has almost no memory impact.

name old time/op new time/op delta
Template 193ms ± 4% 195ms ± 5% +0.99% (p=0.007 n=50+50)
Unicode 82.0ms ± 3% 84.7ms ± 4% +3.26% (p=0.000 n=47+48)
GoTypes 539ms ± 4% 549ms ± 3% +1.90% (p=0.000 n=46+46)
SSA 5.92s ± 2% 5.95s ± 3% +0.49% (p=0.018 n=42+48)
Flate 121ms ± 4% 122ms ± 3% ~ (p=0.783 n=48+46)
GoParser 143ms ± 6% 144ms ± 3% +0.90% (p=0.001 n=49+47)
Reflect 342ms ± 4% 347ms ± 3% +1.47% (p=0.000 n=48+49)
Tar 104ms ± 5% 105ms ± 4% +1.06% (p=0.003 n=47+47)
XML 197ms ± 4% 199ms ± 5% +0.85% (p=0.014 n=48+49)

name old user-time/op new user-time/op delta
Template 238ms ±10% 241ms ±10% ~ (p=0.066 n=50+50)
Unicode 104ms ± 4% 106ms ± 5% +2.23% (p=0.000 n=46+49)
GoTypes 706ms ± 3% 714ms ± 5% +1.15% (p=0.002 n=45+49)
SSA 8.21s ± 3% 8.27s ± 2% +0.78% (p=0.002 n=43+47)
Flate 144ms ± 7% 145ms ± 5% ~ (p=0.098 n=49+49)
GoParser 175ms ± 4% 177ms ± 4% +1.35% (p=0.003 n=47+50)
Reflect 435ms ± 6% 433ms ± 7% ~ (p=0.852 n=50+50)
Tar 121ms ± 6% 122ms ± 7% ~ (p=0.143 n=48+50)
XML 240ms ± 4% 241ms ± 4% ~ (p=0.231 n=50+49)

name old alloc/op new alloc/op delta
Template 38.7MB ± 0% 38.7MB ± 0% ~ (p=0.690 n=5+5)
Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=0.222 n=5+5)
GoTypes 113MB ± 0% 113MB ± 0% ~ (p=0.310 n=5+5)
SSA 1.24GB ± 0% 1.24GB ± 0% +0.04% (p=0.008 n=5+5)
Flate 25.2MB ± 0% 25.2MB ± 0% ~ (p=0.841 n=5+5)
GoParser 31.7MB ± 0% 31.7MB ± 0% ~ (p=0.151 n=5+5)
Reflect 77.5MB ± 0% 77.5MB ± 0% ~ (p=0.690 n=5+5)
Tar 26.4MB ± 0% 26.4MB ± 0% ~ (p=0.690 n=5+5)
XML 42.0MB ± 0% 42.0MB ± 0% ~ (p=0.690 n=5+5)

name old allocs/op new allocs/op delta
Template 378k ± 0% 377k ± 1% ~ (p=0.841 n=5+5)
Unicode 321k ± 0% 322k ± 0% ~ (p=0.310 n=5+5)
GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.310 n=5+5)
SSA 9.67M ± 0% 9.69M ± 0% +0.20% (p=0.008 n=5+5)
Flate 233k ± 0% 233k ± 1% ~ (p=1.000 n=5+5)
GoParser 315k ± 0% 314k ± 1% ~ (p=0.151 n=5+5)
Reflect 971k ± 0% 970k ± 0% ~ (p=0.690 n=5+5)
Tar 249k ± 0% 249k ± 0% ~ (p=0.310 n=5+5)
XML 391k ± 0% 390k ± 0% ~ (p=0.841 n=5+5)

Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches.
name old time/op new time/op delta
Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49)
Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48)
GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49)
Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46)
SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49)
Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48)
GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48)
Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49)
Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47)
XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45)

name old user-time/op new user-time/op delta
Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48)
Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50)
GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50)
Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48)
SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47)
Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47)
GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46)
Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48)
Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49)
XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50)

name old alloc/op new alloc/op delta
Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5)
Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5)
GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5)
Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5)
SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5)
Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5)
GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5)
Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5)
Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5)
XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5)

name old allocs/op new allocs/op delta
Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5)
Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5)
GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5)
Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5)
SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5)
Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5)
GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5)
Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5)
Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5)
XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5)

From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:

name old time/op new time/op delta
Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50)
Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47)
GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49)
Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46)
SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49)
Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48)
GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50)
Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49)
Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49)
XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48)

name old user-time/op new user-time/op delta
Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50)
Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50)
GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50)
Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49)
SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50)
Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48)
GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50)
Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48)
Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50)
XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50)

name old alloc/op new alloc/op delta
Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5)
Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5)
GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5)
Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5)
SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5)
Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5)
GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5)
Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5)
Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5)
XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5)

name old allocs/op new allocs/op delta
Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5)
Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5)
GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5)
Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5)
SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5)
Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5)
GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5)
Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5)
Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5)
XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5)

Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput.

The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle.

Updates golang#15756

Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
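
The types.internedStringsmu bullet above refers to a mutex-protected string-interning cache inside the types package; that change is not among the diffs excerpted below. A minimal sketch of the pattern it describes, with a hypothetical helper name (InternString), looks roughly like this:

package types

import "sync"

var (
	internedStringsmu sync.Mutex // protects internedStrings
	internedStrings   = map[string]string{}
)

// InternString returns a canonical string for b, so that heavily reused
// autotmp names share a single allocation. (Illustrative sketch only.)
func InternString(b []byte) string {
	internedStringsmu.Lock()
	s, ok := internedStrings[string(b)] // string(b) used only as a map key does not allocate
	if !ok {
		s = string(b)
		internedStrings[s] = s
	}
	internedStringsmu.Unlock()
	return s
}
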
Parent: 4646a33 · Commit: 225ba0f

14 files changed, +192 -26 lines

src/cmd/compile/internal/gc/dcl.go

Lines changed: 5 additions & 0 deletions
@@ -1063,6 +1063,8 @@ func funcsymname(s *types.Sym) string {
 
 // funcsym returns s·f.
 func funcsym(s *types.Sym) *types.Sym {
+	// Lock funcsymsmu immediately, so that it can also guard the package lookup.
+	funcsymsmu.Lock()
 	sf, existed := s.Pkg.LookupOK(funcsymname(s))
 	// Don't export s·f when compiling for dynamic linking.
 	// When dynamically linking, the necessary function
@@ -1071,6 +1073,7 @@ func funcsym(s *types.Sym) *types.Sym {
 	if !Ctxt.Flag_dynlink && !existed {
 		funcsyms = append(funcsyms, s)
 	}
+	funcsymsmu.Unlock()
 	return sf
 }

@@ -1096,7 +1099,9 @@ func makefuncsym(s *types.Sym) {
 		return
 	}
 	if _, existed := s.Pkg.LookupOK(funcsymname(s)); !existed {
+		funcsymsmu.Lock()
 		funcsyms = append(funcsyms, s)
+		funcsymsmu.Unlock()
 	}
 }
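
The commit message notes that funcsym's package lookup is itself a source of races on types.Pkg.Syms, which is why the lock is taken before LookupOK rather than only around the append. A stripped-down sketch (generic names, not compiler code) of why the existence check and the append must sit under one lock:

package sketch

import "sync"

// registry mimics the funcsyms bookkeeping: a lookup followed by a
// conditional append. If the check and the append were guarded by
// separate critical sections, two goroutines could both observe
// "not present" and register the same name twice.
type registry struct {
	mu    sync.Mutex
	seen  map[string]bool
	order []string // insertion order; sorted later for reproducible output
}

func newRegistry() *registry {
	return &registry{seen: make(map[string]bool)}
}

func (r *registry) add(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.seen[name] {
		return
	}
	r.seen[name] = true
	r.order = append(r.order, name)
}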

src/cmd/compile/internal/gc/go.go

Lines changed: 5 additions & 1 deletion
@@ -10,6 +10,7 @@ import (
 	"cmd/internal/bio"
 	"cmd/internal/obj"
 	"cmd/internal/src"
+	"sync"
 )
 
 const (
@@ -171,7 +172,10 @@ var exportlist []*Node
 
 var importlist []*Node // imported functions and methods with inlinable bodies
 
-var funcsyms []*types.Sym
+var (
+	funcsymsmu sync.Mutex // protects funcsyms and associated package lookups (see func funcsym)
+	funcsyms   []*types.Sym
+)
 
 var dclcontext Class // PEXTERN/PAUTO

src/cmd/compile/internal/gc/gsubr.go

Lines changed: 4 additions & 2 deletions
@@ -55,10 +55,12 @@ type Progs struct {
 }
 
 // newProgs returns a new Progs for fn.
-func newProgs(fn *Node) *Progs {
+// worker indicates which of the backend workers will use the Progs.
+func newProgs(fn *Node, worker int) *Progs {
 	pp := new(Progs)
 	if Ctxt.CanReuseProgs() {
-		pp.progcache = sharedProgArray[:]
+		sz := len(sharedProgArray) / nBackendWorkers
+		pp.progcache = sharedProgArray[sz*worker : sz*(worker+1)]
 	}
 	pp.curfn = fn
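
A worked example of the partitioning arithmetic above: the shared Prog cache is split into equal, disjoint per-worker chunks, so each worker can reuse its Progs without locking. (Illustrative helper, not part of the CL.)

// chunkBounds returns the half-open range of the shared cache owned by worker.
// With a 10000-entry cache and nWorkers == 4, sz == 2500 and worker 2
// owns [5000, 7500). Any remainder entries at the tail simply go unused.
func chunkBounds(cacheLen, worker, nWorkers int) (lo, hi int) {
	sz := cacheLen / nWorkers
	return sz * worker, sz * (worker + 1)
}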

src/cmd/compile/internal/gc/main.go

Lines changed: 59 additions & 0 deletions
@@ -172,6 +172,7 @@ func Main(archInit func(*Arch)) {
 	objabi.Flagcount("W", "debug parse tree after type checking", &Debug['W'])
 	flag.StringVar(&asmhdr, "asmhdr", "", "write assembly header to `file`")
 	flag.StringVar(&buildid, "buildid", "", "record `id` as the build id in the export metadata")
+	flag.IntVar(&nBackendWorkers, "c", 1, "number of concurrent backend compilations")
 	flag.BoolVar(&pure_go, "complete", false, "compiling complete package (no C or assembly)")
 	flag.StringVar(&debugstr, "d", "", "print debug information about items in `list`")
 	flag.BoolVar(&flagDWARF, "dwarf", true, "generate DWARF symbols")
@@ -267,6 +268,12 @@ func Main(archInit func(*Arch)) {
 	if compiling_runtime && Debug['N'] != 0 {
 		log.Fatal("cannot disable optimizations while compiling runtime")
 	}
+	if nBackendWorkers < 1 {
+		log.Fatalf("-c must be at least 1, got %d", nBackendWorkers)
+	}
+	if nBackendWorkers > 1 && !concurrentBackendAllowed() {
+		log.Fatalf("cannot use concurrent backend compilation with provided flags; invoked as %v", os.Args)
+	}

 	// parse -d argument
 	if debugstr != "" {
@@ -540,13 +547,31 @@ func Main(archInit func(*Arch)) {
 	}
 	timings.AddEvent(fcount, "funcs")

+	if nBackendWorkers > 1 {
+		for _, fn := range needscompile {
+			compilec <- fn
+		}
+		close(compilec)
+		needscompile = nil
+		compilewg.Wait()
+	}
+	// We autogenerate and compile some small functions
+	// such as method wrappers and equality/hash routines
+	// while exporting code.
+	// Disable concurrent compilation from here on,
+	// at least until this convoluted structure has been unwound.
+	nBackendWorkers = 1
+
 	if nsavederrors+nerrors == 0 {
 		fninit(xtop)
 	}

 	if compiling_runtime {
 		checknowritebarrierrec()
 	}
+	obj.SortSlice(largeStackFrames, func(i, j int) bool {
+		return largeStackFrames[i].Before(largeStackFrames[j])
+	})
 	for _, largePos := range largeStackFrames {
 		yyerrorl(largePos, "stack frame too large (>2GB)")
 	}
@@ -1019,3 +1044,37 @@ func clearImports() {
 func IsAlias(sym *types.Sym) bool {
 	return sym.Def != nil && asNode(sym.Def).Sym != sym
 }
+
+// By default, assume any debug flags are incompatible with concurrent compilation.
+// A few are safe and potentially in common use for normal compiles, though; mark them as such here.
+var concurrentFlagOK = [256]bool{
+	'B': true, // disabled bounds checking
+	'C': true, // disable printing of columns in error messages
+	'I': true, // add `directory` to import search path
+	'N': true, // disable optimizations
+	'l': true, // disable inlining
+}
+
+func concurrentBackendAllowed() bool {
+	for i, x := range Debug {
+		if x != 0 && !concurrentFlagOK[i] {
+			return false
+		}
+	}
+	// Debug_asm by itself is ok, because all printing occurs
+	// while writing the object file, and that is non-concurrent.
+	// Adding Debug_vlog, however, causes Debug_asm to also print
+	// while flushing the plist, which happens concurrently.
+	if Debug_vlog || debugstr != "" || debuglive > 0 {
+		return false
+	}
+	// TODO: test and add builders for GOEXPERIMENT values, and enable
+	if os.Getenv("GOEXPERIMENT") != "" {
+		return false
+	}
+	// TODO: fix races and enable the following flags
+	if Ctxt.Flag_shared || Ctxt.Flag_dynlink || flag_race {
+		return false
+	}
+	return true
+}

src/cmd/compile/internal/gc/obj.go

Lines changed: 7 additions & 0 deletions
@@ -220,6 +220,11 @@ func dumpglobls() {
 		ggloblnod(n)
 	}

+	funcsymsmu.Lock()
+	defer funcsymsmu.Unlock()
+	obj.SortSlice(funcsyms, func(i, j int) bool {
+		return linksymname(funcsyms[i]) < linksymname(funcsyms[j])
+	})
 	for _, s := range funcsyms {
 		sf := s.Pkg.Lookup(funcsymname(s))
 		dsymptr(sf, 0, s, 0)
@@ -265,9 +270,11 @@ func Linksym(s *types.Sym) *obj.LSym {
 	if s == nil {
 		return nil
 	}
+	s.Lsymmu.Lock()
 	if s.Lsym == nil {
 		s.Lsym = Ctxt.Lookup(linksymname(s), 0)
 	}
+	s.Lsymmu.Unlock()
 	return s.Lsym
 }

src/cmd/compile/internal/gc/pgen.go

Lines changed: 29 additions & 3 deletions
@@ -14,10 +14,18 @@ import (
 	"cmd/internal/sys"
 	"fmt"
 	"sort"
+	"sync"
 )
 
 // "Portable" code generation.
 
+var (
+	nBackendWorkers int            // the number of concurrent backend workers, set by a compiler flag
+	needscompile    []*Node        // slice of functions waiting to be compiled
+	compilewg       sync.WaitGroup // wait for all backend compilers to complete
+	compilec        chan *Node     // channel of functions for backend compilers to drain
+)
+
 func emitptrargsmap() {
 	if Curfn.Func.Nname.Sym.Name == "_" {
 		return
@@ -207,14 +215,32 @@ func compile(fn *Node) {
 	// Set up the function's LSym early to avoid data races with the assemblers.
 	fn.Func.initLSym()

-	// Build an SSA backend function.
-	ssafn := buildssa(fn)
-	pp := newProgs(fn)
+	if compilenow() {
+		compileSSA(fn, 0)
+	} else {
+		needscompile = append(needscompile, fn)
+	}
+}
+
+// compilenow reports whether to compile immediately or enqueue in needscompile.
+func compilenow() bool {
+	return nBackendWorkers == 1
+}
+
+// compileSSA builds an SSA backend function,
+// uses it to generate a plist,
+// and flushes that plist to machine code.
+// worker indicates which of the backend workers is doing the processing.
+func compileSSA(fn *Node, worker int) {
+	ssafn := buildssa(fn, worker)
+	pp := newProgs(fn, worker)
 	genssa(ssafn, pp)
 	if pp.Text.To.Offset < 1<<31 {
 		pp.Flush()
 	} else {
+		largeStackFramesMu.Lock()
 		largeStackFrames = append(largeStackFrames, fn.Pos)
+		largeStackFramesMu.Unlock()
 	}
 	// fieldtrack must be called after pp.Flush. See issue 20014.
 	fieldtrack(pp.Text.From.Sym, fn.Func.FieldTrack)
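
The variables above (compilec, compilewg, nBackendWorkers) and compileSSA together imply a pool of backend workers draining the channel, which Main feeds and closes in the main.go hunk earlier; the code that actually starts the workers is not part of this excerpt. A rough sketch of what such a pool could look like, with a hypothetical startBackendWorkers helper:

// Hypothetical sketch; the real worker startup is not shown in this diff.
func startBackendWorkers() {
	compilec = make(chan *Node, nBackendWorkers)
	for worker := 0; worker < nBackendWorkers; worker++ {
		compilewg.Add(1)
		go func(worker int) {
			defer compilewg.Done()
			// Drain functions until Main closes compilec,
			// compiling each one with this worker's private caches.
			for fn := range compilec {
				compileSSA(fn, worker)
			}
		}(worker)
	}
}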

src/cmd/compile/internal/gc/reflect.go

Lines changed: 33 additions & 8 deletions
@@ -14,6 +14,7 @@ import (
 	"os"
 	"sort"
 	"strings"
+	"sync"
 )
 
 type itabEntry struct {
@@ -36,9 +37,13 @@ type ptabEntry struct {
 }
 
 // runtime interface and reflection data structures
-var signatlist = make(map[*types.Type]bool)
-var itabs []itabEntry
-var ptabs []ptabEntry
+var (
+	signatlistmu sync.Mutex // protects signatlist
+	signatlist   = make(map[*types.Type]bool)
+
+	itabs []itabEntry
+	ptabs []ptabEntry
+)
 
 type Sig struct {
 	name string
@@ -907,8 +912,18 @@ func typesymname(t *types.Type) string {
 	return name
 }

+// typepkgmu guards typepkg lookups.
+var typepkgmu sync.Mutex
+
+func typeLookup(name string) *types.Sym {
+	typepkgmu.Lock()
+	s := typepkg.Lookup(name)
+	typepkgmu.Unlock()
+	return s
+}
+
 func typesym(t *types.Type) *types.Sym {
-	return typepkg.Lookup(typesymname(t))
+	return typeLookup(typesymname(t))
 }

 // tracksym returns the symbol for tracking use of field/method f, assumed
@@ -919,7 +934,7 @@ func tracksym(t *types.Type, f *types.Field) *types.Sym {

 func typesymprefix(prefix string, t *types.Type) *types.Sym {
 	p := prefix + "." + t.ShortString()
-	s := typepkg.Lookup(p)
+	s := typeLookup(p)

 	//print("algsym: %s -> %+S\n", p, s);

@@ -931,7 +946,9 @@ func typenamesym(t *types.Type) *types.Sym {
 		Fatalf("typenamesym %v", t)
 	}
 	s := typesym(t)
+	signatlistmu.Lock()
 	addsignat(t)
+	signatlistmu.Unlock()
 	return s
 }

@@ -1421,14 +1438,17 @@ func addsignat(t *types.Type) {

 func dumptypestructs() {
 	// copy types from externdcl list to signatlist
+	signatlistmu.Lock()
 	for _, n := range externdcl {
 		if n.Op == OTYPE {
 			addsignat(n.Type)
 		}
 	}
+	signatlistmu.Unlock()

 	// Process signatlist. Use a loop, as dtypesym adds
 	// entries to signatlist while it is being processed.
+	signatlistmu.Lock()
 	signats := make([]typeAndStr, len(signatlist))
 	for len(signatlist) > 0 {
 		signats = signats[:0]
@@ -1437,6 +1457,9 @@ func dumptypestructs() {
 			signats = append(signats, typeAndStr{t: t, s: typesymname(t)})
 			delete(signatlist, t)
 		}
+		// Don't hold signatlistmu while processing signats,
+		// since signats can generate new entries for signatlist.
+		signatlistmu.Unlock()
 		sort.Sort(typesByString(signats))
 		for _, ts := range signats {
 			t := ts.t
@@ -1445,7 +1468,9 @@ func dumptypestructs() {
 				dtypesym(types.NewPtr(t))
 			}
 		}
+		signatlistmu.Lock()
 	}
+	signatlistmu.Unlock()

 	// process itabs
 	for _, i := range itabs {
@@ -1560,7 +1585,7 @@ func dalgsym(t *types.Type) *types.Sym {
 		// we use one algorithm table for all AMEM types of a given size
 		p := fmt.Sprintf(".alg%d", t.Width)

-		s = typepkg.Lookup(p)
+		s = typeLookup(p)

 		if s.AlgGen() {
 			return s
@@ -1570,7 +1595,7 @@ func dalgsym(t *types.Type) *types.Sym {
 		// make hash closure
 		p = fmt.Sprintf(".hashfunc%d", t.Width)

-		hashfunc = typepkg.Lookup(p)
+		hashfunc = typeLookup(p)

 		ot := 0
 		ot = dsymptr(hashfunc, ot, Runtimepkg.Lookup("memhash_varlen"), 0)
@@ -1580,7 +1605,7 @@ func dalgsym(t *types.Type) *types.Sym {
 		// make equality closure
 		p = fmt.Sprintf(".eqfunc%d", t.Width)

-		eqfunc = typepkg.Lookup(p)
+		eqfunc = typeLookup(p)

 		ot = 0
 		ot = dsymptr(eqfunc, ot, Runtimepkg.Lookup("memequal_varlen"), 0)
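
The dumptypestructs changes above repeatedly snapshot signatlist while holding signatlistmu, release the lock while emitting (because dtypesym can add new entries, and typenamesym re-acquires the lock to do so), and re-acquire it before the next round. A stripped-down, self-contained sketch of that drain-in-rounds pattern (generic names, not compiler code):

package sketch

import (
	"sort"
	"sync"
)

var (
	mu      sync.Mutex
	pending = map[string]bool{} // work items; processing may add more via add()
)

func add(item string) {
	mu.Lock()
	pending[item] = true
	mu.Unlock()
}

func drain(process func(string)) {
	mu.Lock()
	for len(pending) > 0 {
		batch := make([]string, 0, len(pending))
		for item := range pending {
			batch = append(batch, item)
			delete(pending, item)
		}
		sort.Strings(batch) // deterministic order, like typesByString above
		// Don't hold mu while processing; process may call add.
		mu.Unlock()
		for _, item := range batch {
			process(item)
		}
		mu.Lock()
	}
	mu.Unlock()
}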
