Commit 52bd1c4

rhysh authored and mknyszek committed
runtime: decrease STW pause for goroutine profile
The goroutine profile needs to stop the world to get a consistent snapshot of all goroutines in the app. Leaving the world stopped while iterating over allgs leads to a pause proportional to the number of goroutines in the app (or its high-water mark).

Instead, do only a fixed amount of bookkeeping while the world is stopped. Install a barrier so the scheduler confirms that a goroutine appears in the profile, with its stack recorded exactly as it was during the stop-the-world pause, before it allows that goroutine to execute. Iterate over allgs while the app resumes normal operations, adding each to the profile unless they've been scheduled in the meantime (and so have profiled themselves). Stop the world a second time to remove the barrier and do a fixed amount of cleanup work.

This increases both the fixed overhead and per-goroutine CPU-time cost of GoroutineProfile. It also increases the wall-clock latency of the call to GoroutineProfile, since the scheduler may interrupt it to execute other goroutines.
name  old time/op  new time/op  delta
GoroutineProfile/small/loaded-8  1.05ms ± 5%  4.99ms ±31%  +376.85%  (p=0.000 n=10+9)
GoroutineProfile/sparse/loaded-8  1.04ms ± 4%  3.61ms ±27%  +246.61%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-8  7.69ms ±17%  20.35ms ± 4%  +164.50%  (p=0.000 n=10+10)
GoroutineProfile/small/idle  958µs ± 0%  1820µs ±23%  +89.91%  (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8  1.00ms ± 3%  1.52ms ±17%  +51.18%  (p=0.000 n=10+10)
GoroutineProfile/small/idle-8  1.01ms ± 4%  1.47ms ± 7%  +45.28%  (p=0.000 n=9+9)
GoroutineProfile/sparse/idle  980µs ± 1%  1403µs ± 2%  +43.22%  (p=0.000 n=9+10)
GoroutineProfile/large/idle-8  7.19ms ± 8%  8.43ms ±21%  +17.22%  (p=0.011 n=10+10)
PingPongHog  511ns ± 8%  585ns ± 9%  +14.39%  (p=0.000 n=10+10)
GoroutineProfile/large/idle  6.71ms ± 0%  7.58ms ± 3%  +13.08%  (p=0.000 n=8+10)
PingPongHog-8  469ns ± 8%  509ns ±12%  +8.62%  (p=0.010 n=9+10)
WakeupParallelSyscall/5µs  216µs ± 4%  229µs ± 3%  +6.06%  (p=0.000 n=10+9)
WakeupParallelSyscall/5µs-8  147µs ± 1%  149µs ± 2%  +1.12%  (p=0.009 n=10+10)
WakeupParallelSyscall/2µs-8  140µs ± 0%  142µs ± 1%  +1.11%  (p=0.001 n=10+9)
WakeupParallelSyscall/50µs-8  236µs ± 0%  238µs ± 1%  +1.08%  (p=0.000 n=9+10)
WakeupParallelSyscall/1µs-8  138µs ± 0%  140µs ± 1%  +1.05%  (p=0.013 n=10+9)
Matmult  8.52ns ± 1%  8.61ns ± 0%  +0.98%  (p=0.002 n=10+8)
WakeupParallelSyscall/10µs-8  157µs ± 1%  158µs ± 1%  +0.58%  (p=0.003 n=10+8)
CreateGoroutinesSingle-8  328ns ± 0%  330ns ± 1%  +0.57%  (p=0.000 n=9+9)
WakeupParallelSpinning/100µs-8  343µs ± 0%  344µs ± 1%  +0.30%  (p=0.015 n=8+8)
WakeupParallelSyscall/20µs-8  178µs ± 0%  178µs ± 0%  +0.18%  (p=0.043 n=10+9)
StackGrowthDeep-8  22.8µs ± 0%  22.9µs ± 0%  +0.12%  (p=0.006 n=10+10)
StackGrowth  1.06µs ± 0%  1.06µs ± 0%  +0.09%  (p=0.000 n=8+9)
WakeupParallelSpinning/0s  10.7µs ± 0%  10.7µs ± 0%  +0.08%  (p=0.000 n=9+9)
WakeupParallelSpinning/5µs  30.7µs ± 0%  30.7µs ± 0%  +0.04%  (p=0.000 n=10+10)
WakeupParallelSpinning/100µs  411µs ± 0%  411µs ± 0%  +0.03%  (p=0.000 n=10+9)
WakeupParallelSpinning/2µs  18.7µs ± 0%  18.7µs ± 0%  +0.02%  (p=0.026 n=10+10)
WakeupParallelSpinning/20µs-8  93.0µs ± 0%  93.0µs ± 0%  +0.01%  (p=0.021 n=9+10)
StackGrowth-8  216ns ± 0%  216ns ± 0%  ~  (p=0.209 n=10+10)
CreateGoroutinesParallel-8  49.5ns ± 2%  49.3ns ± 1%  ~  (p=0.591 n=10+10)
CreateGoroutinesSingle  699ns ±20%  748ns ±19%  ~  (p=0.353 n=10+10)
WakeupParallelSpinning/0s-8  15.9µs ± 2%  16.0µs ± 3%  ~  (p=0.315 n=10+10)
WakeupParallelSpinning/1µs  14.6µs ± 0%  14.6µs ± 0%  ~  (p=0.513 n=10+10)
WakeupParallelSpinning/2µs-8  24.2µs ± 3%  24.1µs ± 2%  ~  (p=0.971 n=10+10)
WakeupParallelSpinning/10µs  50.7µs ± 0%  50.7µs ± 0%  ~  (p=0.101 n=10+10)
WakeupParallelSpinning/20µs  90.7µs ± 0%  90.7µs ± 0%  ~  (p=0.898 n=10+10)
WakeupParallelSpinning/50µs  211µs ± 0%  211µs ± 0%  ~  (p=0.382 n=10+10)
WakeupParallelSyscall/0s-8  137µs ± 1%  138µs ± 0%  ~  (p=0.075 n=10+10)
WakeupParallelSyscall/1µs  216µs ± 1%  219µs ± 3%  ~  (p=0.065 n=10+9)
WakeupParallelSyscall/2µs  214µs ± 7%  219µs ± 1%  ~  (p=0.101 n=10+8)
WakeupParallelSyscall/50µs  317µs ± 5%  326µs ± 4%  ~  (p=0.123 n=10+10)
WakeupParallelSyscall/100µs  450µs ± 9%  459µs ± 8%  ~  (p=0.247 n=10+10)
WakeupParallelSyscall/100µs-8  337µs ± 0%  338µs ± 1%  ~  (p=0.089 n=10+10)
WakeupParallelSpinning/5µs-8  32.2µs ± 0%  32.2µs ± 0%  -0.05%  (p=0.026 n=9+10)
WakeupParallelSpinning/50µs-8  216µs ± 0%  216µs ± 0%  -0.12%  (p=0.004 n=10+10)
WakeupParallelSpinning/1µs-8  20.6µs ± 0%  20.5µs ± 0%  -0.22%  (p=0.014 n=10+10)
WakeupParallelSpinning/10µs-8  54.5µs ± 0%  54.2µs ± 1%  -0.57%  (p=0.000 n=10+10)
CreateGoroutines-8  213ns ± 1%  211ns ± 1%  -0.86%  (p=0.002 n=10+10)
CreateGoroutinesCapture  1.03µs ± 0%  1.02µs ± 0%  -0.91%  (p=0.000 n=10+10)
CreateGoroutinesCapture-8  1.32µs ± 1%  1.31µs ± 1%  -1.06%  (p=0.001 n=10+9)
CreateGoroutines  188ns ± 0%  186ns ± 0%  -1.06%  (p=0.000 n=9+10)
CreateGoroutinesParallel  188ns ± 0%  186ns ± 0%  -1.27%  (p=0.000 n=8+10)
WakeupParallelSyscall/0s  210µs ± 3%  207µs ± 3%  -1.60%  (p=0.043 n=10+10)
StackGrowthDeep  121µs ± 1%  119µs ± 1%  -1.70%  (p=0.000 n=9+10)
Matmult-8  1.82ns ± 3%  1.78ns ± 3%  -2.16%  (p=0.020 n=10+10)
WakeupParallelSyscall/20µs  281µs ± 3%  269µs ± 4%  -4.44%  (p=0.000 n=10+10)
WakeupParallelSyscall/10µs  239µs ± 3%  228µs ± 9%  -4.70%  (p=0.001 n=10+10)
GoroutineProfile/sparse-nil/idle-8  485µs ± 2%  12µs ± 4%  -97.56%  (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle-8  484µs ± 2%  12µs ± 1%  -97.60%  (p=0.000 n=10+7)
GoroutineProfile/small-nil/loaded-8  487µs ± 2%  11µs ± 3%  -97.68%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/loaded-8  507µs ± 4%  11µs ± 6%  -97.78%  (p=0.000 n=10+10)
GoroutineProfile/large-nil/idle-8  709µs ± 2%  11µs ± 4%  -98.38%  (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8  717µs ± 2%  11µs ± 3%  -98.43%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle  465µs ± 3%  1µs ± 1%  -99.84%  (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle  493µs ± 3%  1µs ± 0%  -99.85%  (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle  716µs ± 1%  1µs ± 2%  -99.89%  (p=0.000 n=7+10)

name  old alloc/op  new alloc/op  delta
CreateGoroutinesCapture  144B ± 0%  144B ± 0%  ~  (all equal)
CreateGoroutinesCapture-8  144B ± 0%  144B ± 0%  ~  (all equal)

name  old allocs/op  new allocs/op  delta
CreateGoroutinesCapture  5.00 ± 0%  5.00 ± 0%  ~  (all equal)
CreateGoroutinesCapture-8  5.00 ± 0%  5.00 ± 0%  ~  (all equal)

name  old p50-ns  new p50-ns  delta
GoroutineProfile/small/loaded-8  1.01M ± 3%  3.87M ±45%  +282.15%  (p=0.000 n=10+10)
GoroutineProfile/sparse/loaded-8  1.02M ± 3%  2.43M ±41%  +138.42%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-8  7.43M ±16%  17.28M ± 2%  +132.43%  (p=0.000 n=10+10)
GoroutineProfile/small/idle  956k ± 0%  1559k ±16%  +63.03%  (p=0.000 n=10+10)
GoroutineProfile/small/idle-8  1.01M ± 3%  1.45M ± 7%  +44.31%  (p=0.000 n=10+9)
GoroutineProfile/sparse/idle  977k ± 1%  1399k ± 2%  +43.20%  (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8  1.00M ± 3%  1.41M ± 3%  +40.47%  (p=0.000 n=10+10)
GoroutineProfile/large/idle-8  6.97M ± 1%  8.41M ±25%  +20.54%  (p=0.003 n=8+10)
GoroutineProfile/large/idle  6.71M ± 1%  7.46M ± 4%  +11.15%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle-8  483k ± 3%  13k ± 3%  -97.41%  (p=0.000 n=10+9)
GoroutineProfile/small-nil/idle-8  483k ± 2%  12k ± 1%  -97.43%  (p=0.000 n=10+8)
GoroutineProfile/small-nil/loaded-8  484k ± 3%  10k ± 2%  -97.93%  (p=0.000 n=10+8)
GoroutineProfile/sparse-nil/loaded-8  492k ± 2%  10k ± 4%  -97.97%  (p=0.000 n=10+8)
GoroutineProfile/large-nil/idle-8  708k ± 2%  12k ±15%  -98.36%  (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8  714k ± 2%  10k ± 2%  -98.60%  (p=0.000 n=10+8)
GoroutineProfile/sparse-nil/idle  459k ± 1%  1k ± 1%  -99.85%  (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle  477k ± 1%  1k ± 0%  -99.85%  (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle  712k ± 1%  1k ± 1%  -99.90%  (p=0.000 n=7+10)

name  old p90-ns  new p90-ns  delta
GoroutineProfile/small/loaded-8  1.13M ±10%  7.49M ±35%  +562.07%  (p=0.000 n=10+10)
GoroutineProfile/sparse/loaded-8  1.10M ±12%  4.58M ±31%  +318.02%  (p=0.000 n=10+9)
GoroutineProfile/large/loaded-8  8.78M ±24%  27.83M ± 2%  +217.00%  (p=0.000 n=10+10)
GoroutineProfile/small/idle  967k ± 0%  2909k ±50%  +200.91%  (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8  1.02M ± 3%  1.96M ±76%  +92.99%  (p=0.000 n=10+10)
GoroutineProfile/small/idle-8  1.07M ±17%  1.55M ±12%  +45.23%  (p=0.000 n=10+10)
GoroutineProfile/sparse/idle  992k ± 1%  1417k ± 3%  +42.79%  (p=0.000 n=9+10)
GoroutineProfile/large/idle  6.73M ± 0%  7.99M ± 8%  +18.80%  (p=0.000 n=8+10)
GoroutineProfile/large/idle-8  8.20M ±25%  9.18M ±25%  ~  (p=0.315 n=10+10)
GoroutineProfile/sparse-nil/idle-8  495k ± 3%  13k ± 1%  -97.36%  (p=0.000 n=10+9)
GoroutineProfile/small-nil/idle-8  494k ± 2%  13k ± 3%  -97.36%  (p=0.000 n=10+10)
GoroutineProfile/small-nil/loaded-8  496k ± 2%  13k ± 1%  -97.41%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/loaded-8  544k ±11%  13k ± 1%  -97.62%  (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle-8  724k ± 1%  13k ± 3%  -98.20%  (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8  729k ± 3%  13k ± 2%  -98.23%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle  476k ± 4%  1k ± 1%  -99.85%  (p=0.000 n=9+10)
GoroutineProfile/small-nil/idle  537k ±10%  1k ± 0%  -99.87%  (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle  729k ± 0%  1k ± 1%  -99.90%  (p=0.000 n=7+10)

name  old p99-ns  new p99-ns  delta
GoroutineProfile/sparse/loaded-8  1.27M ±33%  20.49M ±17%  +1514.61%  (p=0.000 n=10+10)
GoroutineProfile/small/loaded-8  1.37M ±29%  20.48M ±23%  +1399.35%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-8  9.76M ±23%  39.98M ±22%  +309.52%  (p=0.000 n=10+8)
GoroutineProfile/small/idle  976k ± 1%  3367k ±55%  +244.94%  (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8  1.03M ± 3%  2.50M ±65%  +142.30%  (p=0.000 n=10+10)
GoroutineProfile/small/idle-8  1.17M ±34%  1.70M ±14%  +45.15%  (p=0.000 n=10+10)
GoroutineProfile/sparse/idle  1.02M ± 3%  1.45M ± 4%  +42.64%  (p=0.000 n=9+10)
GoroutineProfile/large/idle  6.92M ± 2%  9.00M ± 7%  +29.98%  (p=0.000 n=8+9)
GoroutineProfile/large/idle-8  8.74M ±23%  9.90M ±24%  ~  (p=0.190 n=10+10)
GoroutineProfile/sparse-nil/idle-8  508k ± 4%  16k ± 2%  -96.90%  (p=0.000 n=10+9)
GoroutineProfile/small-nil/idle-8  508k ± 4%  16k ± 3%  -96.91%  (p=0.000 n=10+9)
GoroutineProfile/small-nil/loaded-8  542k ± 5%  15k ±15%  -97.15%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/loaded-8  649k ±16%  15k ±18%  -97.67%  (p=0.000 n=10+10)
GoroutineProfile/large-nil/idle-8  738k ± 2%  16k ± 2%  -97.86%  (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8  765k ± 4%  15k ±17%  -98.03%  (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle  539k ±26%  1k ±17%  -99.84%  (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle  659k ±25%  1k ± 0%  -99.84%  (p=0.000 n=10+8)
GoroutineProfile/large-nil/idle  760k ± 2%  1k ±22%  -99.88%  (p=0.000 n=9+10)

Fixes #33250
For #50794

Change-Id: I862a2bc4e991cec485f21a6fce4fca84f2c6435b
Reviewed-on: https://go-review.googlesource.com/c/go/+/387415
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
Run-TryBot: Rhys Hiltner <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
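The caller-facing contract that this change preserves (when p is too small, nothing is written and the call returns n, false) can be exercised from ordinary Go code. The following is a minimal sketch of a retry loop around the public runtime.GoroutineProfile API; the helper name snapshotGoroutines and the headroom of 8 records are illustrative assumptions, not part of this commit.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshotGoroutines is a hypothetical helper. It sizes a slice from
// runtime.NumGoroutine plus some headroom and retries with a larger slice
// whenever runtime.GoroutineProfile reports that the slice was too small.
func snapshotGoroutines() []runtime.StackRecord {
	size := runtime.NumGoroutine() + 8 // headroom of 8 is an arbitrary choice
	for {
		records := make([]runtime.StackRecord, size)
		n, ok := runtime.GoroutineProfile(records)
		if ok {
			return records[:n]
		}
		// Not enough room: records was left untouched and n is the number
		// of records the profile needed at the time of the call.
		size = n + 8
	}
}

func main() {
	done := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() { <-done }() // goroutines that stay alive during the snapshot
	}
	records := snapshotGoroutines()
	close(done)
	fmt.Println("goroutines in profile:", len(records))
}
```

The loop is needed because goroutines can start between the NumGoroutine call and the profile itself, so a single fixed-size attempt may fail.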
1 parent b9dee7e commit 52bd1c4

5 files changed: +284 -3 lines changed

src/runtime/mfinal.go

+5 -2
@@ -166,13 +166,16 @@ func runfinq() {
 		argRegs int
 	)

+	gp := getg()
+	lock(&finlock)
+	fing = gp
+	unlock(&finlock)
+
 	for {
 		lock(&finlock)
 		fb := finq
 		finq = nil
 		if fb == nil {
-			gp := getg()
-			fing = gp
 			fingwait = true
 			goparkunlock(&finlock, waitReasonFinalizerWait, traceEvGoBlock, 1)
 			continue

src/runtime/mprof.go

+249
@@ -753,11 +753,260 @@ func runtime_goroutineProfileWithLabels(p []StackRecord, labels []unsafe.Pointer
 	return goroutineProfileWithLabels(p, labels)
 }

+const go119ConcurrentGoroutineProfile = true
+
 // labels may be nil. If labels is non-nil, it must have the same length as p.
 func goroutineProfileWithLabels(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
 	if labels != nil && len(labels) != len(p) {
 		labels = nil
 	}
+
+	if go119ConcurrentGoroutineProfile {
+		return goroutineProfileWithLabelsConcurrent(p, labels)
+	}
+	return goroutineProfileWithLabelsSync(p, labels)
+}
+
+var goroutineProfile = struct {
+	sema    uint32
+	active  bool
+	offset  atomic.Int64
+	records []StackRecord
+	labels  []unsafe.Pointer
+}{
+	sema: 1,
+}
+
+// goroutineProfileState indicates the status of a goroutine's stack for the
+// current in-progress goroutine profile. Goroutines' stacks are initially
+// "Absent" from the profile, and end up "Satisfied" by the time the profile is
+// complete. While a goroutine's stack is being captured, its
+// goroutineProfileState will be "InProgress" and it will not be able to run
+// until the capture completes and the state moves to "Satisfied".
+//
+// Some goroutines (the finalizer goroutine, which at various times can be
+// either a "system" or a "user" goroutine, and the goroutine that is
+// coordinating the profile, any goroutines created during the profile) move
+// directly to the "Satisfied" state.
+type goroutineProfileState uint32
+
+const (
+	goroutineProfileAbsent goroutineProfileState = iota
+	goroutineProfileInProgress
+	goroutineProfileSatisfied
+)
+
+type goroutineProfileStateHolder atomic.Uint32
+
+func (p *goroutineProfileStateHolder) Load() goroutineProfileState {
+	return goroutineProfileState((*atomic.Uint32)(p).Load())
+}
+
+func (p *goroutineProfileStateHolder) Store(value goroutineProfileState) {
+	(*atomic.Uint32)(p).Store(uint32(value))
+}
+
+func (p *goroutineProfileStateHolder) CompareAndSwap(old, new goroutineProfileState) bool {
+	return (*atomic.Uint32)(p).CompareAndSwap(uint32(old), uint32(new))
+}
+
+func goroutineProfileWithLabelsConcurrent(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+	semacquire(&goroutineProfile.sema)
+
+	ourg := getg()
+
+	stopTheWorld("profile")
+	// Using gcount while the world is stopped should give us a consistent view
+	// of the number of live goroutines, minus the number of goroutines that are
+	// alive and permanently marked as "system". But to make this count agree
+	// with what we'd get from isSystemGoroutine, we need special handling for
+	// goroutines that can vary between user and system to ensure that the count
+	// doesn't change during the collection. So, check the finalizer goroutine
+	// in particular.
+	n = int(gcount())
+	if fingRunning {
+		n++
+	}
+
+	if n > len(p) {
+		// There's not enough space in p to store the whole profile, so (per the
+		// contract of runtime.GoroutineProfile) we're not allowed to write to p
+		// at all and must return n, false.
+		startTheWorld()
+		semrelease(&goroutineProfile.sema)
+		return n, false
+	}
+
+	// Save current goroutine.
+	sp := getcallersp()
+	pc := getcallerpc()
+	systemstack(func() {
+		saveg(pc, sp, ourg, &p[0])
+	})
+	ourg.goroutineProfiled.Store(goroutineProfileSatisfied)
+	goroutineProfile.offset.Store(1)
+
+	// Prepare for all other goroutines to enter the profile. Aside from ourg,
+	// every goroutine struct in the allgs list has its goroutineProfiled field
+	// cleared. Any goroutine created from this point on (while
+	// goroutineProfile.active is set) will start with its goroutineProfiled
+	// field set to goroutineProfileSatisfied.
+	goroutineProfile.active = true
+	goroutineProfile.records = p
+	goroutineProfile.labels = labels
+	// The finalizer goroutine needs special handling because it can vary over
+	// time between being a user goroutine (eligible for this profile) and a
+	// system goroutine (to be excluded). Pick one before restarting the world.
+	if fing != nil {
+		fing.goroutineProfiled.Store(goroutineProfileSatisfied)
+	}
+	if readgstatus(fing) != _Gdead && !isSystemGoroutine(fing, false) {
+		doRecordGoroutineProfile(fing)
+	}
+	startTheWorld()
+
+	// Visit each goroutine that existed as of the startTheWorld call above.
+	//
+	// New goroutines may not be in this list, but we didn't want to know about
+	// them anyway. If they do appear in this list (via reusing a dead goroutine
+	// struct, or racing to launch between the world restarting and us getting
+	// the list), they will already have their goroutineProfiled field set to
+	// goroutineProfileSatisfied before their state transitions out of _Gdead.
+	//
+	// Any goroutine that the scheduler tries to execute concurrently with this
+	// call will start by adding itself to the profile (before the act of
+	// executing can cause any changes in its stack).
+	forEachGRace(func(gp1 *g) {
+		tryRecordGoroutineProfile(gp1, Gosched)
+	})
+
+	stopTheWorld("profile cleanup")
+	endOffset := goroutineProfile.offset.Swap(0)
+	goroutineProfile.active = false
+	goroutineProfile.records = nil
+	goroutineProfile.labels = nil
+	startTheWorld()
+
+	// Restore the invariant that every goroutine struct in allgs has its
+	// goroutineProfiled field cleared.
+	forEachGRace(func(gp1 *g) {
+		gp1.goroutineProfiled.Store(goroutineProfileAbsent)
+	})
+
+	if raceenabled {
+		raceacquire(unsafe.Pointer(&labelSync))
+	}
+
+	if n != int(endOffset) {
+		// It's a big surprise that the number of goroutines changed while we
+		// were collecting the profile. But probably better to return a
+		// truncated profile than to crash the whole process.
+		//
+		// For instance, needm moves a goroutine out of the _Gdead state and so
+		// might be able to change the goroutine count without interacting with
+		// the scheduler. For code like that, the race windows are small and the
+		// combination of features is uncommon, so it's hard to be (and remain)
+		// sure we've caught them all.
+	}
+
+	semrelease(&goroutineProfile.sema)
+	return n, true
+}
+
+// tryRecordGoroutineProfileWB asserts that write barriers are allowed and calls
+// tryRecordGoroutineProfile.
+//
+//go:yeswritebarrierrec
+func tryRecordGoroutineProfileWB(gp1 *g) {
+	if getg().m.p.ptr() == nil {
+		throw("no P available, write barriers are forbidden")
+	}
+	tryRecordGoroutineProfile(gp1, osyield)
+}
+
+// tryRecordGoroutineProfile ensures that gp1 has the appropriate representation
+// in the current goroutine profile: either that it should not be profiled, or
+// that a snapshot of its call stack and labels are now in the profile.
+func tryRecordGoroutineProfile(gp1 *g, yield func()) {
+	if readgstatus(gp1) == _Gdead {
+		// Dead goroutines should not appear in the profile. Goroutines that
+		// start while profile collection is active will get goroutineProfiled
+		// set to goroutineProfileSatisfied before transitioning out of _Gdead,
+		// so here we check _Gdead first.
+		return
+	}
+	if isSystemGoroutine(gp1, true) {
+		// System goroutines should not appear in the profile. (The finalizer
+		// goroutine is marked as "already profiled".)
+		return
+	}
+
+	for {
+		prev := gp1.goroutineProfiled.Load()
+		if prev == goroutineProfileSatisfied {
+			// This goroutine is already in the profile (or is new since the
+			// start of collection, so shouldn't appear in the profile).
+			break
+		}
+		if prev == goroutineProfileInProgress {
+			// Something else is adding gp1 to the goroutine profile right now.
+			// Give that a moment to finish.
+			yield()
+			continue
+		}
+
+		// While we have gp1.goroutineProfiled set to
+		// goroutineProfileInProgress, gp1 may appear _Grunnable but will not
+		// actually be able to run. Disable preemption for ourselves, to make
+		// sure we finish profiling gp1 right away instead of leaving it stuck
+		// in this limbo.
+		mp := acquirem()
+		if gp1.goroutineProfiled.CompareAndSwap(goroutineProfileAbsent, goroutineProfileInProgress) {
+			doRecordGoroutineProfile(gp1)
+			gp1.goroutineProfiled.Store(goroutineProfileSatisfied)
+		}
+		releasem(mp)
+	}
+}
+
+// doRecordGoroutineProfile writes gp1's call stack and labels to an in-progress
+// goroutine profile. Preemption is disabled.
+//
+// This may be called via tryRecordGoroutineProfile in two ways: by the
+// goroutine that is coordinating the goroutine profile (running on its own
+// stack), or from the scheduler in preparation to execute gp1 (running on the
+// system stack).
+func doRecordGoroutineProfile(gp1 *g) {
+	if readgstatus(gp1) == _Grunning {
+		print("doRecordGoroutineProfile gp1=", gp1.goid, "\n")
+		throw("cannot read stack of running goroutine")
+	}
+
+	offset := int(goroutineProfile.offset.Add(1)) - 1
+
+	if offset >= len(goroutineProfile.records) {
+		// Should be impossible, but better to return a truncated profile than
+		// to crash the entire process at this point. Instead, deal with it in
+		// goroutineProfileWithLabelsConcurrent where we have more context.
+		return
+	}
+
+	// saveg calls gentraceback, which may call cgo traceback functions. When
+	// called from the scheduler, this is on the system stack already so
+	// traceback.go:cgoContextPCs will avoid calling back into the scheduler.
+	//
+	// When called from the goroutine coordinating the profile, we still have
+	// set gp1.goroutineProfiled to goroutineProfileInProgress and so are still
+	// preventing it from being truly _Grunnable. So we'll use the system stack
+	// to avoid schedule delays.
+	systemstack(func() { saveg(^uintptr(0), ^uintptr(0), gp1, &goroutineProfile.records[offset]) })
+
+	if goroutineProfile.labels != nil {
+		goroutineProfile.labels[offset] = gp1.labels
+	}
+}
+
+func goroutineProfileWithLabelsSync(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
 	gp := getg()

 	isOK := func(gp1 *g) bool {
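
The Absent, InProgress, Satisfied hand-off used by tryRecordGoroutineProfile and the slot claim done with offset.Add(1) - 1 in doRecordGoroutineProfile can be tried outside the runtime. What follows is a user-level sketch only, built on sync/atomic: the task and profile types, the tryRecord method, and the recordAll driver are hypothetical stand-ins, and runtime.Gosched takes the place of the yield callback.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

type profileState uint32

const (
	absent profileState = iota
	inProgress
	satisfied
)

// task is a hypothetical stand-in for the runtime's g struct.
type task struct {
	id    int
	state atomic.Uint32
}

// profile is a hypothetical stand-in for the goroutineProfile global.
type profile struct {
	offset  atomic.Int64
	records []int
}

// tryRecord makes sure t ends up in the profile exactly once, no matter how
// many callers race to record it.
func (p *profile) tryRecord(t *task) {
	for {
		switch profileState(t.state.Load()) {
		case satisfied:
			return // already recorded by someone
		case inProgress:
			runtime.Gosched() // another caller is recording; wait for it
			continue
		}
		// Only the caller that wins this CAS writes the record.
		if t.state.CompareAndSwap(uint32(absent), uint32(inProgress)) {
			slot := int(p.offset.Add(1)) - 1 // claim a unique slot
			if slot < len(p.records) {
				p.records[slot] = t.id
			}
			t.state.Store(uint32(satisfied))
		}
	}
}

// recordAll lets numWorkers goroutines race to record the same numTasks tasks
// and returns the ids that were written.
func recordAll(numTasks, numWorkers int) []int {
	tasks := make([]*task, numTasks)
	for i := range tasks {
		tasks[i] = &task{id: i}
	}
	p := &profile{records: make([]int, numTasks)}
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, t := range tasks {
				p.tryRecord(t)
			}
		}()
	}
	wg.Wait()
	return p.records[:int(p.offset.Load())]
}

func main() {
	got := recordAll(100, 8)
	fmt.Println("recorded:", len(got))
}
```

Unlike the runtime code, this sketch has no equivalent of acquirem/releasem, so a recorder can be preempted while a task is InProgress; waiting callers simply yield until it finishes.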

src/runtime/proc.go

+25
@@ -2508,6 +2508,13 @@ func gcstopm() {
 func execute(gp *g, inheritTime bool) {
 	_g_ := getg()

+	if goroutineProfile.active {
+		// Make sure that gp has had its stack written out to the goroutine
+		// profile, exactly as it was when the goroutine profiler first stopped
+		// the world.
+		tryRecordGoroutineProfile(gp, osyield)
+	}
+
 	// Assign gp.m before entering _Grunning so running Gs have an
 	// M.
 	_g_.m.curg = gp
@@ -3767,6 +3774,16 @@ func exitsyscall() {
 	oldp := _g_.m.oldp.ptr()
 	_g_.m.oldp = 0
 	if exitsyscallfast(oldp) {
+		// When exitsyscallfast returns success, we have a P so can now use
+		// write barriers
+		if goroutineProfile.active {
+			// Make sure that gp has had its stack written out to the goroutine
+			// profile, exactly as it was when the goroutine profiler first
+			// stopped the world.
+			systemstack(func() {
+				tryRecordGoroutineProfileWB(_g_)
+			})
+		}
 		if trace.enabled {
 			if oldp != _g_.m.p.ptr() || _g_.m.syscalltick != _g_.m.p.ptr().syscalltick {
 				systemstack(traceGoStart)
@@ -4134,6 +4151,14 @@ func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
 		if _g_.m.curg != nil {
 			newg.labels = _g_.m.curg.labels
 		}
+		if goroutineProfile.active {
+			// A concurrent goroutine profile is running. It should include
+			// exactly the set of goroutines that were alive when the goroutine
+			// profiler first stopped the world. That does not include newg, so
+			// mark it as not needing a profile before transitioning it from
+			// _Gdead.
+			newg.goroutineProfiled.Store(goroutineProfileSatisfied)
+		}
 	}
 	// Track initial transition?
 	newg.trackingSeq = uint8(fastrand())
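
The effect of the scheduler hooks above can be illustrated with a small single-threaded model: anything that is about to run is recorded first, and anything created while the profile is active is marked as already handled. This is only a sketch; the worker and snapshot types and the begin, execute, spawn, and finish methods are hypothetical names, and an integer depth stands in for a goroutine's stack.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a goroutine. depth stands in for its
// stack, which changes every time the worker runs.
type worker struct {
	id       int
	depth    int
	profiled bool
}

// snapshot is a hypothetical stand-in for the in-progress goroutine profile.
type snapshot struct {
	active  bool
	records map[int]int // worker id -> depth when the snapshot began
}

// begin models the first stop-the-world: only fixed bookkeeping happens here.
func (s *snapshot) begin() {
	s.active = true
	s.records = map[int]int{}
}

// record adds w to the snapshot unless it has already been handled.
func (s *snapshot) record(w *worker) {
	if w.profiled {
		return
	}
	s.records[w.id] = w.depth
	w.profiled = true
}

// execute models the scheduler barrier: record w before letting it run and
// change its own state.
func (s *snapshot) execute(w *worker) {
	if s.active {
		s.record(w)
	}
	w.depth++
}

// spawn models goroutine creation: workers born during the snapshot are
// marked as handled so they never appear in it.
func (s *snapshot) spawn(id int) *worker {
	return &worker{id: id, profiled: s.active}
}

// finish models the coordinator's pass over all workers followed by cleanup.
func (s *snapshot) finish(all []*worker) map[int]int {
	for _, w := range all {
		s.record(w)
	}
	s.active = false
	for _, w := range all {
		w.profiled = false
	}
	return s.records
}

// demo runs worker 1 and creates worker 3 while the snapshot is in progress.
func demo() map[int]int {
	s := &snapshot{}
	all := []*worker{{id: 1, depth: 10}, {id: 2, depth: 20}}
	s.begin()
	s.execute(all[0]) // runs before the coordinator reaches it
	s.execute(all[0])
	all = append(all, s.spawn(3))
	s.execute(all[2])
	return s.finish(all)
}

func main() {
	fmt.Println(demo()) // prints "map[1:10 2:20]"
}
```

Worker 1 is recorded with depth 10 even though it ran twice before the coordinator's pass, and worker 3 is absent because it did not exist when the snapshot began.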

src/runtime/runtime2.go

+4
@@ -489,6 +489,10 @@ type g struct {
 	timer *timer // cached timer for time.Sleep
 	selectDone uint32 // are we participating in a select and did someone win the race?

+	// goroutineProfiled indicates the status of this goroutine's stack for the
+	// current in-progress goroutine profile
+	goroutineProfiled goroutineProfileStateHolder
+
 	// Per-G GC state

 	// gcAssistBytes is this G's GC assist credit in terms of

src/runtime/sizeof_test.go

+1 -1
@@ -21,7 +21,7 @@ func TestSizeof(t *testing.T) {
 		_32bit uintptr // size on 32bit platforms
 		_64bit uintptr // size on 64bit platforms
 	}{
-		{runtime.G{}, 236, 392}, // g, but exported for testing
+		{runtime.G{}, 240, 392}, // g, but exported for testing
 		{runtime.Sudog{}, 56, 88}, // sudog, but exported for testing
 	}
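
One plausible reading of why only the 32-bit size changes (236 to 240) while the 64-bit size stays at 392 is that the new 4-byte field lands in alignment padding on 64-bit platforms. The sketch below shows that effect with two hypothetical miniature structs; they are an assumption for illustration and do not reproduce the real layout of g.

```go
package main

import (
	"fmt"
	"unsafe"
)

// before and after are hypothetical miniatures, not the real g struct.
type before struct {
	timer      *int
	selectDone uint32
	assist     int64
}

type after struct {
	timer      *int
	selectDone uint32
	profiled   uint32 // the newly added 4-byte field
	assist     int64
}

// growth reports how many bytes the added field costs on the current platform.
func growth() uintptr {
	return unsafe.Sizeof(after{}) - unsafe.Sizeof(before{})
}

func main() {
	fmt.Println("pointer size:", unsafe.Sizeof(uintptr(0)))
	fmt.Println("growth in bytes:", growth())
}
```

Where int64 is 8-byte aligned, the 4 bytes after selectDone are padding in before, so the extra field costs nothing; where int64 is only 4-byte aligned, there is no padding to absorb it and the struct grows by 4 bytes.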
