testing: implement b.Loop keepalive without preventing inlining, which can lead to heap allocs #73137

thepudds · 2025-04-02T15:04:27Z

Go version

go version go1.24.2

What did you do?

I read the current b.Loop API doc and draft b.Loop blog. I also used gopls.

What did you see happen?

Currently, the API docs and the draft b.Loop blog seem to recommend b.Loop as a better approach than the older approach, but seem to do so without mentioning some of the potential drawbacks. gopls seems to also recommend switching over all benchmarks as part of its modernizer work.

I'm also seeing people on places like BlueSky and elsewhere who seem to recommend universally converting all benchmarks over to b.Loop, with other people then agreeing with that recommendation without discussion of pros/cons.

What did you expect to see?

As I understand it, the current implementation stops the inlining of any functions called within the b.Loop for loop, which I think means arguments to those functions need to be treated as function parameters within the non-inlined function (and hence can't be constant folded away, etc.), and similarly the return values within the non-inlined function can't be entirely optimized away (so whatever the function does to generate the return values also can't be optimized away).

However, it seems plausible that it might be desirable to change the exact implementation in the future. For example, if #61179 is accepted, it might be desirable to use testing.Keep in the implementation of b.Loop, or even if not directly used in the implementation, there's at least some chance it might make sense to describe b.Loop in terms of testing.Keep semantics.

Or maybe that's not what ends up happening, but it could be worthwhile to keep that door open.

The current API doc says:

The compiler never optimizes away calls to functions within the body of a "for b.Loop() { ... }" loop. This prevents surprises that can otherwise occur if the compiler determines that the result of a benchmarked function is unused.

I wonder if it might make sense to insert the word "currently" or hint that it might change in the future, or maybe imply that the compiler won't optimize away the work of the function, or otherwise tweak the language slightly. The 1.24 release notes use different language that seems better to me:

Function call parameters and results are kept alive, preventing the compiler from fully optimizing away the loop body.

That said, maybe the current API text is ambiguous enough, including it's not 100% clear to me if that text is intending to specifically describe inlining. (For example, if I was talking to someone and wanted to point to a function that I think would be inlined, and I don't think I would say "the compiler will optimize away calls to that function").

Separately, while I think b.Loop is an excellent step forward, I am hoping the implementation does change a bit. For me personally, I have not yet embraced b.Loop for new or existing benchmarks because of its current implementation.

For example, in many cases, I'm relying on inlining to happen so that something can be stack allocated.

To pick a trivial but concrete case, suppose you have something like:

// Sample function that relies on mid-stack inlining so that the slice is not always heap allocated.
// (See for example https://words.filippo.io/efficient-go-apis-with-the-inliner).
func NewX(x int) []byte {
	out := make([]byte, 8)
	return newX(out, x)
}

//go:noinline
func newX(out []byte, x int) []byte {
	binary.LittleEndian.PutUint64(out, uint64(x))
	return out
}

Using the old b.N approach, the benchmark avoids the heap allocation, just as normal usage in non-test code would avoid the heap allocation:

	b.Run("b.N", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// 0 heap allocs in benchmark.
			// Example of old sink-based approach.
			out := NewX(42)
			sink = out[0]
		}
	})

In contrast, just directly using b.Loop causes a heap allocation, which does not reflect what would happen in normal non-test usage:

	b.Run("b.Loop-basic", func(b *testing.B) {
		for b.Loop() {
			// 1 heap alloc in benchmark because b.Loop purposefully prevents NewX from being inlined.
			// This seems to be recommended approach according to current docs and draft blog.
			NewX(42)
		}
	})

I can for example move the function-under-test to a closure:

	b.Run("b.Loop-closure", func(b *testing.B) {
		bench := func(x int) {
			// 0 heap allocs in benchmark.
			// Closure is not inlined in b.Loop, but no use of return value,
			// so in theory could get "overly" optimized.
			NewX(x)
		}
		for b.Loop() {
			bench(42)
		}
	})

But if I just do that, I might open myself up to some of the problems testing.Keep and b.Loop are intended to avoid, so I can use an older sink style approach with the closure:

	b.Run("b.Loop-closure-with-sink", func(b *testing.B) {
		bench := func(x int) {
			// 0 heap allocs in benchmark.
			// Closure with old sink-based approach for the result.
			out := NewX(x)
			sink = out[0]
		}
		for b.Loop() {
			bench(42)
		}
	})

Or use runtime.KeepAlive as a stand-in for testing.Keep:

	b.Run("b.Loop-closure-with-keepalive", func(b *testing.B) {
		bench := func(x int) {
			// 0 heap allocs in benchmark.
			// KeepAlive requires the result of NewX to be computed, I think, even if NewX is inlined.
			runtime.KeepAlive(NewX(x))
		}
		for b.Loop() {
			bench(42)
		}
	})

All of which is to say:

It might be worthwhile to mention somewhere that the older b.N can still be useful in some cases, or alternatively maybe mention some of the drawbacks and workarounds for the current b.Loop implementation.
It would be nice for the API docs to at least keep the door open to future adjustments, or roughly equivalently, for someone on the core Go team to say publicly that the current language does not really tie our hands for future possible improvements. (Or maybe that statement has effectively already been made).

As an example alternate implementation, I wonder if the compiler could take what I wrote as:

		for b.Loop() {
			NewX(42)
		}

and automatically transform it to the "b.Loop-closure-with-keepalive" example just above. In that case, the bench closure would not be inlined inside the b.Loop block, but NewX could be inlined inside bench.

(I haven't thought through the implications entirely, but perhaps that would retain roughly all the good aspects of the current approach, including not being an overly complex implementation within cmd/compile, but without forcing extra heap allocations. There would still be a function call overhead introduced by b.Loop compared to the older b.N approach, but that is less dramatic impact and also would happen more uniformly, whereas currently b.Loop can introduce a lot of overhead or a little overhead depending on the functions under test.)

Runnable playground link for above examples: https://go.dev/play/p/1bZGIS9Tcxw

CC @JunyangShao, @aclements, @zeebo

The text was updated successfully, but these errors were encountered:

thepudds · 2025-04-02T15:11:19Z

Also, the problems of heap allocations and the possible workarounds of the user creating manually a closure was discussed a bit in #61179 and #61515, though I think those parts of the discussions might have been prior to the definition & selection of the final semantics of b.Loop for Go 1.24.

The main point in raising this issue is that I don't see anything about the drawbacks or alternatives in the official docs or blog post. (That said, maybe I missed it or just misunderstood, or a reasonable conclusion might be that this is too low level of a concern).

gabyhelp · 2025-04-02T15:17:26Z

Related Issues

Related Code Changes

Related Documentation

Package testing > type B ¶ > func (*B) Loop ¶ 1.24

Related Discussions

Go 1.24 BLooper

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

aclements · 2025-04-02T16:16:04Z

Thanks for the detailed report. I hadn't considered that implementing the "Function call parameters and results are kept alive" requirement as disabling inlining would have this particular effect (though now that you point it out, I should have!). I agree that we don't want to pin ourselves down to "nothing gets inlined in a b.Loop loop".

IMO, we should take two actions here:

Change the doc comment of b.Loop to say what the release notes say: "Function call parameters and results are kept alive, preventing the compiler from fully optimizing away the loop body." We can mention that "Currently, this is implemented by disabling inlining of functions called in a b.Loop loop."
We should fix the compiler implementation so it doesn't have this drawback. I think this would involve an AST walk to mark function arguments and results prior to inlining, allowing inlining as usual, and then ensuring that marked arguments and results are not dead-coded.

The intent is that b.Loop should be strictly better than b.N, so if there are drawbacks we should just fix them. :)

cc @prattmic

thepudds · 2025-04-02T16:46:43Z

The intent is that b.Loop should be strictly better than b.N, so if there are drawbacks we should just fix them. :)

An excellent philosophy. 👍 🚀

gopherbot · 2025-04-02T21:25:56Z

Change https://go.dev/cl/662356 mentions this issue: testing: clarify how B.Loop avoids optimizing away all the useful work

AlekSi · 2025-04-03T06:50:12Z

Should this work also cover testing.PB.Next?

As discussed in #73137, we want to clarify the description of how B.Loop avoids surprising optimizations, while also hinting that the exact approach might change in the future. Updates #73137 Change-Id: I8536540cd5d79804a47fba8cd6eec3821864309d Reviewed-on: https://go-review.googlesource.com/c/go/+/662356 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Carlos Amedee <[email protected]> Auto-Submit: Alan Donovan <[email protected]> Reviewed-by: Alan Donovan <[email protected]>

thepudds · 2025-04-18T15:49:08Z

FWIW, #73417 is probably a case of "no inlining" affecting benchmark results in a way people found surprising.

In that issue, someone reported a bug against generics support in the compiler, citing ~40 or so b.Loop benchmarks that had a geomean +41.19% change based on introducing generics to some code.

A later comment there then shows the b.N form with a much smaller geomean of +0.87% (which might just be noise), and the bug was closed shortly after that.

Inlining most likely played a part, including because the change being benchmarked seemed to include introducing some ~1 line wrapper functions like:

func DecodeRuneInString(s string) (r rune, size int) {
        return genDecodeRune(s)
}

(I didn't run the benchmarks myself or poke too deeply, but seems plausible it is related to this b.Loop issue here).

gabyhelp added the ToolProposal Issues describing a requested change to a Go tool or command-line program. label Apr 2, 2025

thepudds changed the title ~~testing: consider loosening description of how b.Loop works and maybe mention drawbacks or alternatives~~ testing: loosen description of how b.Loop works today and ideally avoid forcing heap allocs in future Apr 2, 2025

cagedmantis added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Apr 3, 2025

mpx mentioned this issue Apr 4, 2025

testing: b.Loop prevents range func inlining which drastically changes performance result #73166

Closed

aclements changed the title ~~testing: loosen description of how b.Loop works today and ideally avoid forcing heap allocs in future~~ testing: implement b.Loop keepalive without preventing inlining, which can lead to heap allocs Apr 4, 2025

aclements marked this as a duplicate of #73166 Apr 4, 2025

thepudds mentioned this issue Apr 17, 2025

cmd/compile: a string | []byte generic constraint isn't a zero-cost abstraction #73417

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: implement b.Loop keepalive without preventing inlining, which can lead to heap allocs #73137

testing: implement b.Loop keepalive without preventing inlining, which can lead to heap allocs #73137

thepudds commented Apr 2, 2025 •

edited

Loading

thepudds commented Apr 2, 2025

gabyhelp commented Apr 2, 2025

aclements commented Apr 2, 2025

thepudds commented Apr 2, 2025

gopherbot commented Apr 2, 2025

AlekSi commented Apr 3, 2025

thepudds commented Apr 18, 2025

testing: implement b.Loop keepalive without preventing inlining, which can lead to heap allocs #73137

testing: implement b.Loop keepalive without preventing inlining, which can lead to heap allocs #73137

Comments

thepudds commented Apr 2, 2025 • edited Loading

Go version

What did you do?

What did you see happen?

What did you expect to see?

thepudds commented Apr 2, 2025

gabyhelp commented Apr 2, 2025

aclements commented Apr 2, 2025

thepudds commented Apr 2, 2025

gopherbot commented Apr 2, 2025

AlekSi commented Apr 3, 2025

thepudds commented Apr 18, 2025

thepudds commented Apr 2, 2025 •

edited

Loading