runtime: performance regression in ec9c84c8 #17250
Comments
Thanks for the report. This commit wasn't expected to change performance at all. I'm able to reproduce the slowdown on my workstation using the TimeFormat benchmark from $GOROOT/test/bench/go1. My simplest hypothesis was that I'd simply screwed up the change and thrown off the scheduling of GC, but that appears not to be the case. The number of GCs is unchanged and all of the heap-based scheduling is identical to the megabyte. The wall-clock times for GC are also nearly identical (easily within the noise). Furthermore, there's only one GC during the benchmark itself and its times are the same to within a microsecond. Before:
After:
This all suggests a subtler effect, most likely in malloc rather than GC.
@aclements thanks for the quick reply. Let me know if you need me to try out a patch.
It turns out this is just an unfortunate code alignment issue, at least for TimeFormat. It's not even a data alignment issue. By (unintentionally) taking advantage of an odd property of incremental builds versus a full make.bash build, I was able to build two go1 benchmark binaries at ec9c84c where one of them was significantly slower than the build at 196df6f, while the other performed identically. The only difference between the two binaries is the size and alignment of some of the text symbols. runtime.gcController is at the exact same address in both binaries, so this only affected the text. I also built a version at ec9c84c with a patch (CL 30013) to move the new
These sorts of alignment shifts happen all the time as the runtime, compiler, and common libraries change. Usually some of the shifts are helpful and others are detrimental, so they wind up cancelling out, but sometimes, by chance, you wind up with more detrimental shifts in a single change. That seems to be what's going on here. The bad news is it's not clear what we can do about it, but the good news is they cancel out in the noise over the long run.
Because of this, I'm going to close this bug as resolved. However, @mvdan, if you find that this is a stable change over the longer term with other compiler and runtime changes, please ping this bug.
The process to build the two binaries was:
The following symbols differed in size (delta from first/slower to second/faster binary):
I don't believe any of these are directly involved in TimeFormat, so the effect must have come from the perturbed alignment of other symbols located between them. (I looked into why (*parser).collapse changed and, curiously, it's because the stack spill slots are different. Its frame is just large enough that a few stack-relative MOVs switch between the short five-byte encoding and the longer eight-byte encoding.)
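A comparison like the symbol size list above can be reproduced with a small tool. The following is only a sketch, assuming both binaries are ELF files (for example, built on Linux); the `textSizes` and `sizeDeltas` helpers are hypothetical names, and this is not necessarily how the list in this comment was produced.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"sort"
)

// textSizes returns the size of every function symbol in an ELF binary.
// Assumption: the binary is ELF and still has its symbol table.
func textSizes(path string) (map[string]uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	syms, err := f.Symbols()
	if err != nil {
		return nil, err
	}
	sizes := make(map[string]uint64)
	for _, s := range syms {
		if elf.ST_TYPE(s.Info) == elf.STT_FUNC {
			sizes[s.Name] = s.Size
		}
	}
	return sizes, nil
}

// sizeDeltas reports size(b) - size(a) for symbols present in both maps
// whose sizes differ.
func sizeDeltas(a, b map[string]uint64) map[string]int64 {
	deltas := make(map[string]int64)
	for name, sa := range a {
		if sb, ok := b[name]; ok && sa != sb {
			deltas[name] = int64(sb) - int64(sa)
		}
	}
	return deltas
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: symdiff <first-binary> <second-binary>")
		return
	}
	a, err := textSizes(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := textSizes(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	deltas := sizeDeltas(a, b)
	names := make([]string, 0, len(deltas))
	for name := range deltas {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%+d\t%s\n", deltas[name], name)
	}
}
```

Size differences alone do not show which functions moved to a worse alignment; comparing symbol addresses between the two binaries would be the natural next step.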
Oh, interesting.
Mine was
I didn't know alignment issues like these could cause such a big change in performance, but what you explain makes sense. Thanks for the thorough explanation.
Just to be sure I understand: this means that if I repeat my process enough times, I should sometimes get the 3-4% slowdown and sometimes no slowdown at all? I'm not sure I understand what you mean by incremental builds, but I think I'm not doing that precisely because of the |
You're doing a full build because of the |
That makes sense. What I don't understand is how you managed to get two test binaries, one that showed the slowdown and one that didn't - presumably because the alignment was different in each. Is there a doc that outlines the workflow when working with tip? I have no idea if full builds are even necessary after pulling a bunch of commits on tip, versus something like |
This was entirely accidental. In principle, the incremental build should produce the same binary as the full build given the same source tree. I don't know why it doesn't, and this is probably a (minor) bug in the compiler, but I just took advantage of it when I noticed. :)
You should always do a full |
Right, thanks for the explanation!
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
What operating system and processor architecture are you using (go env)?
What did you do?
[...] (3311275c)
[...] (196df6f0)
[...] benchstat.
What did you expect to see?
What did you see instead?
I focused on Parse/Quoted and tried to find which commit was producing such a noticeable slowdown. I narrowed it down to ec9c84c:
This test mainly takes a slice of bytes and iterates through it, with some simple logic and garbage generation. I'm not sure what metrics would be useful here, or how to narrow the problem down to a small play.golang.org benchmark that isn't part of a big package.
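As a starting point for such a reduction, here is a minimal, self-contained sketch of a benchmark with the shape described: it walks a byte slice, applies simple logic, and generates a little garbage. The `splitFields` function and the input are hypothetical stand-ins, not the code behind Parse/Quoted. Given the alignment explanation earlier in this thread, a reduced program like this may well not show the slowdown at all.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the work is not optimized away.
var sink [][]byte

// splitFields is a hypothetical stand-in workload: it walks input byte by
// byte and copies each space-separated field into a fresh slice, so every
// field produces a small allocation.
func splitFields(input []byte) [][]byte {
	var out [][]byte
	start := 0
	for i := 0; i <= len(input); i++ {
		if i == len(input) || input[i] == ' ' {
			if i > start {
				out = append(out, append([]byte(nil), input[start:i]...))
			}
			start = i + 1
		}
	}
	return out
}

func main() {
	input := []byte(`echo "quoted words" and some more fields`)
	// testing.Benchmark lets a benchmark run from a plain main package.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = splitFields(input)
		}
	})
	fmt.Println(res.String(), res.MemString())
}
```

Building this at both commits and comparing the reported ns/op over several runs would show whether the reduced case is affected.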
Here are the pprof results of both before and after the tip commit. Before:
After:
I can upload the test binaries if that would help.