-
Notifications
You must be signed in to change notification settings - Fork 3.9k
microbench-ci: per metric retry #143915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
microbench-ci: per metric retry #143915
Conversation
c803a5a
to
ac3de3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/cmd/microbench-ci/main.go
Outdated
artifactsDir := s.artifactsDir(revision) | ||
for _, status := range []Status{Regressed, Improved} { | ||
marker := strings.ToUpper(status.String()) | ||
markerFile := path.Join(artifactsDir, benchmark.sanitizedName()+"."+marker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: since you build the same file path in two different locations, you could move the code to an helper func; it would ease things if the file path is changed in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I'll amend before merging.
c173780
to
0835342
Compare
pkg/cmd/microbench-ci/compare.go
Outdated
|
||
// If the benchmark has a marker file (regressed or improved), compare | ||
// all the runs instead of just the last one. | ||
if suite.hasMarkerFile(New, &benchmark) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So some metric has (say) shown a regression in three consecutive runs of (say) M iterations. Here, we report the analysis on the union of the 3M runs. I assume this will also be known to show a regression? In theory, the first run could've shown a regression but all the baselines may have shifted down, and in the third run they could have shifted up, relative to the middle run. The union of those may not show a regression. I know this is hypothetical, but I still wonder if it'd be cleaner to only return the last result here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the thought crossed my mind. I think you're right, I'll update this to show the last run to avoid something confusing happening.
@@ -1,4 +1,4 @@ | |||
# Check if summary is generated correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing here actually tests the logic that requires multiple iterations each showing a regression, right? Would be worth adding that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the description, but yes this test requires 3 runs to show a regression. The input data is 3 sets of 10 iterations. The regressed marker file is only created if all 3 regressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, may be worthwhile to add the negative case of a partial regression 1 or 2 / 3 showing a regression that shouldn't result in a "final" regression.
Previously, the retry confirmation logic took all metrics of the current benchmark into account. This is problematic because we could possibly trigger a regression if 3 different metrics regressed on 3 different retries for the same benchmark. This logic has been updated to ensure the same metric has to regress on all retries for the benchmark to be considered a regression. Epic: None Release note: None
23f5c39
to
d2d2510
Compare
Previously, if a regression occurred, the comparison step would compare all the runs instead of just the last one. The assumption was that if all runs show a regression we expect the combined result to show a regression, but since the baseline case shift with each run there's a small chance this does not always hold true. Hence, we revert here to only comparing the last run. Epic: None Release note: None
This commit adds a utility function to generate the marker file name for a given benchmark and status. The marker file is used to indicate that a benchmark has changed. It is created for each metric of a benchmark. Epic: None Release note: None
d2d2510
to
daa74ec
Compare
TFTRs! bors r=tbg,DarrylWong,golgeek |
Previously,
This PR addresses both issues by:
Epic: None
Release note: None