Skip to content

microbench-ci: per metric retry #143915

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

herkolategan
Copy link
Collaborator

@herkolategan herkolategan commented Apr 4, 2025

Previously,

  • The retry confirmation logic considered all metrics of a benchmark together, which could lead to false positives when different metrics regressed on different retries.
  • The comparison step recalculated regression status instead of using the already-available marker files.

This PR addresses both issues by:

  • Making retry logic track each metric independently, ensuring the same metric must regress consistently across retries
  • Simplifying the comparison step to use marker files instead of recalculating the regression status.

Epic: None
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@herkolategan herkolategan marked this pull request as ready for review April 4, 2025 15:03
@herkolategan herkolategan requested a review from golgeek April 7, 2025 12:06
@herkolategan herkolategan assigned DarrylWong and unassigned DarrylWong Apr 7, 2025
@herkolategan herkolategan requested review from DarrylWong and tbg April 7, 2025 15:01
@herkolategan herkolategan force-pushed the hbl/microbench-ci-per-metric-check branch from c803a5a to ac3de3d Compare April 8, 2025 13:49
Copy link
Contributor

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

artifactsDir := s.artifactsDir(revision)
for _, status := range []Status{Regressed, Improved} {
marker := strings.ToUpper(status.String())
markerFile := path.Join(artifactsDir, benchmark.sanitizedName()+"."+marker)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since you build the same file path in two different locations, you could move the code to an helper func; it would ease things if the file path is changed in the future

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I'll amend before merging.

@herkolategan herkolategan force-pushed the hbl/microbench-ci-per-metric-check branch from c173780 to 0835342 Compare April 10, 2025 08:16

// If the benchmark has a marker file (regressed or improved), compare
// all the runs instead of just the last one.
if suite.hasMarkerFile(New, &benchmark) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So some metric has (say) shown a regression in three consecutive runs of (say) M iterations. Here, we report the analysis on the union of the 3M runs. I assume this will also be known to show a regression? In theory, the first run could've shown a regression but all the baselines may have shifted down, and in the third run they could have shifted up, relative to the middle run. The union of those may not show a regression. I know this is hypothetical, but I still wonder if it'd be cleaner to only return the last result here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the thought crossed my mind. I think you're right, I'll update this to show the last run to avoid something confusing happening.

@@ -1,4 +1,4 @@
# Check if summary is generated correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing here actually tests the logic that requires multiple iterations each showing a regression, right? Would be worth adding that?

Copy link
Collaborator Author

@herkolategan herkolategan Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the description, but yes this test requires 3 runs to show a regression. The input data is 3 sets of 10 iterations. The regressed marker file is only created if all 3 regressed.

Copy link
Collaborator Author

@herkolategan herkolategan Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, may be worthwhile to add the negative case of a partial regression 1 or 2 / 3 showing a regression that shouldn't result in a "final" regression.

Previously, the retry confirmation logic took all metrics of the current
benchmark into account. This is problematic because we could possibly trigger a
regression if 3 different metrics regressed on 3 different retries for the same
benchmark. This logic has been updated to ensure the same metric has to regress
on all retries for the benchmark to be considered a regression.

Epic: None
Release note: None
@herkolategan herkolategan force-pushed the hbl/microbench-ci-per-metric-check branch 3 times, most recently from 23f5c39 to d2d2510 Compare April 14, 2025 08:12
Previously, if a regression occurred, the comparison step would compare all
the runs instead of just the last one. The assumption was that if all runs show
a regression we expect the combined result to show a regression, but since the
baseline case shift with each run there's a small chance this does not always
hold true. Hence, we revert here to only comparing the last run.

Epic: None
Release note: None
This commit adds a utility function to generate the marker file name for a
given benchmark and status.

The marker file is used to indicate that a benchmark has changed. It is
created for each metric of a benchmark.

Epic: None
Release note: None
@herkolategan herkolategan force-pushed the hbl/microbench-ci-per-metric-check branch from d2d2510 to daa74ec Compare April 14, 2025 08:48
@herkolategan
Copy link
Collaborator Author

TFTRs!

bors r=tbg,DarrylWong,golgeek

@craig
Copy link
Contributor

craig bot commented Apr 14, 2025

@craig craig bot merged commit 345c50c into cockroachdb:master Apr 14, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants