[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs #14454


Merged: 200 commits into sycl from benchmarking-workflow on Feb 21, 2025

Conversation

@ianayl (Contributor) commented on Jul 4, 2024

This PR:

  • adds a "benchmark" mode to sycl-linux-run-tests.yml, which benchmarks a given SYCL branch/build using compute-benchmarks
    • stores benchmark results in a git repo, and
    • aggregates benchmark results in order to produce a median, which is used to pass or fail the benchmark workflow

The current plan is to run this benchmark nightly to catch regressions, although the workflow could also be used in precommit. As a result, many of the components in this workflow are either separate reusable pieces or written directly with precommit in mind. The current benchmarking workflow works as follows:

  1. An "aggregate" workflow is ran, which aggregates historic benchmark results in the aforementioned git repo, and produces a historical median
    • This calls upon aggregate.py to handle the actual compute heavy-lifting
  2. The core benchmarking workflow is ran:
    • This calls upon benchmark.sh, which handles the logic for building and running compute-benchmarks
    • Then, compare.py is called upon for the actual comparing of benchmark data against the historical median generated prior

The workflows are fully configurable via benchmark-ci.conf; the set of enabled compute-benchmarks tests can be configured via enabled_tests.conf.
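
For illustration, here is a minimal Python sketch of the aggregate/compare logic described above. The function names, data shapes, and tolerance value are assumptions made for this example; the actual aggregate.py and compare.py in this PR may differ in structure, naming, and thresholds:

```python
import statistics

def aggregate_median(historic_runs: list[dict[str, float]]) -> dict[str, float]:
    """Collapse historic benchmark runs into a per-test historical median."""
    samples: dict[str, list[float]] = {}
    for run in historic_runs:
        for test, value in run.items():
            samples.setdefault(test, []).append(value)
    return {test: statistics.median(vals) for test, vals in samples.items()}

def compare(new_results: dict[str, float],
            medians: dict[str, float],
            tolerance: float = 0.5) -> list[str]:
    """Return tests whose (higher-is-worse) result exceeds the median by more
    than the tolerance fraction; hypothetical stand-in for compare.py."""
    regressions = []
    for test, value in new_results.items():
        median = medians.get(test)
        if median is not None and (value - median) / median > tolerance:
            regressions.append(test)
    return regressions

# Example: with a 10% tolerance, a ~12% slowdown against the historical
# median of two prior runs is flagged as a regression.
medians = aggregate_median([{"queue_memcpy": 100.0}, {"queue_memcpy": 102.0}])
print(compare({"queue_memcpy": 113.0}, medians, tolerance=0.10))  # ['queue_memcpy']
```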

Feel free to test the workflow via manual dispatches of sycl-linux-run-tests.yml on the benchmarking-workflow branch, but be aware that runs will currently always fail, as the GitHub repository secrets are not yet added.

@ianayl requested a review from a team as a code owner on July 4, 2024
@ianayl marked this pull request as draft on July 4, 2024
@ianayl changed the title from [CI][SYCL] Add PoC CI workflow to run sycl-bench micro-benchmarking suite to [CI][SYCL][Do not merge] Add CI workflow to run compute-benchmarks on incoming syclos PRs on Sep 6, 2024
@ianayl (Contributor, Author) commented on Feb 21, 2025

@intel/llvm-gatekeepers PR is ready for merge, thanks!

The test failure on BMG is a known issue: #17075 (comment). The changes here are unrelated.

@aelovikov-intel merged commit 5250c0e into sycl on Feb 21, 2025
55 of 56 checks passed
@aelovikov-intel deleted the benchmarking-workflow branch on Feb 21, 2025
@sarnex (Contributor) commented on Feb 24, 2025

@ianayl This appears to be failing in the nightly, can you take a look? Thx

https://github.com/intel/llvm/actions/runs/13490029293

@ianayl (Contributor, Author) commented on Feb 24, 2025

@sarnex The failure is intentional; it looks like we have regressions:

  • Gen12:
    • api overhead :: exec immediate benchmark
    • memory benchmark :: stream memory benchmark
  • PVC:
    • memory benchmark :: queue memcpy benchmark

The failures have been consistent over the weekend, with the last successful run on Friday: I'm guessing something got pushed on Friday (in either SYCL or compute-benchmarks) that caused a regression.

@sarnex (Contributor) commented on Feb 24, 2025

Ah, OK, thanks. Can you file GitHub issues for those? After this first round I can take over, since I usually check the nightly results and file bugs daily. Thx

@ianayl (Contributor, Author) commented on Feb 24, 2025

Definitely, I'll file the GitHub issues as soon as I get some time to investigate.

@@ -243,6 +243,46 @@ jobs:
sycl_toolchain_decompress_command: ${{ needs.ubuntu2204_build.outputs.artifact_decompress_command }}
sycl_cts_artifact: sycl_cts_bin

aggregate_benchmark_results:
if: always() && !cancelled()
A contributor commented on this diff:
@ianayl hi, there should likely also be a github.repository == 'intel/llvm' && ... condition here to avoid running this in forks. @intel/dpcpp-devops-reviewers FYI

@ianayl (Contributor, Author) replied:

Btw: the fix for this is in #17122; I will merge it after I do some final testing with my tuning here.

kurapov-peter pushed a commit to kurapov-peter/llvm that referenced this pull request Mar 5, 2025
[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs (intel#14454)

sarnex added a commit that referenced this pull request Mar 13, 2025
This PR tunes the nightly benchmarking job to produce more consistent results:

  • Lowers the tolerance threshold for benchmarking results from 50% to 8%
    • The nightly was flaking before even with a 50% tolerance threshold
  • Raises the iteration count to 5000
    • Using 10,000 iterations did not produce significantly more stable performance, although this may change as we obtain more data
    • However, the PVC benchmarking job in the overall nightly workflow now takes about 47 minutes, whereas it previously took about 14 minutes
    • This should not have a major impact on overall execution time, since the E2E tests take about 42 minutes: because both jobs run in parallel on different machines, the theoretical effect on the overall workflow is only about 5 minutes, although this depends on whether machines can be scheduled in time
  • Changes the benchmarking workflows in sycl-nightly.yml to use the tuned PERF_PVC runner
    • Untuned machines exhibit large variations when running compute-benchmarks (20-25%, up to 50% in the worst case): such variations are unacceptable and not particularly useful
  • Disables nightly benchmarking on Gen12
    • Gen12 machines are currently untuned; as with untuned PVC machines, their results are not accurate and not worth serious nightly benchmarking
  • Adds guards to the benchmarking jobs to prevent benchmark runs in forks (#14454 (comment))

---------

Co-authored-by: Nick Sarnie <[email protected]>
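
As a rough illustration of why untuned machines were excluded: with run-to-run noise of 20-25%, an 8% tolerance threshold would flag regressions on noise alone. The sketch below uses made-up sample timings and a hypothetical deviation metric; it is not taken from the actual CI scripts:

```python
import statistics

def max_relative_deviation(samples: list[float]) -> float:
    """Worst-case deviation of any sample from the median, as a fraction."""
    median = statistics.median(samples)
    return max(abs(s - median) / median for s in samples)

# A tuned machine: run-to-run noise stays well under the 8% tolerance.
tuned = [100.0, 101.5, 99.2, 100.8, 99.6]
# An untuned machine: 20-25% swings would constantly trip an 8% threshold.
untuned = [100.0, 123.0, 81.0, 118.0, 95.0]

for label, samples in (("tuned", tuned), ("untuned", untuned)):
    dev = max_relative_deviation(samples)
    verdict = "ok" if dev <= 0.08 else "too noisy"
    print(f"{label}: max deviation {dev:.1%} -> {verdict}")
```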