[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs #14454
Conversation
Latest runs:
@intel/llvm-gatekeepers PR is ready for merge, thanks! Test failure on BMG is a known issue: #17075 (comment)

@ianayl This appears to be failing in the nightly, can you take a look? Thx https://github.com/intel/llvm/actions/runs/13490029293

@sarnex Failure is intentional, looks like we have regressions. Failures are consistent over the weekend, with the last successful run being Friday: I'm guessing something got pushed on Friday (in either SYCL or compute-benchmarks) that caused a regression.

Ah ok thanks. Can you make GH issues for those? I can start doing it after the first time, as I usually check the nightly results and file bugs daily. Thx

Definitely, I'll file the GitHub issues right after I get some time to investigate.
@@ -243,6 +243,46 @@ jobs:
      sycl_toolchain_decompress_command: ${{ needs.ubuntu2204_build.outputs.artifact_decompress_command }}
      sycl_cts_artifact: sycl_cts_bin

  aggregate_benchmark_results:
    if: always() && !cancelled()
@ianayl hi, likely there should also be `github.repository == 'intel/llvm' && ...` to avoid running this in forks. @intel/dpcpp-devops-reviewers FYI
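For illustration, a minimal sketch of the suggested guard on the job's `if:` condition; the exact expression used in the eventual fix may differ.

```yaml
# Hypothetical sketch only: restrict the job to the upstream repo so it does
# not run in forks. The actual condition merged in the fix may differ.
aggregate_benchmark_results:
  if: github.repository == 'intel/llvm' && always() && !cancelled()
```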
Btw: fix for this is in #17122, I will merge after I do some final testing with my tuning here
…intel#14454)

This PR:
- adds a "benchmark" mode to sycl-linux-run-tests.yml, which benchmarks a given SYCL branch/build using [compute-benchmarks](https://github.com/intel/compute-benchmarks/)
- stores benchmark results in a git repo, and
- aggregates benchmark results in order to produce a median, which is used to pass or fail the benchmark workflow

The current plan is to run this benchmark nightly in order to catch regressions, although there is potential for this workflow to be used in precommit. As a result, many components in this workflow are either separate reusable components or written directly with precommit in mind.

The current benchmarking workflow functions as follows:
1. An "aggregate" workflow runs, which aggregates historic benchmark results in the aforementioned git repo and produces a historical median
   - This calls upon aggregate.py to handle the compute heavy lifting
2. The core benchmarking workflow runs:
   - This calls upon benchmark.sh, which handles the logic for building and running compute-benchmarks
   - Then, compare.py is called to compare the benchmark data against the historical median generated earlier

The workflows are fully configurable via benchmark-ci.conf; the enabled compute-benchmarks tests can be configured via enabled_tests.conf.

Feel free to test out the workflow via manual dispatches of sycl-linux-run-tests.yml on branch benchmarking-workflow, but be aware that the run currently will always fail, as GitHub repository secrets are not yet added.

---------

Co-authored-by: aelovikov-intel <[email protected]>
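As an illustration of the aggregation step described above (aggregate.py producing a historical median per benchmark), a minimal sketch could look like the following; the JSON layout, field names, and function names are assumptions, not the actual aggregate.py code.

```python
# Hypothetical sketch of the aggregation step: compute a historical median per
# benchmark from previously stored result files. The file layout, field names,
# and function names are assumptions, not the actual aggregate.py.
import json
import statistics
from pathlib import Path

def compute_historical_medians(results_dir: str) -> dict[str, float]:
    """Collect per-benchmark samples from stored JSON results and return the
    median value for each benchmark name."""
    samples: dict[str, list[float]] = {}
    for path in Path(results_dir).glob("*.json"):
        run = json.loads(path.read_text())  # assumed shape: {"benchmark_name": value, ...}
        for name, value in run.items():
            samples.setdefault(name, []).append(float(value))
    return {name: statistics.median(values) for name, values in samples.items()}
```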
This PR tunes the nightly benchmarking job to produce more consistent results:
- Lowers the tolerance threshold of benchmarking results accepted from 50% to 8%
  - Nightly was flaking before even with a 50% tolerance threshold
- Raises the iterations to 5000
  - Using 10,000 iterations did not result in significantly more stable performance, although this may change as we obtain more data
  - However, the PVC benchmarking job in the overall nightly workflow now takes about ~47 minutes, whereas before it took ~14 minutes
  - This should not have a major impact on execution time, however, considering the E2E tests take ~42 minutes: since both jobs run in parallel on different machines, the theoretical effect on the overall workflow should only be about 5 minutes, although this would depend on whether or not machines are able to be scheduled in time.
- Changes the benchmarking workflows in sycl-nightly.yml to use the tuned PERF_PVC runner
  - Untuned machines are exhibiting large variations when running compute-benchmarks (20-25%, up to 50% in the worst case scenario): these are unacceptable variations and not particularly useful.
- Disables nightly benchmarking on gen12
  - Gen12 machines are currently untuned. Similar to PVC machines, these results are not accurate and not worth serious nightly benchmarking.
- Adds guards for benchmarking jobs to prevent benchmark runs in forks #14454 (comment)

---------

Co-authored-by: Nick Sarnie <[email protected]>
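To make the tolerance change concrete, here is a rough sketch of a comparison against the historical median with an 8% threshold; the function names, the assumption that higher values are worse, and the exact formula are illustrative only, not compare.py's actual logic.

```python
# Hypothetical sketch of the comparison step: flag a regression when a new
# result is more than `tolerance` (e.g. 8%) worse than the historical median.
# Assumes higher values are worse (e.g. latency); compare.py's real logic,
# metric direction, and names may differ.
def exceeds_tolerance(new_value: float, median: float, tolerance: float = 0.08) -> bool:
    if median == 0:
        return False  # avoid division by zero for degenerate baselines
    return (new_value - median) / median > tolerance

def check_run(results: dict[str, float], medians: dict[str, float]) -> list[str]:
    """Return the names of benchmarks whose new result regressed beyond tolerance."""
    return [name for name, value in results.items()
            if name in medians and exceeds_tolerance(value, medians[name])]
```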