[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs #14454


Merged: 200 commits into sycl from benchmarking-workflow on Feb 21, 2025

Conversation

@ianayl (Contributor) commented on Jul 4, 2024

This PR:

  • adds a "benchmark" mode to sycl-linux-run-tests.yml, which benchmarks a given SYCL branch/build using compute-benchmarks
    • stores benchmark results in a git repo, and
    • aggregates benchmark results in order to produce a median, which is used to pass or fail the benchmark workflow

The current plan is to run this benchmark nightly to catch regressions, although the workflow could also be used in precommit. As a result, many of the components in this workflow are either separate reusable pieces or written directly with precommit in mind. The current benchmarking workflow works as follows:

  1. An "aggregate" workflow is ran, which aggregates historic benchmark results in the aforementioned git repo, and produces a historical median
    • This calls upon aggregate.py to handle the actual compute heavy-lifting
  2. The core benchmarking workflow is ran:
    • This calls upon benchmark.sh, which handles the logic for building and running compute-benchmarks
    • Then, compare.py is called upon for the actual comparing of benchmark data against the historical median generated prior

The workflows are fully configurable via benchmark-ci.conf; the set of enabled compute-benchmarks tests can be configured via enabled_tests.conf.
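
For illustration, here is a minimal Python sketch of the aggregate/compare logic described above. The function names, data shapes, and tolerance value are assumptions made for this example; the actual aggregate.py and compare.py in this PR may differ in structure, naming, and thresholds:

```python
import statistics

def aggregate_median(historic_runs: list[dict[str, float]]) -> dict[str, float]:
    """Collapse historic benchmark runs into a per-test historical median."""
    samples: dict[str, list[float]] = {}
    for run in historic_runs:
        for test, value in run.items():
            samples.setdefault(test, []).append(value)
    return {test: statistics.median(vals) for test, vals in samples.items()}

def compare(new_results: dict[str, float],
            medians: dict[str, float],
            tolerance: float = 0.5) -> list[str]:
    """Return tests whose (higher-is-worse) result exceeds the median by more
    than the tolerance fraction; hypothetical stand-in for compare.py."""
    regressions = []
    for test, value in new_results.items():
        median = medians.get(test)
        if median is not None and (value - median) / median > tolerance:
            regressions.append(test)
    return regressions

# Example: with a 10% tolerance, a ~12% slowdown against the historical
# median of two prior runs is flagged as a regression.
medians = aggregate_median([{"queue_memcpy": 100.0}, {"queue_memcpy": 102.0}])
print(compare({"queue_memcpy": 113.0}, medians, tolerance=0.10))  # ['queue_memcpy']
```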

Feel free to test the workflow via manual dispatches of sycl-linux-run-tests.yml on the benchmarking-workflow branch, but be aware that runs will currently always fail, as the GitHub repository secrets are not yet added.

@ianayl requested a review from a team as a code owner on July 4, 2024
@ianayl marked this pull request as draft on July 4, 2024
@ianayl changed the title from [CI][SYCL] Add PoC CI workflow to run sycl-bench micro-benchmarking suite to [CI][SYCL][Do not merge] Add CI workflow to run compute-benchmarks on incoming syclos PRs on Sep 6, 2024
@ianayl (Contributor, Author) commented on Feb 21, 2025

@intel/llvm-gatekeepers PR is ready for merge, thanks!

The test failure on BMG is a known issue: #17075 (comment). The changes here are unrelated.

@aelovikov-intel merged commit 5250c0e into sycl on Feb 21, 2025
55 of 56 checks passed
@aelovikov-intel deleted the benchmarking-workflow branch on Feb 21, 2025
@sarnex (Contributor) commented on Feb 24, 2025

@ianayl This appears to be failing in the nightly, can you take a look? Thx

https://github.com/intel/llvm/actions/runs/13490029293

@ianayl (Contributor, Author) commented on Feb 24, 2025

@sarnex The failure is intentional; it looks like we have regressions:

  • Gen12:
    • api overhead :: exec immediate benchmark
    • memory benchmark :: stream memory benchmark
  • PVC:
    • memory benchmark :: queue memcpy benchmark

The failures have been consistent over the weekend, with the last successful run on Friday: I'm guessing something got pushed on Friday (in either SYCL or compute-benchmarks) that caused a regression.

@sarnex (Contributor) commented on Feb 24, 2025

Ah, OK, thanks. Can you file GitHub issues for those? After this first round I can take over, since I usually check the nightly results and file bugs daily. Thx

@ianayl (Contributor, Author) commented on Feb 24, 2025

Definitely, I'll file the GitHub issues as soon as I get some time to investigate.

@@ -243,6 +243,46 @@ jobs:
sycl_toolchain_decompress_command: ${{ needs.ubuntu2204_build.outputs.artifact_decompress_command }}
sycl_cts_artifact: sycl_cts_bin

aggregate_benchmark_results:
if: always() && !cancelled()
A contributor commented on this diff:
@ianayl hi, there should likely also be a github.repository == 'intel/llvm' && ... condition here to avoid running this in forks. @intel/dpcpp-devops-reviewers FYI

@ianayl (Contributor, Author) replied:

Btw: the fix for this is in #17122; I will merge it after I do some final testing with my tuning here.

kurapov-peter pushed a commit to kurapov-peter/llvm that referenced this pull request Mar 5, 2025
[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs (intel#14454)

sarnex added a commit that referenced this pull request Mar 13, 2025
This PR tunes the nightly benchmarking job to produce more consistent results:

  • Lowers the tolerance threshold for benchmarking results from 50% to 8%
    • The nightly was flaking before even with a 50% tolerance threshold
  • Raises the iteration count to 5000
    • Using 10,000 iterations did not produce significantly more stable performance, although this may change as we obtain more data
    • However, the PVC benchmarking job in the overall nightly workflow now takes about 47 minutes, whereas it previously took about 14 minutes
    • This should not have a major impact on overall execution time, since the E2E tests take about 42 minutes: because both jobs run in parallel on different machines, the theoretical effect on the overall workflow is only about 5 minutes, although this depends on whether machines can be scheduled in time
  • Changes the benchmarking workflows in sycl-nightly.yml to use the tuned PERF_PVC runner
    • Untuned machines exhibit large variations when running compute-benchmarks (20-25%, up to 50% in the worst case): such variations are unacceptable and not particularly useful
  • Disables nightly benchmarking on Gen12
    • Gen12 machines are currently untuned; as with untuned PVC machines, their results are not accurate and not worth serious nightly benchmarking
  • Adds guards to the benchmarking jobs to prevent benchmark runs in forks (#14454 (comment))

---------

Co-authored-by: Nick Sarnie <[email protected]>
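
As a rough illustration of why untuned machines were excluded: with run-to-run noise of 20-25%, an 8% tolerance threshold would flag regressions on noise alone. The sketch below uses made-up sample timings and a hypothetical deviation metric; it is not taken from the actual CI scripts:

```python
import statistics

def max_relative_deviation(samples: list[float]) -> float:
    """Worst-case deviation of any sample from the median, as a fraction."""
    median = statistics.median(samples)
    return max(abs(s - median) / median for s in samples)

# A tuned machine: run-to-run noise stays well under the 8% tolerance.
tuned = [100.0, 101.5, 99.2, 100.8, 99.6]
# An untuned machine: 20-25% swings would constantly trip an 8% threshold.
untuned = [100.0, 123.0, 81.0, 118.0, 95.0]

for label, samples in (("tuned", tuned), ("untuned", untuned)):
    dev = max_relative_deviation(samples)
    verdict = "ok" if dev <= 0.08 else "too noisy"
    print(f"{label}: max deviation {dev:.1%} -> {verdict}")
```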