Replace loader handles with field at start of handle data #2622
Compute Benchmarks level_zero run (with params: ):

Compute Benchmarks level_zero run (): Summary
Total 38 benchmarks in mean. (result is better)
Performance change in benchmark groups:
Relative perf in group memory (4): 100.808%
Relative perf in group api (12): 101.685%
Relative perf in group Velocity-Bench (9): 99.170%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 98.331%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 99.292%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 102.184%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 98.976%
Relative perf in group alloc/min (4): 100.564%
Relative perf in group multiple (12): 100.475%
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (10): cannot calculate
Relative perf in group graph (10): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Benchmark details (environment, command):
- memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024. Command: `/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100`
- memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024. Command: `/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100`
- memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024. Command: `/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024`
- api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024. Command: `/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024`
- api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024. Command: `/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024`
- Velocity-Bench dl-mnist. Environment variables: NEOReadDebugKeys=1. Command: `/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO`

The following UMF benchmarks set no environment variables and all use the command `/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv`:
- alloc/size:10000/0/4096/iterations:200000/threads:4 glibc
- alloc/size:10000/0/4096/iterations:200000/threads:1 glibc
- alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc
- alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc
- alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc
- alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc
- alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider
- alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider
- alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider
- alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider
- alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool
- alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool
- alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool
- alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool
- alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool
- alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool
- alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool
- alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool
- alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool
- alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc
- multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc
- multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool
- multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool
- multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool
- multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool
Unified Runtime -> intel/llvm Repo Move Notice

Information: The source code of Unified Runtime has been moved to intel/llvm under the unified-runtime top-level directory. The code will be mirrored to oneapi-src/unified-runtime, and the specification will continue to be hosted at oneapi-src.github.io/unified-runtime. The contribution guide has been updated with new instructions for contributing to Unified Runtime.

PR Migration: All open PRs, including this one, will be labelled auto-close and will be automatically closed after 30 days. Should you wish to continue with your PR, you will need to migrate it to intel/llvm.

This is an automated comment.
We were reading the kernel arguments at kernel execution time, but kernel arguments are allowed to change between enqueuing and executing. Make sure to create a copy of kernel arguments ahead of time. This was previously approved as a unified-runtime PR: oneapi-src#2700
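A minimal sketch of the fix, assuming hypothetical `Kernel`, `LaunchCommand` and `enqueueLaunch` types rather than the adapter's real ones: the enqueue step copies the argument bytes into the command, and execution reads only that private copy, so a later `setArg` cannot change a launch that is already enqueued.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kernel object: the user may call setArg() at any time,
// including after a launch has been enqueued.
struct Kernel {
  std::vector<std::vector<std::uint8_t>> args; // raw bytes, one entry per argument

  void setArg(std::size_t index, const void *value, std::size_t size) {
    if (args.size() <= index)
      args.resize(index + 1);
    const auto *bytes = static_cast<const std::uint8_t *>(value);
    args[index].assign(bytes, bytes + size);
  }
};

// Hypothetical enqueued launch that owns a private copy of the arguments.
struct LaunchCommand {
  std::vector<std::vector<std::uint8_t>> argsSnapshot;
};

// Snapshot the arguments at enqueue time; execution later reads only
// argsSnapshot, so subsequent setArg() calls cannot leak into this launch.
LaunchCommand enqueueLaunch(const Kernel &kernel) {
  return LaunchCommand{kernel.args};
}
```

The copy costs one allocation per argument at enqueue time; in exchange, execution no longer depends on the kernel object staying unchanged until the command runs.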
Update UMF to the commit:
```
commit 5a515c56c92be75944c8246535c408cee7711114
Author: Lukasz Dorau <[email protected]>
Date:   Mon Feb 17 10:56:05 2025 +0100

    Merge pull request oneapi-src#1086 from vinser52/svinogra_l0_linking
```
to fix the issue in LLVM (SYCL/CUDA): intel/llvm#16944 [SYCL][CUDA] Nsys profiling broken after memory providers change

Moved from: oneapi-src#2708
Fixes: intel/llvm#16944

Signed-off-by: Lukasz Dorau <[email protected]>
Implements calls shared between the command buffer and the queue in the unified-runtime Level Zero v2 adapter, and moves the shared code to `command_list_manager.cpp`.
As discussed in oneapi-src#2670 (comment), the `pCommandBufferDesc` parameter to `urCommandBufferCreateExp` is optional. However, the UR spec doesn't state what the configuration of the created command-buffer is when it isn't passed, and being optional is also inconsistent with the descriptor parameters of `urSamplerCreate` and `urMemImageCreate`, which are not optional. This PR makes the descriptor parameter to command-buffer creation mandatory to address these concerns. Closes oneapi-src#2673 **Note**: This UR patch was previously approved and ready-to-merge in oneapi-src#2676 prior to the repo move.
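A minimal sketch of what a mandatory descriptor means for an implementation. `CommandBufferDesc`, `CommandBuffer`, `Result` and `createCommandBuffer` are hypothetical stand-ins, not the UR API, and the descriptor fields are assumptions chosen for illustration. A null descriptor is rejected up front, so the created object's configuration always comes from the caller instead of from unspecified defaults.

```cpp
#include <memory>

// Hypothetical result codes (illustrative, not UR's ur_result_t).
enum class Result { Success, ErrorInvalidNullPointer };

// Hypothetical descriptor; field names are assumptions.
struct CommandBufferDesc {
  bool isUpdatable = false;
  bool isInOrder = false;
};

struct CommandBuffer {
  CommandBufferDesc config; // always copied from the caller's descriptor
};

// The descriptor is required: passing nullptr is an error rather than
// a request for some unspecified default configuration.
Result createCommandBuffer(const CommandBufferDesc *pDesc,
                           std::unique_ptr<CommandBuffer> &out) {
  if (pDesc == nullptr)
    return Result::ErrorInvalidNullPointer;
  out = std::make_unique<CommandBuffer>(CommandBuffer{*pDesc});
  return Result::Success;
}
```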
After the [spec bump of cl_khr_command_buffer to 0.9.7](https://github.com/KhronosGroup/OpenCL-Docs/), the OpenCL adapter no longer needs to ensure that the in-order/out-of-order property of the internal queue used for command-buffer creation matches the queue used to enqueue the command-buffer. We can therefore take advantage of the in-order flag passed on UR command-buffer creation to use an in-order queue for command-buffer creation, and omit sync points. **Note:** This UR patch was previously approved and ready-to-merge prior to the UR repo move in oneapi-src#2681
Fixes #16677 by setting the `-pie` linker option only on executables in Release builds, rather than on every type of target.
…free related error (#16706) UR: oneapi-src#2592 --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
… and improve its conformance test (#17067) Migrated from oneapi-src#2533. This patch makes all of the values returned by urEventGetProfilingInfo optional and updates the adapters to handle this by returning the appropriate enum value when a query is not supported. The tests have also been updated so that a counter of "0", or a value equal to that of the previous profiling event, is no longer considered a failure.
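A minimal sketch of the relaxed test check, assuming the remaining requirement is that a nonzero timestamp never goes backwards (the text above only says which cases stop being failures). `timestampsAcceptable` is a hypothetical helper, not part of the conformance suite, and `std::nullopt` stands in for a query the adapter reports as unsupported.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Relaxed ordering check over a sequence of profiling timestamps
// (for example queued, submit, start, end). std::nullopt models a query
// the adapter reports as unsupported.
bool timestampsAcceptable(const std::vector<std::optional<std::uint64_t>> &ts) {
  std::uint64_t prev = 0;
  for (const auto &t : ts) {
    if (!t || *t == 0)
      continue;      // unsupported or zero: not a failure
    if (*t < prev)
      return false;  // assumed failure: a later event reports an earlier time
    prev = *t;       // a value equal to the previous one is accepted
  }
  return true;
}
```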
- Fix the group count not being recalculated when a user passes only a new local work size and no new global size
- Remove CTS test skips for local update on L0
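As a worked example of the first fix: with a global size of 1024 and a local size of 64, the launch uses 1024 / 64 = 16 work-groups. If an update supplies only a new local size of 128, the group count must become 8; left stale, the launch would still request 16 groups of 128 work-items. The sketch below uses hypothetical `LaunchConfig`/`applyUpdate` names, not the adapter's code, and assumes the global size is a multiple of the local size in every dimension.

```cpp
#include <array>
#include <cstddef>

using Dims = std::array<std::size_t, 3>;

// Hypothetical launch state kept by a command-buffer kernel command.
struct LaunchConfig {
  Dims globalSize;
  Dims localSize;
  Dims groupCount;
};

// Groups per dimension = global / local. Assumes global is a multiple of
// local in every dimension, as ND-range launches require.
void recalcGroupCount(LaunchConfig &cfg) {
  for (std::size_t d = 0; d < 3; ++d)
    cfg.groupCount[d] = cfg.globalSize[d] / cfg.localSize[d];
}

// Apply an update in which either size may be absent (nullptr). The group
// count is recomputed whenever either size changes, including the case
// where only a new local size is supplied.
void applyUpdate(LaunchConfig &cfg, const Dims *newGlobal, const Dims *newLocal) {
  if (newGlobal)
    cfg.globalSize = *newGlobal;
  if (newLocal)
    cfg.localSize = *newLocal;
  if (newGlobal || newLocal)
    recalcGroupCount(cfg);
}
```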
MSVC warns about a possible uninitialized variable. This is a false positive but explicitly initializing always is harmless, so do this.
This is a first step towards re-enabling UR performance testing CI. It introduces the reusable yml workflow and a way to trigger it manually. Here's an example of how it looks: pbalcer/llvm#2 (comment)
Use the UMF Proxy pool manager with the UMF CUDA memory provider in UR. The UMF Proxy pool manager is just a wrapper for the UMF memory provider (the CUDA memory provider in this case), and it also adds tracking of memory allocations. Moved from: oneapi-src#2659 Signed-off-by: Lukasz Dorau <[email protected]>
There is always only one adapter instance, so there's no point in allocating it via `new`. This fixes an issue where `urReleaseAdapter` (or any other UR function) called from an `atexit` handler could run after the adapter had been deleted.
This replaces the loader's handle logic: instead of wrapping pointers, the loader reads a DDI table pointer stored at the start of the handle struct itself.
Migrated from oneapi-src/unified-runtime#2622. All handles from all backends are now required to implement `ddi_getter`, and their first field must be a pointer to a `ur_ddi_table_t` (which also implies that they must not have a vtable). Instead of wrapping handles in a special wrapper type, we query the DDI table stored in the handle itself. This simplifies the loader greatly.
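A minimal C++ sketch of the dispatch scheme described above, under stated assumptions: `ur_ddi_table_t` here is a stand-in with a single hypothetical `QueueFinish` entry, the backend with its `exampleQueueFinish` and `exampleDdiTable` is invented for illustration, and the real `ddi_getter` interface is not reproduced. The loader reads the table pointer at offset 0 of the handle and forwards the call with the same handle, so no wrapper objects or handle translation are needed. The `static_assert`s show why a vtable is ruled out: it would occupy the start of the object.

```cpp
#include <cstddef>
#include <type_traits>

struct ur_queue_handle_t_;                       // opaque to the loader
using ur_queue_handle_t = ur_queue_handle_t_ *;

// Hypothetical, heavily reduced DDI table; real tables have many entries.
struct ur_ddi_table_t {
  int (*QueueFinish)(ur_queue_handle_t);
};

// ---- Adapter side (hypothetical backend) ----
int exampleQueueFinish(ur_queue_handle_t hQueue);
const ur_ddi_table_t exampleDdiTable = {&exampleQueueFinish};

struct ur_queue_handle_t_ {
  const ur_ddi_table_t *ddi = &exampleDdiTable;  // must be the first field
  int finishCalls = 0;                           // backend-specific state
};

// Standard-layout rules out virtual functions, and offset 0 guarantees the
// loader finds the table pointer at the very start of the handle.
static_assert(std::is_standard_layout<ur_queue_handle_t_>::value,
              "handles must be standard-layout (no virtual functions)");
static_assert(offsetof(ur_queue_handle_t_, ddi) == 0,
              "DDI table pointer must be the first field");

int exampleQueueFinish(ur_queue_handle_t hQueue) {
  return ++hQueue->finishCalls;
}

// ---- Loader side ----
// Read the table pointer stored at the start of the handle and forward the
// call; the handle itself is passed through unchanged, with no unwrapping.
int loaderQueueFinish(ur_queue_handle_t hQueue) {
  const ur_ddi_table_t *table =
      *reinterpret_cast<const ur_ddi_table_t *const *>(hQueue);
  return table->QueueFinish(hQueue);
}
```

Because the adapter receives its own native handle back, it can use it directly; the loader only needs the convention about the first field, not the backend's full type.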
Currently this only works for L0 (v1) and HIP.