Skip to content

[DeviceSanitizer] Support multiple error reports (-fsanitize-recover=address) #13948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 27, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions clang/lib/Driver/SanitizerArgs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1160,6 +1160,10 @@ void SanitizerArgs::addArgs(const ToolChain &TC, const llvm::opt::ArgList &Args,
CmdArgs.push_back("-asan-stack=0");
CmdArgs.push_back("-mllvm");
CmdArgs.push_back("-asan-globals=0");

if (!RecoverableSanitizers.empty())
CmdArgs.push_back(Args.MakeArgString("-fsanitize-recover=" +
toString(RecoverableSanitizers)));
}
return;
}
Expand Down
13 changes: 13 additions & 0 deletions clang/test/Driver/sycl-device-sanitizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,19 @@
// SYCL-ASAN-SAME: "-mllvm" "-asan-stack=0"
// SYCL-ASAN-SAME: "-mllvm" "-asan-globals=0"

// RUN: %clangxx -fsycl -Xarch_device -fsanitize=address -c %s -### 2>&1 \
// RUN: | FileCheck --check-prefix=SYCL-XARCH-DEVICE %s
// SYCL-XARCH-DEVICE: clang{{.*}} "-fsycl-is-device"
// SYCL-XARCH-DEVICE-SAME: -fsanitize=address

// RUN: %clangxx -fsycl -Xarch_device -fsanitize=address -Xarch_device -fsanitize-recover=address -c %s -### 2>&1 \
// RUN: | FileCheck --check-prefix=SYCL-ASAN-RECOVER %s
// SYCL-ASAN-RECOVER: clang{{.*}} "-fsycl-is-device"
// SYCL-ASAN-RECOVER-SAME: -fsanitize=address
// SYCL-ASAN-RECOVER-SAME: -fsanitize-recover=address

/// Make sure "-asan-stack" is always disabled
///
// RUN: %clangxx -fsycl -fsanitize=address -mllvm -asan-stack=1 -c %s -### 2>&1 \
// RUN: | FileCheck --check-prefix=SYCL-ASAN-FILTER %s
// SYCL-ASAN-FILTER: clang{{.*}} "-fsycl-is-device"
Expand Down
11 changes: 7 additions & 4 deletions libdevice/include/asan_libdevice.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -63,16 +63,19 @@ struct LocalArgsInfo {
uint64_t SizeWithRedZone = 0;
};

constexpr std::size_t ASAN_MAX_NUM_REPORTS = 10;

struct LaunchInfo {
uintptr_t PrivateShadowOffset =
0; // don't move this field, we use it in AddressSanitizerPass
// Don't move this field, we use it in AddressSanitizerPass
uintptr_t PrivateShadowOffset = 0;

uintptr_t LocalShadowOffset = 0;
uintptr_t LocalShadowOffsetEnd = 0;
DeviceSanitizerReport SanitizerReport;

uint32_t NumLocalArgs = 0;
LocalArgsInfo *LocalArgs = nullptr; // ordered by ArgIndex
LocalArgsInfo *LocalArgs = nullptr; // Ordered by ArgIndex

DeviceSanitizerReport SanitizerReport[ASAN_MAX_NUM_REPORTS];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any concerns about register-pressure related to this? Would it make sense to somehow let users use a more light-weight version with fewer reports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LaunchInfo is allocated in shared USM, so I think it won't cause register-pressure here.

};

constexpr unsigned ASAN_SHADOW_SCALE = 3;
Expand Down
24 changes: 20 additions & 4 deletions libdevice/sanitizer_utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -286,8 +286,17 @@ bool MemIsZero(__SYCL_GLOBAL__ const char *beg, uptr size) {
bool __asan_internal_report_save(DeviceSanitizerErrorType error_type) {
const int Expected = ASAN_REPORT_NONE;
int Desired = ASAN_REPORT_START;
auto &SanitizerReport =
((__SYCL_GLOBAL__ LaunchInfo *)__AsanLaunchInfo)->SanitizerReport;

// work-group linear id
const auto WG_LID =
__spirv_BuiltInWorkgroupId.x * __spirv_BuiltInNumWorkgroups.y *
__spirv_BuiltInNumWorkgroups.z +
__spirv_BuiltInWorkgroupId.y * __spirv_BuiltInNumWorkgroups.z +
__spirv_BuiltInWorkgroupId.z;

auto &SanitizerReport = ((__SYCL_GLOBAL__ LaunchInfo *)__AsanLaunchInfo)
->SanitizerReport[WG_LID % ASAN_MAX_NUM_REPORTS];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an error in a workgroup and another error in an another workgroup, these two errors may be saved to the same index and either of them has a chance to be kept.
Program behavior is probably more predicable if we use an atomically increasing index instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Program behavior is probably more predicable if we use an atomically increasing index instead.

What does predicable mean here?
Even using atomic counter, we can't guarantee that we can save all error reports in your case, and their order are also undetermined.
I didn't try to save as most as possible error reports because I think it's not necessary (as most of them are likely pointing to the same error location).

Maybe it would be better to save more different types of error, but it's too complicated to implement now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even using atomic counter, we can't guarantee that we can save all error reports in your case, and their order are also undetermined.

Right, order is indeterministic. Could you please elaborate why we can't guarantee that all of the two errors are not reported in my case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an error in a workgroup and another error in an another workgroup, these two errors may be saved to the same index and either of them has a chance to be kept.

I don't understand, why does either of them has a chance to be kept?
They can save into specific index of reports unless that index is already used.

Copy link
Contributor

@wenju-he wenju-he Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the order of workgroups execution is indeterministic.
When there is clash in indexing, we can't say which one is saved since the second error won't be save after the first one is saved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the order of workgroups execution is indeterministic.
When there is clash in indexing, we can't say which one is saved since the second error won't be save after the first one is saved.

Yep. I think it's okay to save either one.


if (atomicCompareAndSet(&SanitizerReport.Flag, Desired, Expected) ==
Expected) {
SanitizerReport.ErrorType = error_type;
Expand Down Expand Up @@ -315,8 +324,15 @@ bool __asan_internal_report_save(
launch_info->NumLocalArgs, launch_info->LocalArgs);
}

auto &SanitizerReport =
((__SYCL_GLOBAL__ LaunchInfo *)__AsanLaunchInfo)->SanitizerReport;
// work-group linear id
const auto WG_LID =
__spirv_BuiltInWorkgroupId.x * __spirv_BuiltInNumWorkgroups.y *
__spirv_BuiltInNumWorkgroups.z +
__spirv_BuiltInWorkgroupId.y * __spirv_BuiltInNumWorkgroups.z +
__spirv_BuiltInWorkgroupId.z;

auto &SanitizerReport = ((__SYCL_GLOBAL__ LaunchInfo *)__AsanLaunchInfo)
->SanitizerReport[WG_LID % ASAN_MAX_NUM_REPORTS];

if (atomicCompareAndSet(&SanitizerReport.Flag, Desired, Expected) ==
Expected) {
Expand Down
14 changes: 7 additions & 7 deletions sycl/plugins/unified_runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -99,14 +99,14 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit 4f105262c30ac231b8db1e250f36e88ef9f0a36d
# Merge: 0f118d75 92fce2ee
set(UNIFIED_RUNTIME_REPO "https://github.com/AllanZyne/unified-runtime.git")
# commit 396fb20498c315a526c961d7cb645b42795acd2c
# Merge: 719bb9cd e2ffea69
# Author: Kenneth Benzie (Benie) <[email protected]>
# Date: Mon Jun 10 13:23:16 2024 +0100
# Merge pull request #1409 from omarahmed1111/Add-CTS-tests-for-image-format
# [CTS] Add CTS tests for urMemImageCreate entry-point
set(UNIFIED_RUNTIME_TAG 4f105262c30ac231b8db1e250f36e88ef9f0a36d)
# Date: Thu May 23 10:53:03 2024 +0100
# Merge pull request #1501 from RossBrunton/ross/kerneltests
# [Testing] Spec clarifications and testing updates for kernel
set(UNIFIED_RUNTIME_TAG review/yang/multiple_reports)

fetch_adapter_source(level_zero
${UNIFIED_RUNTIME_REPO}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
// REQUIRES: linux, cpu
// RUN: %{build} %device_asan_flags -Xarch_device -fsanitize-recover=address -O2 -g -o %t
// RUN: env SYCL_PREFER_UR=1 %{run} %t 2>&1 | FileCheck %s

#include <sycl/detail/core.hpp>
#include <sycl/usm.hpp>

int main() {
sycl::queue Q;
constexpr std::size_t N = 4;
auto *array = sycl::malloc_device<int>(N, Q);

Q.submit([&](sycl::handler &h) {
h.parallel_for<class Kernel1>(
sycl::nd_range<1>(N + 1, 1),
[=](sycl::nd_item<1> item) { ++array[item.get_global_id(0)]; });
}).wait();
// CHECK: ERROR: DeviceSanitizer: out-of-bounds-access on Device USM
// CHECK: {{READ of size 4 at kernel <.*Kernel1> LID\(0, 0, 0\) GID\(4, 0, 0\)}}
// CHECK: {{ #0 .* .*multiple_kernels.cpp:}}[[@LINE-4]]

Q.submit([&](sycl::handler &h) {
h.parallel_for<class Kernel2>(
sycl::nd_range<1>(N + 1, 1),
[=](sycl::nd_item<1> item) { ++array[item.get_global_id(0)]; });
}).wait();
// CHECK: ERROR: DeviceSanitizer: out-of-bounds-access on Device USM
// CHECK: {{READ of size 4 at kernel <.*Kernel2> LID\(0, 0, 0\) GID\(4, 0, 0\)}}
// CHECK: {{ #0 .* .*multiple_kernels.cpp:}}[[@LINE-4]]

Q.submit([&](sycl::handler &h) {
h.parallel_for<class Kernel3>(
sycl::nd_range<1>(N + 1, 1),
[=](sycl::nd_item<1> item) { ++array[item.get_global_id(0)]; });
}).wait();
// CHECK: ERROR: DeviceSanitizer: out-of-bounds-access on Device USM
// CHECK: {{READ of size 4 at kernel <.*Kernel3> LID\(0, 0, 0\) GID\(4, 0, 0\)}}
// CHECK: {{ #0 .* .*multiple_kernels.cpp:}}[[@LINE-4]]

Q.submit([&](sycl::handler &h) {
h.parallel_for<class Kernel4>(
sycl::nd_range<1>(N + 1, 1),
[=](sycl::nd_item<1> item) { ++array[item.get_global_id(0)]; });
}).wait();
// CHECK: ERROR: DeviceSanitizer: out-of-bounds-access on Device USM
// CHECK: {{READ of size 4 at kernel <.*Kernel4> LID\(0, 0, 0\) GID\(4, 0, 0\)}}
// CHECK: {{ #0 .* .*multiple_kernels.cpp:}}[[@LINE-4]]

return 0;
}
31 changes: 31 additions & 0 deletions sycl/test-e2e/AddressSanitizer/multiple-reports/one_kernel.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
// REQUIRES: linux, cpu
// RUN: %{build} %device_asan_flags -Xarch_device -fsanitize-recover=address -O2 -g -o %t
// RUN: env SYCL_PREFER_UR=1 %{run} %t 2>&1 | FileCheck %s

#include <sycl/detail/core.hpp>
#include <sycl/usm.hpp>

int main() {
sycl::queue Q;
constexpr std::size_t N = 1024;
auto *array = sycl::malloc_device<int>(N, Q);

Q.submit([&](sycl::handler &h) {
h.parallel_for<class Kernel>(
sycl::nd_range<1>(N + 20, 1),
[=](sycl::nd_item<1> item) { ++array[item.get_global_id(0)]; });
}).wait();
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK: ====ERROR: DeviceSanitizer
// CHECK-NOT: ====ERROR: DeviceSanitizer

return 0;
}
Loading