-
Notifications
You must be signed in to change notification settings - Fork 768
[DeviceSanitizer] Support multiple error reports (-fsanitize-recover=address) #13948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think it looks reasonable, but I worry that the register pressure could make this unusable on certain application.
LocalArgsInfo *LocalArgs = nullptr; // ordered by ArgIndex | ||
LocalArgsInfo *LocalArgs = nullptr; // Ordered by ArgIndex | ||
|
||
DeviceSanitizerReport SanitizerReport[ASAN_MAX_NUM_REPORTS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any concerns about register-pressure related to this? Would it make sense to somehow let users use a more light-weight version with fewer reports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LaunchInfo
is allocated in shared USM, so I think it won't cause register-pressure here.
__spirv_BuiltInWorkgroupId.z; | ||
|
||
auto &SanitizerReport = ((__SYCL_GLOBAL__ LaunchInfo *)__AsanLaunchInfo) | ||
->SanitizerReport[WG_LID % ASAN_MAX_NUM_REPORTS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given an error in a workgroup and another error in an another workgroup, these two errors may be saved to the same index and either of them has a chance to be kept.
Program behavior is probably more predicable if we use an atomically increasing index instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Program behavior is probably more predicable if we use an atomically increasing index instead.
What does predicable mean here?
Even using atomic counter, we can't guarantee that we can save all error reports in your case, and their order are also undetermined.
I didn't try to save as most as possible error reports because I think it's not necessary (as most of them are likely pointing to the same error location).
Maybe it would be better to save more different types of error, but it's too complicated to implement now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even using atomic counter, we can't guarantee that we can save all error reports in your case, and their order are also undetermined.
Right, order is indeterministic. Could you please elaborate why we can't guarantee that all of the two errors are not reported in my case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given an error in a workgroup and another error in an another workgroup, these two errors may be saved to the same index and either of them has a chance to be kept.
I don't understand, why does either of them has a chance to be kept?
They can save into specific index of reports unless that index is already used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the order of workgroups execution is indeterministic.
When there is clash in indexing, we can't say which one is saved since the second error won't be save after the first one is saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the order of workgroups execution is indeterministic.
When there is clash in indexing, we can't say which one is saved since the second error won't be save after the first one is saved.
Yep. I think it's okay to save either one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@intel/unified-runtime-reviewers @intel/dpcpp-sanitizers-review Please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@AllanZyne oneapi-src/unified-runtime#1677 has been merged, please merge the latest sycl branch in and update the UR tag as suggested |
Co-authored-by: Callum Fare <[email protected]>
Done |
@intel/llvm-gatekeepers Please merge |
UR: oneapi-src/unified-runtime#1677
In kernel, we save at most
ASAN_MAX_NUM_REPORTS
(default: 10) number of SanitizerReport.We select the index of SanitizerReport by
WG_LINEAR_ID % ASAN_MAX_NUM_REPORTS
.When
-fsanitize-recover=address
is passed in compiler flag,asan_loadX_noabort
/asan_storeX_noabort
is used, we useis_recover = true
flag to distinguish this case.If
is_recover
is true, the UR will print out all the error reports and continue (use at your own risk).If
is_recover
is false (default case), the UR will print out only one error report and exit.