Skip to content

[DeviceSanitizer] Support multiple error reports (-fsanitize-recover=address) #13948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 27, 2024

Conversation

AllanZyne
Copy link
Contributor

@AllanZyne AllanZyne commented May 29, 2024

UR: oneapi-src/unified-runtime#1677

In kernel, we save at most ASAN_MAX_NUM_REPORTS (default: 10) number of SanitizerReport.
We select the index of SanitizerReport by WG_LINEAR_ID % ASAN_MAX_NUM_REPORTS.

When -fsanitize-recover=address is passed in compiler flag, asan_loadX_noabort/asan_storeX_noabort is used, we use is_recover = true flag to distinguish this case.
If is_recover is true, the UR will print out all the error reports and continue (use at your own risk).
If is_recover is false (default case), the UR will print out only one error report and exit.

@AllanZyne AllanZyne requested a review from a team as a code owner May 29, 2024 01:59
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think it looks reasonable, but I worry that the register pressure could make this unusable on certain application.

LocalArgsInfo *LocalArgs = nullptr; // ordered by ArgIndex
LocalArgsInfo *LocalArgs = nullptr; // Ordered by ArgIndex

DeviceSanitizerReport SanitizerReport[ASAN_MAX_NUM_REPORTS];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any concerns about register-pressure related to this? Would it make sense to somehow let users use a more light-weight version with fewer reports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LaunchInfo is allocated in shared USM, so I think it won't cause register-pressure here.

__spirv_BuiltInWorkgroupId.z;

auto &SanitizerReport = ((__SYCL_GLOBAL__ LaunchInfo *)__AsanLaunchInfo)
->SanitizerReport[WG_LID % ASAN_MAX_NUM_REPORTS];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an error in a workgroup and another error in an another workgroup, these two errors may be saved to the same index and either of them has a chance to be kept.
Program behavior is probably more predicable if we use an atomically increasing index instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Program behavior is probably more predicable if we use an atomically increasing index instead.

What does predicable mean here?
Even using atomic counter, we can't guarantee that we can save all error reports in your case, and their order are also undetermined.
I didn't try to save as most as possible error reports because I think it's not necessary (as most of them are likely pointing to the same error location).

Maybe it would be better to save more different types of error, but it's too complicated to implement now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even using atomic counter, we can't guarantee that we can save all error reports in your case, and their order are also undetermined.

Right, order is indeterministic. Could you please elaborate why we can't guarantee that all of the two errors are not reported in my case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an error in a workgroup and another error in an another workgroup, these two errors may be saved to the same index and either of them has a chance to be kept.

I don't understand, why does either of them has a chance to be kept?
They can save into specific index of reports unless that index is already used.

Copy link
Contributor

@wenju-he wenju-he Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the order of workgroups execution is indeterministic.
When there is clash in indexing, we can't say which one is saved since the second error won't be save after the first one is saved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the order of workgroups execution is indeterministic.
When there is clash in indexing, we can't say which one is saved since the second error won't be save after the first one is saved.

Yep. I think it's okay to save either one.

@AllanZyne AllanZyne requested a review from wenju-he June 12, 2024 01:44
Copy link
Contributor

@wenju-he wenju-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AllanZyne AllanZyne requested a review from a team as a code owner June 20, 2024 03:36
@AllanZyne
Copy link
Contributor Author

@intel/unified-runtime-reviewers @intel/dpcpp-sanitizers-review Please review.
Thanks very much!

Copy link
Contributor

@wenju-he wenju-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@callumfare
Copy link
Contributor

@AllanZyne oneapi-src/unified-runtime#1677 has been merged, please merge the latest sycl branch in and update the UR tag as suggested

@AllanZyne
Copy link
Contributor Author

@AllanZyne oneapi-src/unified-runtime#1677 has been merged, please merge the latest sycl branch in and update the UR tag as suggested

Done

@callumfare
Copy link
Contributor

@intel/llvm-gatekeepers Please merge

@sommerlukas sommerlukas merged commit 7b4fbac into sycl Jun 27, 2024
14 checks passed
@AllanZyne AllanZyne deleted the review/yang/multiple_reports branch June 28, 2024 05:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants