Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL] Extend sub-group load/store tests to cover 3-, 16-elements vectors #253

Merged
merged 7 commits into from
May 5, 2021

Conversation

vladimirlaz
Copy link

  • added processing of vectors of 3 and 16 elements;
  • increase local group size to fit 16-element vectors when sub-group size is 16;
  • tune check to avoid overlapping data ranges inside and between local groups;
  • increase threshold because cumulative error raises due to increase vector elements number.

@@ -98,7 +109,7 @@ template <typename T2> struct utils<T2, 16> {

template <typename T> void exit_if_not_equal(T val, T ref, const char *name) {
if (std::is_floating_point<T>::value) {
if (std::fabs(val - ref) > 0.01) {
if (std::fabs(val - ref) > 0.02) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason to increase the threshold?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason to increase the threshold?

During data verification all elements of vectors are added and this value is compared with reference one. If we increase number of added elements twice potential cumulative error is increased twice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be better to switch to validating relative error instead of absolute one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test does not target specific accuracy goals. It checks that the return values are not something completely different from expected.
Does it make sense to invest in tuning accuracy of the test?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we use input data, so that results have no error at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bader, could you, please, describe a bit? I think I didn't get it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During data verification all elements of vectors are added and this value is compared with reference one.

When you do FP addition, the computed result is rounded to fit into resulting data type. The error occurs only if add result can't be exactly preserved and has to be rounded. E.g. T == float, 2^{20} + 2^{-10} can't be represented "exactly", but 2^{20} + 2^{21} can. I suggest using input values, so that rounding error will be 0, so you can always use exact match.

BTW, using std::fabs here reduces accuracy for T == double.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented bitwise comparison from floating point type except half.
Valid range for half type is too narrow and will require to rework data for several tests. This is out of scope for current PR. But aligning input data for current test allow to revert threshold increase.

@vladimirlaz
Copy link
Author

The test will start to pass once intel/llvm#3617 submitted

…f skipping the whole test when CPU device is available on host machine
@@ -98,7 +109,7 @@ template <typename T2> struct utils<T2, 16> {

template <typename T> void exit_if_not_equal(T val, T ref, const char *name) {
if (std::is_floating_point<T>::value) {
if (std::fabs(val - ref) > 0.01) {
if (std::fabs(val - ref) > 0.02) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be better to switch to validating relative error instead of absolute one?

@vladimirlaz vladimirlaz requested a review from s-kanaev April 26, 2021 12:46
vladimirlaz added a commit to intel/llvm that referenced this pull request Apr 27, 2021
 - when 3-element vector is passed it is packed as single element and 2 element vector;
 - when 16-element vector is passed it is packed as 2 sequential 8-element vectors.
The test has changed in scope of intel/llvm-test-suite#253
@vladimirlaz vladimirlaz requested a review from bader May 5, 2021 12:21
@vladimirlaz vladimirlaz merged commit c78a789 into intel:intel May 5, 2021
@vladimirlaz vladimirlaz deleted the sg_load_store_v16_v3 branch May 5, 2021 16:14
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…tors (intel/llvm-test-suite#253)

 - Add processing of vectors of 3 and 16 elements.
 - Increase local group size to fit 16-element vectors when sub-group size is 16.
 - Tune check to avoid overlapping data ranges inside and between local groups.
 - Implemented bitwise comparison for floating point types except half. Valid range for half type is too narrow and will require to rework data for several tests. This is out of scope for current PR. .
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants