Skip to content

[SYCL] Reverse max work-group size order #1177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 3, 2020

Conversation

steffenlarsen
Copy link
Contributor

Due to the user defined work-group size being flipped before launching
kernels, the reported max work-group size for a device must be reversed
correspondingly.

Signed-off-by: Steffen Larsen [email protected]

Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do the same for the OpenCL BE? If so, suggest implementing on the SYCL level.

@steffenlarsen
Copy link
Contributor Author

Shouldn't we do the same for the OpenCL BE? If so, suggest implementing on the SYCL level.

Yes, this should also be done for the OpenCL BE, but if we want to do it at SYCL level we will be making a special case in get_info<max_work_item_sizes> and I'd think that's something we'd want to avoid?

@romanovvlad
Copy link
Contributor

Shouldn't we do the same for the OpenCL BE? If so, suggest implementing on the SYCL level.

Yes, this should also be done for the OpenCL BE, but if we want to do it at SYCL level we will be making a special case in get_info<max_work_item_sizes> and I'd think that's something we'd want to avoid?

Shouldn't we do the same for the OpenCL BE? If so, suggest implementing on the SYCL level.

Yes, this should also be done for the OpenCL BE, but if we want to do it at SYCL level we will be making a special case in get_info<max_work_item_sizes> and I'd think that's something we'd want to avoid?

Why would we want to avoid it?
My thinking is that if we reverse work-group sizes passed to piEnqueueLaunchKernel on the SYCL level it's logical that we reverse max work-groups sizes on the same level. Otherwise we introduce implicit dependency between components(backends know that sizes are reversed by the SYCL level).

Due to the user defined work-group size being flipped before launching
kernels, the reported max work-group size for a device must be reversed
correspondingly.

Signed-off-by: Steffen Larsen <[email protected]>
@steffenlarsen steffenlarsen force-pushed the steffen/reverse-wg-size-order branch from 5e930ae to c66a644 Compare March 2, 2020 15:48
@steffenlarsen steffenlarsen changed the title [SYCL][CUDA] Reverse max work-group size order [SYCL] Reverse max work-group size order Mar 2, 2020
@steffenlarsen
Copy link
Contributor Author

steffenlarsen commented Mar 2, 2020

My thinking is that if we reverse work-group sizes passed to piEnqueueLaunchKernel on the SYCL level it's logical that we reverse max work-groups sizes on the same level. Otherwise we introduce implicit dependency between components(backends know that sizes are reversed by the SYCL level).

That makes perfect sense to me. On second look I can see that there are already plenty specializations of device info queries, so I've changed the approach to be for both CUDA and OpenCL.

Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bader bader merged commit 72b7dee into intel:sycl Mar 3, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Mar 4, 2020
…ctor_tests

* origin/sycl: (32 commits)
  [SYCL] Fix circular reference between events and queues (intel#1226)
  [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232)
  [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174)
  [SYCL] allow underscore symbol in temporary directory name
  [SYCL] Reject zero length arrays (intel#1153)
  [SYCL] Fix static code analyzis concerns (intel#1189)
  [SYCL] Add more details about the -fintelfpga option (intel#1218)
  [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223)
  [SYCL] Reverse max work-group size order (intel#1177)
  [SYCL][Doc] Add GroupAlgorithms extension (intel#1079)
  [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188)
  [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207)
  [SYCL][CUDA] Fix context creation property parsing
  [CUDA][PI] clang-format pi.h
  [SYCL][CUDA] Handle the case of not having any CUDA device (intel#1212)
  [SYCL] Fix check-sycl-deploy target problems (intel#1165)
  [SYCL] Disable tests which take more than 5 minutes (intel#1220)
  [SYCL] Make context constructors explicit to avoid unintended conversions (intel#1219)
  [SYCL][NFC] Add clang-format configuration file for SYCL LIT tests (intel#1224)
  [SYCL] Fix command cleanup invoked from multiple threads (intel#1214)
  ...
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Mar 5, 2020
…_accessor_refactor

* origin/sycl: (38 commits)
  [SYCL] Fix device::get_devices() with a non-host device type (intel#1235)
  [SYCL][PI][CUDA] Implement kernel and kernel-group information queries (intel#1180)
  [SYCL] Remove default error code value in exception (intel#1150)
  [SYCL] Fix devicelib assert LIT test (intel#1245)
  [SYCL] Set aux-target-cpu for SYCL offload device compilation (intel#1225)
  [SYCL] Remove fabs and ceil from the list of unsupported math functions (intel#1217)
  [SYCL] Fix circular reference between events and queues (intel#1226)
  [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232)
  [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174)
  [SYCL] allow underscore symbol in temporary directory name
  [SYCL] Reject zero length arrays (intel#1153)
  [SYCL] Fix static code analyzis concerns (intel#1189)
  [SYCL] Add more details about the -fintelfpga option (intel#1218)
  [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223)
  [SYCL] Reverse max work-group size order (intel#1177)
  [SYCL][Doc] Add GroupAlgorithms extension (intel#1079)
  [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188)
  [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207)
  [SYCL][CUDA] Fix context creation property parsing
  [CUDA][PI] clang-format pi.h
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants