
[SYCL][CUDA] Add IPSCCP pass to O0 by default #5900


Merged: 3 commits merged into intel:sycl on Mar 31, 2022


JackAKirk
Contributor

The IPSCCP (interprocedural sparse conditional constant propagation) pass can replace branch conditions with constant integers (`ConstantInt`) and turn conditional branches into unconditional ones.
This is necessary at O0 in the NVPTX backend whenever the nvvm_reflect function is used: after the nvvm-reflect pass runs, dead branches can remain that contain instructions targeting an architecture generation (SM version) other than the one being compiled for.
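
The effect described above can be pictured with a small toy model. The sketch below is not LLVM code: `ReflectBranch`, `reachable_successors`, and the plain-integer encoding of SM versions (e.g. 75 for sm_75) are hypothetical, chosen only to show why a branch whose condition nvvm-reflect has already turned into a constant leaves one successor dead, and why that dead block lingers at O0 unless a pass such as IPSCCP folds the branch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy model only (hypothetical names, not LLVM's API): a conditional branch
// of the form `if (sm >= threshold) goto true_block; else goto false_block;`
// where `sm` originally came from an nvvm_reflect query.
struct ReflectBranch {
  // Set once nvvm-reflect has replaced the query with a constant SM version
  // (toy encoding: 75 for sm_75); empty if the condition is still unknown.
  std::optional<int> folded_sm;
  int threshold;
  std::string true_block;
  std::string false_block;
};

// Mimics the branch-folding effect of constant propagation: if the condition
// is a known constant, only one successor stays reachable, i.e. the
// conditional branch becomes unconditional and the other block is dead.
std::vector<std::string> reachable_successors(const ReflectBranch& br) {
  if (br.folded_sm.has_value()) {
    const bool taken = *br.folded_sm >= br.threshold;
    return {taken ? br.true_block : br.false_block};
  }
  // Unknown condition: both successors must be kept.
  return {br.true_block, br.false_block};
}
```

For instance, with a folded value of 75 and a threshold of 80 only `generic_path` stays reachable, so an sm_80-only block becomes dead code that O0 would otherwise keep. Real IPSCCP works on LLVM IR across the whole module; this models only the final branch-folding step.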

A solution targeting only branches that use the nvvm_reflect function was explored first, by patching the existing nvvm-reflect pass. That approach would have required handling several cases, so it was abandoned in favour of a simpler, more comprehensive one: adding the IPSCCP pass to O0. Discussions revealed that other backends face a corresponding issue, so it was decided to favour a simple temporary DPC++ solution now and to work on a permanent, general solution later in the year.

A new backend flag, `use-ipsccp-nvptx-O0`, lets users remove the IPSCCP pass from O0 at their discretion by setting it to false.

JackAKirk requested review from a team as code owners on March 28, 2022, 12:55
JackAKirk changed the title from "[SYCL][CUDA] Added IPSCCP pass to O0 by default." to "[SYCL][CUDA] Added IPSCCP pass to O0 by default" on March 28, 2022
bader merged commit 537e51b into intel:sycl on Mar 31, 2022
bader changed the title from "[SYCL][CUDA] Added IPSCCP pass to O0 by default" to "[SYCL][CUDA] Add IPSCCP pass to O0 by default" on Mar 31, 2022
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Apr 2, 2022
* sycl: (3343 commits)
  [SYCL][L0] Disable round-robin submissions to multiple CCSs (intel#5945)
  [SYCL][CUDA] Don't link pi_cuda against libsycl (intel#5908)
  [CI] Disable -Werror by default (intel#5889)
  [BuildBot] Uplift CPU/FPGAEMU RT version to 2022.13.3.0.16 (intel#5883)
  [SYCL][CUDA][libclc] Add support for atomic fp exchange and compare exchange (intel#5937)
  [SYCL] Fix device code outlining for static local variables (intel#5915)
  [SYCL][NFC] Refactor plugin CMakeLists.txt (intel#5799)
  [SPIR-V][Doc] Add JointMatrixWorkItemLengthINTEL instruction to joint matrix extension (intel#5781)
  [SYCL] Expand device_global map and make initialization order agnostic (intel#5902)
  [SYCL][CUDA] Add IPSCCP pass to O0 by default (intel#5900)
  [ESIMD] Disable ABI changes warnings in host compiler. (intel#5931)
  [SYCL] Make properties constructor constexpr (intel#5928)
  [NFC][SYCL] Fix static analysis warning (intel#5933)
  [CODEOWNERS][NFC] Assign code owners for CI scripts (intel#5873)
  [SYCL] Store the kernel object size in the integration header (intel#5862)
  [SYCL][ESIMD] Change esimd-verifier logic for detecting valid SYCL calls (intel#5914)
  [SYCL][CUDA][DOC] GettingStartedGuide.md to recommend cuda 11.6 (intel#5917)
  [SYCL][L0] Move command list cache usage under mutex (intel#5874)
  [SYCL][FPGA] Prepare future implementation of experimental pipe properties (intel#5886)
  [CI] Roll back intel driver to the latest version (intel#5925)
  ...
bader pushed a commit that referenced this pull request on Apr 6, 2022 (… #5921)

The libclc remangler handles function overloads for types such as `long long`, `long`, and `int`, ensuring consistency with OpenCL C primitives. Previously, this was achieved by creating a `GlobalAlias` for each overload. However, the NVPTX target does not support function aliases. Normally an optimization pass removes these aliases, but the current approach prevents compiling with DPC++ for CUDA at `-O0`.

This PR changes the behaviour of the remangler to emit function clones (a copy of the function with a different name). There is a risk that this bloats the compiled code, but optimization should remove unneeded clones, as it did with unneeded aliases.
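
The difference between the two remangler strategies can be sketched with a toy module. This is a hypothetical model, not the remangler's code or LLVM's `GlobalAlias` API: `ToyModule`, `add_alias`, `add_clone`, and `has_real_definition` are illustrative names, and `_Z3foox`/`_Z3fool` are Itanium-style manglings of a made-up `foo(long long)`/`foo(long)` pair.

```cpp
#include <map>
#include <string>

// Toy module (hypothetical, not LLVM's API): function names mapped to bodies,
// plus a separate table of aliases (alias name -> aliasee name).
struct ToyModule {
  std::map<std::string, std::string> functions;
  std::map<std::string, std::string> aliases;
};

// Old approach: the remangled name is only an alias, so a target that cannot
// handle aliases sees no real definition under that name.
void add_alias(ToyModule& m, const std::string& new_name,
               const std::string& target) {
  m.aliases[new_name] = target;
}

// New approach: emit a clone, i.e. a full copy of the target's body under the
// new name, so every name has its own definition.
void add_clone(ToyModule& m, const std::string& new_name,
               const std::string& target) {
  m.functions[new_name] = m.functions.at(target);
}

// True if `name` has a definition that does not rely on alias support.
bool has_real_definition(const ToyModule& m, const std::string& name) {
  return m.functions.count(name) != 0;
}
```

Because `add_clone` copies the whole body, each clone costs code size, which is the bloat risk mentioned above; an optimizing build can still drop clones that end up unused.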

An additional barrier to `-O0` compilation for NVPTX, related to `nvvm_reflect`, is addressed in #5900.

**Note:** this PR is best reviewed as separate commits. The first commit makes the (small) functional change. The second commit is simply renaming all 'Alias*' variables to 'Clone*'.
pvchupin pushed a commit to pvchupin/llvm that referenced this pull request May 7, 2022
Return back additional switch for test, that was introduced in intel#5900