[SYCL][CUDA] Add IPSCCP pass to O0 by default #5900

Merged (3 commits) on Mar 31, 2022.

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (11 additions, 0 deletions)
@@ -33,6 +33,7 @@
#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"
@@ -63,6 +64,12 @@ static cl::opt<bool> UseShortPointersOpt(
        "Use 32-bit pointers for accessing const/local/shared address spaces."),
    cl::init(false), cl::Hidden);

static cl::opt<bool>
    UseIPSCCPO0("use-ipsccp-nvptx-O0",
                cl::desc("Use IPSCCP pass at O0 as a temp solution for "
                         "nvvm-reflect dead-code errors."),
                cl::init(true), cl::Hidden);

namespace llvm {

void initializeLocalAccessorToSharedMemoryPass(PassRegistry &);
@@ -327,6 +334,10 @@ void NVPTXPassConfig::addIRPasses() {
  const NVPTXSubtarget &ST = *getTM<NVPTXTargetMachine>().getSubtargetImpl();
  addPass(createNVVMReflectPass(ST.getSmVersion()));

  if (getOptLevel() == CodeGenOpt::None && UseIPSCCPO0) {
    addPass(createIPSCCPPass());
  }

  // FIXME: should the target triple check be done by the pass itself?
  // See createNVPTXLowerArgsPass as an example
  if (getTM<NVPTXTargetMachine>().getTargetTriple().getOS() == Triple::CUDA) {
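
The gating added to `addIRPasses()` can be summarized with a small standalone C++ model. It is only a sketch: `OptLevel` and `irPassesFor` are hypothetical names, the pass list is a vector of strings rather than real `addPass` calls, and only the two passes visible in the hunk above are modelled. Passing `false` as the second argument corresponds to `-use-ipsccp-nvptx-O0=false` (given to `llc` directly, or through `-mllvm` from `clang++`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the codegen optimization level; only the
// difference between None (-O0) and the higher levels matters here.
enum class OptLevel { None, Less, Default, Aggressive };

// Toy model of the IR-pass gating in the NVPTXPassConfig::addIRPasses() hunk:
// nvvm-reflect is scheduled unconditionally, and IPSCCP is appended after it
// only at -O0 while use-ipsccp-nvptx-O0 is true (which is its default).
std::vector<std::string> irPassesFor(OptLevel level,
                                     bool useIPSCCPAtO0 = true) {
  std::vector<std::string> passes{"nvvm-reflect"};
  if (level == OptLevel::None && useIPSCCPAtO0)
    passes.push_back("ipsccp");
  return passes;
}
```

The default argument mirrors `cl::init(true)`: IPSCCP is on at `-O0` unless explicitly disabled. That is presumably why the `param-load-store.ll` test below passes `-use-ipsccp-nvptx-O0=false`, so that its `-O0` output still matches the existing CHECK lines.
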
llvm/test/CodeGen/NVPTX/param-load-store.ll (1 addition, 1 deletion)
@@ -1,5 +1,5 @@
; Verifies correctness of load/store of parameters and return values.
; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 -O0 -verify-machineinstrs | FileCheck -allow-deprecated-dag-overlap %s
; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 -O0 -verify-machineinstrs -use-ipsccp-nvptx-O0=false | FileCheck -allow-deprecated-dag-overlap %s

%s_i1 = type { i1 }
%s_i8 = type { i8 }
sycl/doc/GetStartedGuide.md (12 additions, 0 deletions)
@@ -825,6 +825,18 @@ which contains all the symbols required.
significantly slower but matches the default precision used by `nvcc`, and
this `clang++` flag is equivalent to the `nvcc` `-prec-sqrt` flag, except that
it defaults to `false`.
* At `-O0` (no optimization), the IPSCCP compiler pass runs by default. It can
be switched off at `-O0` with the `-mllvm -use-ipsccp-nvptx-O0=false` flag, at
the user's discretion.
IPSCCP runs by default even at `-O0` as a workaround for a currently unresolved
issue with the nvvm-reflect compiler pass. nvvm-reflect selects the correct
branches depending on the SM version, which can optionally be specified with
the `--cuda-gpu-arch` flag; if the flag is not given, the default, SM 50, is
used. nvvm-reflect replaces the SM-version queries with constants, but at `-O0`
no later pass removes the branches those constants make dead. Without IPSCCP,
instructions that require a higher SM version than the target can therefore
remain in the code even though they are never executed. Since similar issues
occur in other backends, future work will aim for a universal solution.

### HIP back-end limitations

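
The nvvm-reflect/IPSCCP interaction described in the guide can be illustrated with a toy C++ model. Everything in it is hypothetical: `Instr`, `ReflectBranch`, `runReflect`, `runConstantFolding`, `emitted` and `compilesFor` are illustrative names, not LLVM APIs. The model only mimics the effect of nvvm-reflect (turning the SM-version query into a constant while leaving the branch in place) and the effect IPSCCP has here (folding the branch on that constant so the unreachable arm disappears). It shows why, for an SM 50 target, an unfolded branch still carries an SM 70-only instruction into code generation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model, not LLVM IR: an opaque instruction that is only legal on
// targets whose SM version is at least `minSm`.
struct Instr {
  std::string name;
  int minSm;
};

// Toy model of `if (<SM-version query> >= threshold) thenPath else elsePath`.
struct ReflectBranch {
  int threshold;                    // SM version the guarded path needs
  std::vector<Instr> thenPath;      // code relying on newer SM features
  std::vector<Instr> elsePath;      // portable fallback
  std::optional<int> reflected;     // set once the query is a constant
  std::optional<bool> foldedTaken;  // set once the branch has been folded
};

// Mimics nvvm-reflect: the query becomes the target's SM version.
// The branch itself is left in place.
void runReflect(ReflectBranch &br, int targetSm) { br.reflected = targetSm; }

// Mimics the effect IPSCCP has here: with a constant condition the branch
// direction is known, so only one arm stays reachable.
void runConstantFolding(ReflectBranch &br) {
  if (br.reflected)
    br.foldedTaken = *br.reflected >= br.threshold;
}

// Instructions that reach code generation.
std::vector<Instr> emitted(const ReflectBranch &br) {
  if (br.foldedTaken)
    return *br.foldedTaken ? br.thenPath : br.elsePath;
  // Unfolded branch: both arms are still present, dead or not.
  std::vector<Instr> all = br.thenPath;
  all.insert(all.end(), br.elsePath.begin(), br.elsePath.end());
  return all;
}

// True if every emitted instruction is legal on `targetSm`.
bool compilesFor(const ReflectBranch &br, int targetSm) {
  for (const Instr &i : emitted(br))
    if (i.minSm > targetSm)
      return false;
  return true;
}
```

For example, with a branch guarded by threshold 70, calling only `runReflect(br, 50)` leaves `compilesFor(br, 50)` false because the SM 70 arm is still emitted; additionally calling `runConstantFolding(br)` keeps only the fallback arm and makes it true. This mirrors the reasoning for keeping IPSCCP on at `-O0` by default.
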