Skip to content

[SYCL][ESIMD] Pass to replace simd* parameters with native llvm vectors. #2097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 28, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions llvm/include/llvm/InitializePasses.h
Original file line number Diff line number Diff line change
Expand Up @@ -416,6 +416,7 @@ void initializeStructurizeCFGPass(PassRegistry&);
void initializeSYCLLowerWGScopeLegacyPassPass(PassRegistry &);
void initializeSYCLLowerESIMDLegacyPassPass(PassRegistry &);
void initializeESIMDLowerLoadStorePass(PassRegistry &);
void initializeESIMDLowerVecArgLegacyPassPass(PassRegistry &);
void initializeTailCallElimPass(PassRegistry&);
void initializeTailDuplicatePass(PassRegistry&);
void initializeTargetLibraryInfoWrapperPassPass(PassRegistry&);
Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/LinkAllPasses.h
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,7 @@ namespace {
(void)llvm::createSYCLLowerWGScopePass();
(void)llvm::createSYCLLowerESIMDPass();
(void)llvm::createESIMDLowerLoadStorePass();
(void)llvm::createESIMDLowerVecArgPass();
std::string buf;
llvm::raw_string_ostream os(buf);
(void) llvm::createPrintModulePass(os);
Expand Down
3 changes: 3 additions & 0 deletions llvm/include/llvm/SYCLLowerIR/LowerESIMD.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,9 @@ class ESIMDLowerLoadStorePass : public PassInfoMixin<ESIMDLowerLoadStorePass> {
FunctionPass *createESIMDLowerLoadStorePass();
void initializeESIMDLowerLoadStorePass(PassRegistry &);

ModulePass *createESIMDLowerVecArgPass();
void initializeESIMDLowerVecArgLegacyPassPass(PassRegistry &);

} // namespace llvm

#endif // LLVM_SYCLLOWERIR_LOWERESIMD_H
1 change: 1 addition & 0 deletions llvm/lib/SYCLLowerIR/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ add_llvm_component_library(LLVMSYCLLowerIR
LowerWGScope.cpp
LowerESIMD.cpp
LowerESIMDVLoadVStore.cpp
LowerESIMDVecArg.cpp

ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/SYCLLowerIR
Expand Down
320 changes: 320 additions & 0 deletions llvm/lib/SYCLLowerIR/LowerESIMDVecArg.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,320 @@
//===-- ESIMDVecArgPass.cpp - lower Close To Metal (CM) constructs --------===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add a regression test for this pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've some tests that use this functionality. @kbobrovs - are we also planning on checking in esimd unit tests? we've subroutine.cpp that exercises this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratikashar, there should be a test similar to llvm/test/SYCLLowerIR/esimd_lower_intrins.ll which does not depend on anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbobrovs - added llvm/test/SYCLLowerIR/esimd_subroutine.ll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratikashar, there are only few CHECKs, and I think the test could be made much simpler. E.g. _ZN4simdIiLi16EEC2ERS0_ and a call to it removed, _Z3fooi reduced. @bader - are you OK with the test?

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// Change in function parameter type from simd* to native llvm vector type for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this isn't done by clang's AST/CodeGen library?
Or even by using vector extension in the SYCL headers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it legal for clang's CG to change type of parameter from simd* to say *?

Copy link
Contributor

@pratikashar pratikashar Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simd* to say *?

simd * to say < i16x32 >*?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring to attribute((ext_vector_type(N)) then we do use this to declare member of class simd. the sole data member of class simd is of vector type with this attribute applied. simd class has several other methods implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it legal for clang's CG to change type of parameter from simd* to say *?

If it's legal for LLVM pass it should be legal for any other compiler component. According to my understanding changing type at CodeGen should much easier to maintain than in LLVM pass. CodeGen lowers AST types into LLVM types, so it "changes" type already. According to my understanding changing LLVM type by pass is much more difficult to do.

if you're referring to attribute((ext_vector_type(N)) then we do use this to declare member of class simd. the sole data member of class simd is of vector type with this attribute applied. simd class has several other methods implemented.

Why do we expect llvm.genx.vload.v16i32.p4v16i32 intrinsic to accept simd class as a parameter instead of vector type data member? Could you point to the SYCL headers source code where this intrinsic is used, please? I'd like to check if it's possible to handle the problem in headers instead of LLVM pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lowering is required by the backend. More work done in offline compilation means less time spent in online compilation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it requires any additional work impacting "compilation time".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not do this here, we have to do the same transformation in the backend, that is extra online compilation time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you decide which transformation should be done offline or online?

I still believe that doing this transformation 'here' is bad design.
Alternative option:

  • user change the source code to use vector type instead of simd to get better performance. (I think comments are not quite accurate and the code with simd is correct, but back-end is not able to optimize it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restricting user from using simd type in function declaration will make the ESIMD extension less usable. Hence not a good choice to us. The alternative of this pass is to change all SIMD type to vector type during clang CodeGen (somewhere in this long review, I see you indicated that is doable and better). You are the clang expert

// cmc compiler to generate correct code for subroutine parameter passing and
// globals:
//
// Old IR:
// ======
//
// Parameter %0 is of type simd*
// define dso_local spir_func void @_Z3fooPiN2cm3gen4simdIiLi16EEE(i32
// addrspace(4)* %C,
// "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" * %0)
// local_unnamed_addr #2 {
//
// New IR:
// ======
//
// Translate simd* parameter (#1) to vector <16 x 32>* type and insert bitcast.
// All users of old parameter will use result of the bitcast.
//
// define dso_local spir_func void @_Z3fooPiN2cm3gen4simdIiLi16EEE(i32
// addrspace(4)* %C,
// <16 x i32>* %0) local_unnamed_addr #2 {
// entry:
// % 1 = bitcast<16 x i32> * % 0 to %
// "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" *
//
//
// Change in global variables:
//
// Old IR:
// ======
// @vc = global %"class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd"
// zeroinitializer, align 64 #0
//
// % call.cm.i.i = tail call<16 x i32> @llvm.genx.vload.v16i32.p4v16i32(
// <16 x i32> addrspace(4) * getelementptr(
// % "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd",
// % "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" addrspace(4) *
// addrspacecast(% "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" * @vc to
// % "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" addrspace(4) *), i64 0,
// i32 0))
//
// New IR:
// ======
//
// @0 = dso_local global <16 x i32> zeroinitializer, align 64 #0 <-- New Global
// Variable
//
// % call.cm.i.i = tail call<16 x i32> @llvm.genx.vload.v16i32.p4v16i32(
// <16 x i32> addrspace(4) * getelementptr(
// % "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd",
// % "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" addrspace(4) *
// addrspacecast(% "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" *
// bitcast(<16 x i32> * @0 to
// %"class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" *) to %
// "class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd" addrspace(4) *),
// i64 0, i32 0))
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"

using namespace llvm;

#define DEBUG_TYPE "ESIMDLowerVecArg"

namespace llvm {

// Forward declarations
void initializeESIMDLowerVecArgLegacyPassPass(PassRegistry &);
ModulePass *createESIMDLowerVecArgPass();

// Pass converts simd* function parameters and globals to
// llvm's first-class vector* type.
class ESIMDLowerVecArgPass {
public:
bool run(Module &M);

private:
DenseMap<GlobalVariable *, GlobalVariable *> OldNewGlobal;

Function *rewriteFunc(Function &F);
Type *getSimdArgPtrTyOrNull(Value *arg);
void fixGlobals(Module &M);
void replaceConstExprWithGlobals(Module &M);
ConstantExpr *createNewConstantExpr(GlobalVariable *newGlobalVar,
Type *oldGlobalType, Value *old);
void removeOldGlobals();
};

} // namespace llvm

namespace {
class ESIMDLowerVecArgLegacyPass : public ModulePass {
public:
static char ID;
ESIMDLowerVecArgLegacyPass() : ModulePass(ID) {
initializeESIMDLowerVecArgLegacyPassPass(*PassRegistry::getPassRegistry());
}

bool runOnModule(Module &M) override {
auto Modified = Impl.run(M);
return Modified;
}

bool doInitialization(Module &M) override { return false; }

private:
ESIMDLowerVecArgPass Impl;
};
} // namespace

char ESIMDLowerVecArgLegacyPass::ID = 0;
INITIALIZE_PASS(ESIMDLowerVecArgLegacyPass, "ESIMDLowerVecArg",
"Translate simd ptr to native vector type", false, false)

// Public interface to VecArgPass
ModulePass *llvm::createESIMDLowerVecArgPass() {
return new ESIMDLowerVecArgLegacyPass();
}

// Return ptr to first-class vector type if Value is a simd*, else return
// nullptr.
Type *ESIMDLowerVecArgPass::getSimdArgPtrTyOrNull(Value *arg) {
auto ArgType = dyn_cast<PointerType>(arg->getType());
if (!ArgType || !ArgType->getElementType()->isStructTy())
return nullptr;
auto ContainedType = ArgType->getElementType();
if ((ContainedType->getStructNumElements() != 1) ||
!ContainedType->getStructElementType(0)->isVectorTy())
return nullptr;
return PointerType::get(ContainedType->getStructElementType(0),
ArgType->getPointerAddressSpace());
}

// F may have multiple arguments of type simd*. This
// function updates all parameters along with call
// call sites of F.
Function *ESIMDLowerVecArgPass::rewriteFunc(Function &F) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider llvm::cloneFunctionInto()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing me to this, i wasnt aware of this function. i looked in to this now and it seems this does a deep copy. since we dont want to store old function, i think we can skip using this API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of details that need to be handled to do this correctly, so I think it would be more maintainable to use the existing clone functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated patch to use llvm's CloneFunctionInto API.

FunctionType *FTy = F.getFunctionType();
Type *RetTy = FTy->getReturnType();
SmallVector<Type *, 8> ArgTys;

for (unsigned int i = 0; i != F.arg_size(); i++) {
auto Arg = F.getArg(i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: LLVM coding standards suggest using "auto *" when the deduced type is a pointer.

Type *NewTy = getSimdArgPtrTyOrNull(Arg);
if (NewTy) {
// Copy over byval type for simd* type
ArgTys.push_back(NewTy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you not need to copy attributes for transformed simd* types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. now the loop copies attributes from all args.

} else {
// Transfer all non-simd ptr arguments
ArgTys.push_back(Arg->getType());
}
}

FunctionType *NFTy = FunctionType::get(RetTy, ArgTys, false);

// Create new function body and insert into the module
Function *NF = Function::Create(NFTy, F.getLinkage(), F.getName());
F.getParent()->getFunctionList().insert(F.getIterator(), NF);

SmallVector<ReturnInst *, 8> Returns;
SmallVector<BitCastInst *, 8> BitCasts;
ValueToValueMapTy VMap;
for (unsigned int I = 0; I != F.arg_size(); I++) {
auto Arg = F.getArg(I);
Type *newTy = getSimdArgPtrTyOrNull(Arg);
if (newTy) {
// bitcast vector* -> simd*
auto BitCast = new BitCastInst(NF->getArg(I), Arg->getType());
BitCasts.push_back(BitCast);
VMap.insert(std::make_pair(Arg, BitCast));
continue;
}
VMap.insert(std::make_pair(Arg, NF->getArg(I)));
}

llvm::CloneFunctionInto(NF, &F, VMap, F.getSubprogram() != nullptr, Returns);

for (auto &B : BitCasts) {
NF->begin()->getInstList().push_front(B);
}

NF->takeName(&F);

// Fix call sites
SmallVector<std::pair<Instruction *, Instruction *>, 10> OldNewInst;
for (auto &use : F.uses()) {
// Use must be a call site
SmallVector<Value *, 10> Params;
auto Call = cast<CallInst>(use.getUser());
// Variadic functions not supported
assert(!Call->getFunction()->isVarArg() &&
"Variadic functions not supported");
for (unsigned int I = 0; I < Call->getNumArgOperands(); I++) {
auto SrcOpnd = Call->getOperand(I);
auto NewTy = getSimdArgPtrTyOrNull(SrcOpnd);
if (NewTy) {
auto BitCast = new BitCastInst(SrcOpnd, NewTy, "", Call);
Params.push_back(BitCast);
} else {
if (SrcOpnd != &F)
Params.push_back(SrcOpnd);
else
Params.push_back(NF);
}
}
// create new call instruction
auto NewCallInst = CallInst::Create(NFTy, NF, Params, "");
NewCallInst->setCallingConv(F.getCallingConv());
OldNewInst.push_back(std::make_pair(Call, NewCallInst));
}

for (auto InstPair : OldNewInst) {
auto OldInst = InstPair.first;
auto NewInst = InstPair.second;
ReplaceInstWithInst(OldInst, NewInst);
}

F.eraseFromParent();

return NF;
}

// Replace ConstantExpr if it contains old global variable.
ConstantExpr *
ESIMDLowerVecArgPass::createNewConstantExpr(GlobalVariable *NewGlobalVar,
Type *OldGlobalType, Value *Old) {
ConstantExpr *NewConstantExpr = nullptr;

if (isa<GlobalVariable>(Old)) {
NewConstantExpr = cast<ConstantExpr>(
ConstantExpr::getBitCast(NewGlobalVar, OldGlobalType));
return NewConstantExpr;
}

auto InnerMost = createNewConstantExpr(
NewGlobalVar, OldGlobalType, cast<ConstantExpr>(Old)->getOperand(0));

NewConstantExpr = cast<ConstantExpr>(
cast<ConstantExpr>(Old)->getWithOperandReplaced(0, InnerMost));

return NewConstantExpr;
}

// Globals are part of ConstantExpr. This loop iterates over
// all such instances and replaces them with a new ConstantExpr
// consisting of new global vector* variable.
void ESIMDLowerVecArgPass::replaceConstExprWithGlobals(Module &M) {
for (auto &GlobalVars : OldNewGlobal) {
auto &G = *GlobalVars.first;
for (auto UseOfG : G.users()) {
auto NewGlobal = GlobalVars.second;
auto NewConstExpr = createNewConstantExpr(NewGlobal, G.getType(), UseOfG);
UseOfG->replaceAllUsesWith(NewConstExpr);
}
}
}

// This function creates new global variables of type vector* type
// when old one is of simd* type.
void ESIMDLowerVecArgPass::fixGlobals(Module &M) {
for (auto &G : M.getGlobalList()) {
auto NewTy = getSimdArgPtrTyOrNull(&G);
if (NewTy && !G.user_empty()) {
// Peel off ptr type that getSimdArgPtrTyOrNull applies
NewTy = NewTy->getPointerElementType();
auto ZeroInit = ConstantAggregateZero::get(NewTy);
auto NewGlobalVar =
new GlobalVariable(NewTy, G.isConstant(), G.getLinkage(), ZeroInit,
"", G.getThreadLocalMode(), G.getAddressSpace());
NewGlobalVar->setExternallyInitialized(G.isExternallyInitialized());
NewGlobalVar->copyAttributesFrom(&G);
NewGlobalVar->takeName(&G);
NewGlobalVar->copyMetadata(&G, 0);
M.getGlobalList().push_back(NewGlobalVar);
OldNewGlobal.insert(std::make_pair(&G, NewGlobalVar));
}
}

replaceConstExprWithGlobals(M);

removeOldGlobals();
}

// Remove old global variables from the program.
void ESIMDLowerVecArgPass::removeOldGlobals() {
for (auto &G : OldNewGlobal) {
G.first->removeDeadConstantUsers();
G.first->eraseFromParent();
}
}

bool ESIMDLowerVecArgPass::run(Module &M) {
fixGlobals(M);

SmallVector<Function *, 10> functions;
for (auto &F : M) {
functions.push_back(&F);
}

for (auto F : functions) {
for (unsigned int I = 0; I != F->arg_size(); I++) {
auto Arg = F->getArg(I);
if (getSimdArgPtrTyOrNull(Arg)) {
rewriteFunc(*F);
break;
}
}
}

return true;
}
Loading