-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][ESIMD] Pass to replace simd* parameters with native llvm vectors. #2097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pass is needed for the ESIMD backend to generate correct code. Author: Pratik Ashar <[email protected]> Signed-off-by: Konstantin S Bobrovsky <[email protected]>
@@ -9,6 +9,6 @@ __attribute__((opencl_private)) __attribute__((register_num(17))) int vc; | |||
|
|||
SYCL_EXTERNAL void init_vc(int x) { | |||
vc = x; | |||
// CHECK: store i32 %0, i32* @vc | |||
// CHECK: store i32 %x, i32* @vc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, registers are not named if assertions are disabled.
I suggest using FileCheck pattern matching variables to make it work in both modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbobrovs - this is a test for Gang's global variable GRF binding. i dont think this test validates the pass under review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is unrelated change - fixed.
@@ -0,0 +1,430 @@ | |||
//===-- ESIMDVecArgPass.cpp - lower Close To Metal (CM) constructs --------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a regression test for this pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've some tests that use this functionality. @kbobrovs - are we also planning on checking in esimd unit tests? we've subroutine.cpp that exercises this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pratikashar, there should be a test similar to llvm/test/SYCLLowerIR/esimd_lower_intrins.ll which does not depend on anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbobrovs - added llvm/test/SYCLLowerIR/esimd_subroutine.ll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pratikashar, there are only few CHECKs, and I think the test could be made much simpler. E.g. _ZN4simdIiLi16EEC2ERS0_
and a call to it removed, _Z3fooi reduced. @bader - are you OK with the test?
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// Change in function parameter type from simd* to native llvm vector type for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this isn't done by clang's AST/CodeGen library?
Or even by using vector extension in the SYCL headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it legal for clang's CG to change type of parameter from simd* to say *?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simd* to say *?
simd * to say < i16x32 >*?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're referring to attribute((ext_vector_type(N)) then we do use this to declare member of class simd. the sole data member of class simd is of vector type with this attribute applied. simd class has several other methods implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it legal for clang's CG to change type of parameter from simd* to say *?
If it's legal for LLVM pass it should be legal for any other compiler component. According to my understanding changing type at CodeGen should much easier to maintain than in LLVM pass. CodeGen lowers AST types into LLVM types, so it "changes" type already. According to my understanding changing LLVM type by pass is much more difficult to do.
if you're referring to attribute((ext_vector_type(N)) then we do use this to declare member of class simd. the sole data member of class simd is of vector type with this attribute applied. simd class has several other methods implemented.
Why do we expect llvm.genx.vload.v16i32.p4v16i32
intrinsic to accept simd class as a parameter instead of vector type data member? Could you point to the SYCL headers source code where this intrinsic is used, please? I'd like to check if it's possible to handle the problem in headers instead of LLVM pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lowering is required by the backend. More work done in offline compilation means less time spent in online compilation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it requires any additional work impacting "compilation time".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not do this here, we have to do the same transformation in the backend, that is extra online compilation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you decide which transformation should be done offline or online?
I still believe that doing this transformation 'here' is bad design.
Alternative option:
- user change the source code to use
vector
type instead ofsimd
to get better performance. (I think comments are not quite accurate and the code with simd is correct, but back-end is not able to optimize it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restricting user from using simd type in function declaration will make the ESIMD extension less usable. Hence not a good choice to us. The alternative of this pass is to change all SIMD type to vector type during clang CodeGen (somewhere in this long review, I see you indicated that is doable and better). You are the clang expert
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
@andykaylor, I'm not sure if proposed change is aligned with the LLVM compiler architecture design. Could you clarify if it's okay from you POV, please? |
@andykaylor, friendly ping |
I apologize for the delayed response. I hadn't seen the notification that my input was requested. I do have a lot to say about this, and I'm not sure it directly addresses Alexey's concerns, so please follow up as needed. Let me start by saying that there is a long-standing plan for pointers to become typeless in LLVM IR. Much work has been done toward this, and though much more needs to be done that will eventually be the state of the LLVM IR language. More to the point, already in LLVM IR the type associated with pointers has no semantic meaning. The existence of functions like PointerType::getElementType() may lead you to believe that the type has semantic meaning, but it does not. If we have transformations that rely on knowing the types of pointer arguments, that is a problem. Any information that can't be deduced from the way in which memory is accessed should not be relied upon. Having said that, the above statement is good news for the change under review here, though not necessarily the way it's currently implemented. As I understand it, the effect of this patch is to replace the type %class._ZTSN2cm3gen4simdIiLi16EEE.cm::gen::simd with <16 x i32> throughout the Module. The patch works downward from function arguments and global variables, but that's essentially the end result, right? Do we need to do a similar replacement with alloca values? Any correspondance between C++ types and similar types in the LLVM IR type system is nothing more than an implementation detail. There is no requirement for these types to be kept connected. The clang front end creates LLVM IR types as needed to perform a literal translation of the source code, which will describe the program semantics. The back end is then free to transform the IR in any way that maintains the same semantics. The front end probably does need to keep the AST-level type information until it generates IR, so we probably don't want to change the type there. With regard to pointer values once we generate LLVM IR, the following types are all exactly equivalent:
They are all pointers, and one of the fundamental guiding principles of LLVM IR is that memory does not have a type. So, what you're doing here is perfectly legal by LLVM IR rules. As I said above, I do have concerns about the fact that we are relying on the types of pointers for certain transformations, but I'll put that aside for a moment and assume that perhaps the dependency isn't strictly as I've stated it and even if it is we can probably find another way to implement these transformations that removes the dependency. The other thing I'd like to comment on, then, is the way this is implemented. You're starting from function arguments and global variables and then working down from there to update their uses. That will work fine for now, but when pointers become typeless it won't work. Global variables will, I think, continue to have a type associated with them in some way. Function arguments will not. I think it might be better to structure the code now to examine the uses of arguments and global variables. Consider this function:
With typeless pointers, that would look something like this:
Currently the element type of the pointer argument matches the source element type operand of the GEP instruction, but the source element type operand only exists because that won't always be true. The same is true of the type operand of the load and store instructions. In fact, there is nothing stopping someone from transforming the first IR into this today:
or even this:
My point is that the way the arguments are used gives you semantic information about what they point to. The pointer types do not. |
; CHECK: define spir_func i32 {{.+}}bar{{.+}}(<16 x i32>* {{.+}} | ||
define spir_func i32 @_Z3bar4simdIiLi16EE(%class._ZTS4simdIiLi16EE.simd* %v) #0 { | ||
entry: | ||
; CHECK: {{.+}} = bitcast <16 x i32>* {{.+}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a check of the updated function signature and a body of the function that accesses the arguments in some way. The uses of the updated arguments/globals are probably the most fragile part of this transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. i've updated esimd-subroutine.ll with more CHECK lines. these check whether simd* was converted to <16 x i32>. it also verifies that bitcasts are inserted and they use the right function argument. another CHECK statement verifies that dst of bitcast is used in the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1 | ||
|
||
; Function Attrs: norecurse nounwind | ||
define linkonce_odr spir_func void @_ZN4simdIiLi16EEC2ERS0_(%class._ZTS4simdIiLi16EE.simd addrspace(4)* %this, %class._ZTS4simdIiLi16EE.simd addrspace(4)* align 64 dereferenceable(64) %other) unnamed_addr #0 comdat align 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some checks to verify that this function is updated correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. please see above comment.
|
||
// Return ptr to first-class vector type if Value is a simd*, else return | ||
// nullptr. | ||
Type *ESIMDLowerVecArgPass::argIsSimdPtr(Value *arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: The name of this function is inconsistent with what it does. The name implies a Boolean return value (and I see it is used that way in one place) but since it returns a type I would expect a name like getSimdArgPtrTyOrNull().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
Type *ESIMDLowerVecArgPass::argIsSimdPtr(Value *arg) { | ||
auto ArgType = arg->getType(); | ||
if (ArgType->isPointerTy()) { | ||
auto containedType = ArgType->getPointerElementType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: These three lines could be rewritten as:
if (auto *ArgType = dyn_cast<PointerType*>(arg->getType())) {
auto *ContainedType = ArgType->getElementType()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
|
||
if (isa<ConstantExpr>(V)) { | ||
auto FirstOpnd = cast<ConstantExpr>(V)->getOperand(0); | ||
return hasGlobalConstExpr(FirstOpnd, Global); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would Value::stripPointerCasts() or one of its variants achieve the same effect without recursion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i tried it and it worked. thanks for the tip. but eventually i did away with this function in latest patch.
// all such instances and replaces them with a new ConstantExpr | ||
// consisting of new global vector* variable. | ||
void ESIMDLowerVecArgPass::replaceConstExprWithGlobals(Module &M) { | ||
for (auto &F : M) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a very slow way of finding global variables in const expressions. How about using for (auto &GV : M.globals())
and then iterate over uses of the global variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
if (NewTy && !G.user_empty()) { | ||
// Peel off ptr type that argIsSimdPtr applies | ||
NewTy = NewTy->getPointerElementType(); | ||
auto ZeroInit = new APInt(32, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're assuming that the element type is i32, I think. The rest of the code has a general approach to the contained vector type. I think ConstantAggregateZero::get(NewTy) is what you want here.
Shouldn't you also be checking to see what kind of initializer was used for the original variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
i dont think static initialization is supported for on device code.
auto ZeroInit = new APInt(32, 0); | ||
auto NewGlobalVar = | ||
new GlobalVariable(NewTy, G.isConstant(), G.getLinkage(), | ||
Constant::getIntegerValue(NewTy, *ZeroInit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're missing thread local mode and address space from things to be copied here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
SmallVector<DIGlobalVariableExpression *, 5> GVs; | ||
G.getDebugInfo(GVs); | ||
for (auto md : GVs) { | ||
NewGlobalVar->addDebugInfo(md); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure there are no other types of metadata you can copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. now i invoke copyMetadata() function to copy all MD nodes.
d5582c0
to
2163585
Compare
If I understand it correctly GPU back-end will generate more efficient code if we replace
Looking for C++ mangled type name
There are a lot C++ semantics, which this pass might need to handle in order to enable efficient code generation. |
As I understand Andy, this equivalence exists only in some cases like formal parameter types. Instructions like GEP, alloca do need accurate type.
This pass actually does not do that. It detects the simd object via its structural representation: Type *ESIMDLowerVecArgPass::argIsSimdPtr(Value *arg) {
auto ArgType = arg->getType();
if (ArgType->isPointerTy()) {
auto containedType = ArgType->getPointerElementType();
if (containedType->isStructTy()) {
if (containedType->getStructNumElements() == 1 &&
containedType->getStructElementType(0)->isVectorTy()) {
return PointerType::get(containedType->getStructElementType(0),
ArgType->getPointerAddressSpace());
}
}
}
return nullptr;
} |
0aaf9be
to
f825c4d
Compare
@andykaylor, please take another look. I suggest that we commit this variant based on formal parameter type analysis, then file an issue recording your suggestion to re-implement this based on usage, plus listing couple other alternatives. Does this sound good? |
|
||
$"_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE4Test" = comdat any | ||
|
||
; CHECK: [[NEWGLOBAL:[@a-zA-Z0-9_]*]] = dso_local global {{.+}} align 64 #0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you not checking the type and the initializer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
%cmp.i = icmp eq i64 %add.i.i.i.i.i, 0 | ||
%..i = select i1 %cmp.i, i64 %add.i5.i.i.i.i, i64 %add.i.i.i.i.i | ||
%conv9.i = trunc i64 %..i to i32 | ||
; CHECK: store <16 x i32> {{.+}} bitcast (<16 x i32>* [[NEWGLOBAL]] to {{.+}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like some interesting things are missing from the check here. For instance, the original store contains an addressspace cast. What happens to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated test to check for addrspacecasts
%1 = addrspacecast %"class._ZTSN2cl4sycl5intel3gpu4simdIiLi16EEE.cl::sycl::intel::gpu::simd"* %agg.tmp to %"class._ZTSN2cl4sycl5intel3gpu4simdIiLi16EEE.cl::sycl::intel::gpu::simd" addrspace(4)* | ||
%M_data.i.i = getelementptr inbounds %"class._ZTSN2cl4sycl5intel3gpu4simdIiLi16EEE.cl::sycl::intel::gpu::simd", %"class._ZTSN2cl4sycl5intel3gpu4simdIiLi16EEE.cl::sycl::intel::gpu::simd" addrspace(4)* %0, i64 0, i32 0 | ||
%call.esimd.i.i = call <16 x i32> @llvm.genx.vload.v16i32.p4v16i32(<16 x i32> addrspace(4)* %M_data.i.i), !noalias !17 | ||
; CHECK: {{.+}} = call <16 x i32> {{.+}} [[NEWGLOBAL]] to {{.+}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'd be interested in the details here.
// F may have multiple arguments of type simd*. This | ||
// function updates all parameters along with call | ||
// call sites of F. | ||
Function *ESIMDLowerVecArgPass::rewriteFunc(Function &F) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of details that need to be handled to do this correctly, so I think it would be more maintainable to use the existing clone functions.
// Use must be a call site | ||
SmallVector<Value *, 10> Params; | ||
auto User = use.getUser(); | ||
if (isa<CallInst>(User)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true.
auto User = use.getUser(); | ||
if (isa<CallInst>(User)) { | ||
auto Call = cast<CallInst>(User); | ||
for (unsigned int I = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could assert here that the function is not variadic then?
Yes, I think that's OK. It sounds like I need to speak with the IGC developers about dependence on pointer types. It should be fine for the near future as it is. |
Signed-off-by: Ashar, Pratik J <[email protected]>
f825c4d
to
ac3a4af
Compare
@andykaylor, could you approve the PR, if you are okay to merge the updated version of this PR, please? |
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
@pratikashar, to speedup review, I addressed comments from @AlexeySachkov (in #2134) - please check if you are OK with the changes. |
My comments from #2134 weren't critical, so, I'm okay to merge this patch as-is. I will most likely have some other questions after more careful review, but we can address those later |
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
…rogram * upstream/sycl: (609 commits) [SYCL] Fix fail in the post commit testing (intel#2210) [SYCL] Materialize shadow local variables for byval arguments before use (intel#2200) [SYCL] Support lambda functions passed to reduction (intel#2190) [SYCL][USM] Improve USM Allocator. (intel#2026) [SYCL] Disallow mutable lambdas (intel#1785) [SYCL][ESIMD] Setup compilation pipeline for ESIMD (intel#2134) [SYCL] Fix not found kernel due to empty kernel name when using set_arg(s) (intel#2181) [SYCL] Fixed check for set_arg (intel#2203) Refactor indirect access calls to minimize invocations. (intel#2185) [SYCL][NFC] Fix potential null-pointer access (intel#2197) [SYCL] Propagate attributes from transitive calls to kernel (intel#1878) [SYCL] Fix warnings from static analysis tool (intel#2193) [SYCL][NFC] Fix ac_float test for compilation with FE optimizations (intel#2184) [GitHub Actions] Uplift clang-format version to 10 (intel#2194) [SYCL][ESIMD] Pass to replace simd* parameters with native llvm vectors. (intel#2097) [SYCL][NFC] Fixed SYCL_PI_TRACE output while selecting a device. (intel#2192) [SYCL][FPGA] New spec for controlling load-store units in FPGAs (intel#2158) [SYCL][Doc] Clarify reqd_sub_group_size (intel#2103) [SYCL] Remove noreturn function attribute (intel#2165) [SYCL] Aligned set_arg behaviour with SYCL specification (intel#2159) ...
This pass is needed for the ESIMD backend to generate correct code.
Author: Pratik Ashar [email protected]
Signed-off-by: Konstantin S Bobrovsky [email protected]
@pratikashar, please pick up the review