[SYCL][ESIMD] Setup compilation pipeline for ESIMD #2134
Merged
17 commits
abd403f [SYCL][ESIMD] Enable LLVM passes for ESIMD. (kbobrovs)
3c649de [SYCL][ESIMD] Customize ESIMD optimization pipeline. (kbobrovs)
380a344 [SYCL][ESIMD] Fix SPIRV intrinsics translation. (kbobrovs)
59242b6 [SYCL][ESIMD] Add ESIMD intrinsics/IR translation tests. (kbobrovs)
8b18175 [SYCL][ESIMD] Add ESIMD tests executed on host. (kbobrovs)
e076a8a [SQUASH] Applied clang-format (kbobrovs)
af1d73b Update sycl/test/esimd/glob.cpp (kbobrovs)
2ef6447 [SYCL][ESIMD] Fix esimd-private-global.cpp test. (kbobrovs)
1fbcad5 [SQUASH] Fixed esimd_metadata2.cpp to prevent inlining. (kbobrovs)
abdc4a1 [SQUASH] Applied clang format to ESIMD tests. (kbobrovs)
a08bf0f [SQUASH] Updated attribute names in ESIMD globals test. (kbobrovs)
05a8351 [SQUASH] Move LLVM passes customization for ESIMD under CreatePasses. (kbobrovs)
c9ff9eb [SQUASH] Fix code formatting. (kbobrovs)
3cd2b40 Update clang/lib/CodeGen/BackendUtil.cpp (kbobrovs)
17472ad [SQUASH] Make SYCLLowerESIMDPass run regardless of optimizer options. (kbobrovs)
1197ae8 [SQUASH] Remove -O0 - not needed after ESIMD opt passes move. (kbobrovs)
97ce032 [SYCL][ESIMD] Match 'align' formal param attribute in esimd_subroutin… (kbobrovs)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
// TODO ESIMD enable host device under -fsycl | ||
// RUN: %clangxx -I %sycl_include %s -o %t.out -lsycl | ||
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out | ||
|
||
#include <CL/sycl.hpp> | ||
#include <CL/sycl/intel/esimd.hpp> | ||
#include <cstdlib> | ||
#include <cstring> | ||
#include <iostream> | ||
|
||
using namespace cl::sycl; | ||
|
||
class ESIMDSelector : public device_selector { | ||
// Require GPU device unless HOST is requested in SYCL_DEVICE_TYPE env | ||
virtual int operator()(const device &device) const { | ||
if (const char *dev_type = getenv("SYCL_DEVICE_TYPE")) { | ||
if (!strcmp(dev_type, "GPU")) | ||
return device.is_gpu() ? 1000 : -1; | ||
if (!strcmp(dev_type, "HOST")) | ||
return device.is_host() ? 1000 : -1; | ||
std::cerr << "Supported 'SYCL_DEVICE_TYPE' env var values are 'GPU' and " | ||
"'HOST', '" | ||
<< dev_type << "' is not.\n"; | ||
return -1; | ||
} | ||
// If "SYCL_DEVICE_TYPE" not defined, only allow gpu device | ||
return device.is_gpu() ? 1000 : -1; | ||
} | ||
}; | ||
|
||
auto exception_handler = [](exception_list l) { | ||
for (auto ep : l) { | ||
try { | ||
std::rethrow_exception(ep); | ||
} catch (cl::sycl::exception &e0) { | ||
std::cout << "sycl::exception: " << e0.what() << std::endl; | ||
} catch (std::exception &e) { | ||
std::cout << "std::exception: " << e.what() << std::endl; | ||
} catch (...) { | ||
std::cout << "generic exception\n"; | ||
} | ||
} | ||
}; | ||
|
||
int main(void) { | ||
constexpr unsigned Size = 256; | ||
constexpr unsigned VL = 32; | ||
constexpr unsigned GroupSize = 2; | ||
|
||
int A[Size]; | ||
int B[Size]; | ||
int C[Size] = {}; | ||
|
||
for (unsigned i = 0; i < Size; ++i) { | ||
A[i] = B[i] = i; | ||
} | ||
|
||
{ | ||
cl::sycl::buffer<int, 1> bufA(A, Size); | ||
cl::sycl::buffer<int, 1> bufB(B, Size); | ||
cl::sycl::buffer<int, 1> bufC(C, Size); | ||
|
||
// Global range: Size / VL work-items, each of which processes VL elements | ||
cl::sycl::range<1> GroupRange{Size / VL}; | ||
|
||
// Local range: GroupSize work-items in each work-group | ||
cl::sycl::range<1> TaskRange{GroupSize}; | ||
|
||
cl::sycl::nd_range<1> Range{GroupRange, TaskRange}; | ||
|
||
queue q(ESIMDSelector{}, exception_handler); | ||
q.submit([&](cl::sycl::handler &cgh) { | ||
auto accA = bufA.get_access<cl::sycl::access::mode::read>(cgh); | ||
auto accB = bufB.get_access<cl::sycl::access::mode::read>(cgh); | ||
auto accC = bufC.get_access<cl::sycl::access::mode::write>(cgh); | ||
|
||
cgh.parallel_for<class Test>( | ||
Range, [=](nd_item<1> ndi) SYCL_ESIMD_KERNEL { | ||
using namespace sycl::intel::gpu; | ||
auto pA = accA.get_pointer().get(); | ||
auto pB = accB.get_pointer().get(); | ||
auto pC = accC.get_pointer().get(); | ||
|
||
int i = ndi.get_global_id(0); | ||
constexpr int ESIZE = sizeof(int); | ||
simd<uint32_t, VL> offsets(0, ESIZE); | ||
|
||
simd<int, VL> va = gather<int, VL>(pA + i * VL, offsets); | ||
simd<int, VL> vb = block_load<int, VL>(pB + i * VL); | ||
simd<int, VL> vc = va + vb; | ||
|
||
block_store<int, VL>(pC + i * VL, vc); | ||
}); | ||
}); | ||
} // The buffers are destroyed here, which copies the results back into C | ||
|
||
for (unsigned i = 0; i < Size; ++i) { | ||
if (A[i] + B[i] != C[i]) { | ||
std::cout << "failed at index " << i << ", " << C[i] << " != " << A[i] | ||
<< " + " << B[i] << "\n"; | ||
return 1; | ||
} | ||
} | ||
|
||
std::cout << "Passed\n"; | ||
return 0; | ||
} |
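The per-work-item arithmetic in the kernel above can be sketched in plain host C++. This is only an illustration, not the ESIMD implementation: `gather_emulated`, `block_load_emulated` and `vadd_work_item` are hypothetical names, and the sketch assumes that `gather` interprets its offsets as byte offsets from the base pointer (which is what `simd<uint32_t, VL> offsets(0, ESIZE)` with `ESIZE = sizeof(int)` suggests).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t VL = 32; // vector length, same value as in the test

// Hypothetical stand-in for gather: lane k reads the int that lives
// offsets[k] BYTES past base (byte offsets are an assumption of this sketch).
std::array<int, VL>
gather_emulated(const int *base, const std::array<std::uint32_t, VL> &offsets) {
  std::array<int, VL> out{};
  const auto *bytes = reinterpret_cast<const unsigned char *>(base);
  for (std::size_t k = 0; k < VL; ++k)
    std::memcpy(&out[k], bytes + offsets[k], sizeof(int));
  return out;
}

// Hypothetical stand-in for block_load: reads VL consecutive ints.
std::array<int, VL> block_load_emulated(const int *base) {
  std::array<int, VL> out{};
  std::memcpy(out.data(), base, VL * sizeof(int));
  return out;
}

// Work done by the work-item with global id i: it owns elements
// [i * VL, i * VL + VL) of A, B and C.
void vadd_work_item(const int *A, const int *B, int *C, std::size_t i) {
  // Mirrors simd<uint32_t, VL> offsets(0, ESIZE): 0, 4, 8, ...
  std::array<std::uint32_t, VL> offsets{};
  for (std::size_t k = 0; k < VL; ++k)
    offsets[k] = static_cast<std::uint32_t>(k * sizeof(int));

  const auto va = gather_emulated(A + i * VL, offsets);
  const auto vb = block_load_emulated(B + i * VL);
  for (std::size_t k = 0; k < VL; ++k)
    C[i * VL + k] = va[k] + vb[k]; // plays the role of block_store(va + vb)
}
```

Calling `vadd_work_item` for every `i` in `[0, Size / VL)` covers all `Size` elements, which is why the test launches `Size / VL` work-items rather than `Size`.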
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
// RUN: %clangxx -fsycl -fsycl-explicit-simd -c -fsycl-device-only -Xclang -emit-llvm %s -o - | \ | ||
// RUN: FileCheck %s | ||
|
||
// This test checks that globals with register attribute are allowed in ESIMD | ||
// mode, can be accessed in functions and correct LLVM IR is generated | ||
// (including translation of the register attribute) | ||
|
||
#include <CL/sycl.hpp> | ||
#include <CL/sycl/intel/esimd.hpp> | ||
#include <iostream> | ||
|
||
using namespace cl::sycl; | ||
using namespace sycl::intel::gpu; | ||
|
||
constexpr unsigned VL = 16; | ||
|
||
ESIMD_PRIVATE ESIMD_REGISTER(17) simd<int, VL> vc; | ||
// CHECK-DAG: @vc = {{.+}} <16 x i32> zeroinitializer, align 64 #0 | ||
// CHECK-DAG: attributes #0 = { {{.*}}"VCByteOffset"="17" "VCGlobalVariable" "VCVolatile"{{.*}} } | ||
|
||
ESIMD_PRIVATE ESIMD_REGISTER(17 + VL) simd<int, VL> vc1; | ||
// CHECK-DAG: @vc1 = {{.+}} <16 x i32> zeroinitializer, align 64 #1 | ||
// CHECK-DAG: attributes #1 = { {{.*}}"VCByteOffset"="33" "VCGlobalVariable" "VCVolatile"{{.*}} } | ||
|
||
SYCL_EXTERNAL ESIMD_NOINLINE void init_vc(int x) { | ||
vc1 = vc + 1; | ||
vc = x; | ||
} |
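The offsets expected by the two `VCByteOffset` CHECK lines follow directly from the `ESIMD_REGISTER` arguments. The sketch below redoes that arithmetic in plain C++. It does not use the real macro, and `VcOffset`, `Vc1Offset`, `VecBytes` and `ranges_overlap` are names made up for illustration. It also assumes, going by the attribute name only, that the values are byte offsets; under that assumption the two 64-byte vectors overlap, which does not matter for a test that only checks attribute translation.

```cpp
#include <cstddef>

constexpr unsigned VL = 16;             // as in the test
constexpr unsigned VcOffset = 17;       // argument of ESIMD_REGISTER for vc
constexpr unsigned Vc1Offset = 17 + VL; // argument of ESIMD_REGISTER for vc1
constexpr std::size_t VecBytes = VL * 4; // <16 x i32>: 16 lanes of 4 bytes

// The second CHECK line expects "VCByteOffset"="33".
static_assert(Vc1Offset == 33, "17 + VL must evaluate to 33");
// Consistent with the 'align 64' in the CHECK lines.
static_assert(VecBytes == 64, "a <16 x i32> vector occupies 64 bytes");

// True when the byte ranges [a, a + len) and [b, b + len) overlap.
constexpr bool ranges_overlap(unsigned a, unsigned b, std::size_t len) {
  return a < b + len && b < a + len;
}
```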
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
// Basic ESIMD test which checks that ESIMD invocation syntax can get compiled. | ||
// RUN: %clangxx -fsycl -fsycl-explicit-simd -fsycl-device-only -c %s -o %t.bc | ||
|
||
#include <CL/sycl.hpp> | ||
#include <CL/sycl/intel/esimd.hpp> | ||
#include <iostream> | ||
|
||
int main(void) { | ||
constexpr unsigned Size = 4; | ||
int A[Size] = {1, 2, 3, 4}; | ||
int B[Size] = {1, 2, 3, 4}; | ||
int C[Size]; | ||
|
||
{ | ||
cl::sycl::range<1> UnitRange{1}; | ||
cl::sycl::buffer<int, 1> bufA(A, UnitRange); | ||
cl::sycl::buffer<int, 1> bufB(B, UnitRange); | ||
cl::sycl::buffer<int, 1> bufC(C, UnitRange); | ||
|
||
cl::sycl::queue().submit([&](cl::sycl::handler &cgh) { | ||
auto accA = bufA.get_access<cl::sycl::access::mode::read>(cgh); | ||
auto accB = bufB.get_access<cl::sycl::access::mode::read>(cgh); | ||
auto accC = bufC.get_access<cl::sycl::access::mode::write>(cgh); | ||
|
||
cgh.parallel_for<class Test>(UnitRange * UnitRange, | ||
[=](sycl::id<1> i) SYCL_ESIMD_KERNEL { | ||
// The operations below would normally be | ||
// expressed as a single vector operation | ||
// on an ESIMD vector | ||
accC[i + 0] = accA[i + 0] + accB[i + 0]; | ||
accC[i + 1] = accA[i + 1] + accB[i + 1]; | ||
accC[i + 2] = accA[i + 2] + accB[i + 2]; | ||
accC[i + 3] = accA[i + 3] + accB[i + 3]; | ||
}); | ||
}); | ||
} | ||
|
||
return 0; | ||
} |
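The comment in the kernel says that the four scalar additions would normally be one vector operation. A minimal host-only sketch of that equivalence follows; `tiny_vec` is a hypothetical stand-in written for this illustration and is not the real `simd<T, N>` class.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width vector with an element-wise operator+.
template <typename T, std::size_t N> struct tiny_vec {
  std::array<T, N> lanes{};

  friend tiny_vec operator+(const tiny_vec &a, const tiny_vec &b) {
    tiny_vec r;
    for (std::size_t k = 0; k < N; ++k)
      r.lanes[k] = a.lanes[k] + b.lanes[k];
    return r;
  }
};

// The scalar form used in the test: one statement per element.
inline void add_scalar(const int *a, const int *b, int *c) {
  c[0] = a[0] + b[0];
  c[1] = a[1] + b[1];
  c[2] = a[2] + b[2];
  c[3] = a[3] + b[3];
}

// The vector form: a single expression over all four lanes.
inline tiny_vec<int, 4> add_vector(const tiny_vec<int, 4> &a,
                                   const tiny_vec<int, 4> &b) {
  return a + b;
}
```

Both forms produce the same four results; the vector form simply states the operation once for all lanes.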
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
// RUN: %clangxx -O0 -fsycl -fsycl-explicit-simd -fsycl-device-only -Xclang -emit-llvm %s -o - | \ | ||
// RUN: FileCheck %s | ||
|
||
// Checks ESIMD intrinsic translation. | ||
// NOTE: must be run at -O0, because the optimizer removes some of the code | ||
|
||
#include <CL/sycl.hpp> | ||
#include <CL/sycl/detail/image_ocl_types.hpp> | ||
#include <CL/sycl/intel/esimd.hpp> | ||
|
||
using namespace sycl::intel::gpu; | ||
|
||
ESIMD_PRIVATE vector_type_t<int, 32> vc; | ||
ESIMD_PRIVATE ESIMD_REGISTER(192) simd<int, 16> vg; | ||
|
||
SYCL_ESIMD_FUNCTION SYCL_EXTERNAL simd<float, 16> foo(); | ||
|
||
class EsimdFunctor { | ||
public: | ||
void operator()() __attribute__((sycl_explicit_simd)) { foo(); } | ||
}; | ||
|
||
template <typename name, typename Func> | ||
__attribute__((sycl_kernel)) void kernel(Func kernelFunc) { | ||
kernelFunc(); | ||
} | ||
|
||
void bar() { | ||
EsimdFunctor esimdf; | ||
kernel<class kernel_esimd>(esimdf); | ||
} | ||
|
||
SYCL_ESIMD_FUNCTION SYCL_EXTERNAL simd<float, 16> foo() { | ||
// CHECK-LABEL: @_Z3foov | ||
constexpr int VL = 32; | ||
uint32_t *ptr = 0; | ||
|
||
int x = 0, y = 0, z = 0; | ||
|
||
simd<uint32_t, VL> v1(0, x + z); | ||
simd<uint64_t, VL> offsets(0, y); | ||
simd<uintptr_t, VL> v_addr(reinterpret_cast<uintptr_t>(ptr)); | ||
simd<ushort, VL> pred; | ||
v_addr += offsets; | ||
|
||
__esimd_flat_atomic0<EsimdAtomicOpType::ATOMIC_INC, uint32_t, VL>( | ||
v_addr.data(), pred.data()); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <32 x i32> @llvm.genx.svm.atomic.inc.v32i32.v32i1.v32i64(<32 x i1> %{{[0-9a-zA-Z_.]+}}, <32 x i64> %{{[0-9a-zA-Z_.]+}}, <32 x i32> undef) | ||
|
||
__esimd_flat_atomic1<EsimdAtomicOpType::ATOMIC_ADD, uint32_t, VL>( | ||
v_addr.data(), v1, pred.data()); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <32 x i32> @llvm.genx.svm.atomic.add.v32i32.v32i1.v32i64(<32 x i1> %{{[0-9a-zA-Z_.]+}}, <32 x i64> %{{[0-9a-zA-Z_.]+}}, <32 x i32> %{{[0-9a-zA-Z_.]+}}, <32 x i32> undef) | ||
__esimd_flat_atomic2<EsimdAtomicOpType::ATOMIC_CMPXCHG, uint32_t, VL>( | ||
v_addr.data(), v1, v1, pred.data()); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <32 x i32> @llvm.genx.svm.atomic.cmpxchg.v32i32.v32i1.v32i64(<32 x i1> %{{[0-9a-zA-Z_.]+}}, <32 x i64> %{{[0-9a-zA-Z_.]+}}, <32 x i32> %{{[0-9a-zA-Z_.]+}}, <32 x i32> %{{[0-9a-zA-Z_.]+}}, <32 x i32> undef) | ||
|
||
uintptr_t addr = reinterpret_cast<uintptr_t>(ptr); | ||
simd<uint32_t, VL> v00 = | ||
__esimd_flat_block_read_unaligned<uint32_t, VL>(addr); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <32 x i32> @llvm.genx.svm.block.ld.unaligned.v32i32(i64 %{{[0-9a-zA-Z_.]+}}) | ||
__esimd_flat_block_write<uint32_t, VL>(addr, v00.data()); | ||
// CHECK: call void @llvm.genx.svm.block.st.v32i32(i64 %{{[0-9a-zA-Z_.]+}}, <32 x i32> %{{[0-9a-zA-Z_.]+}}) | ||
|
||
simd<uint32_t, VL> v01 = | ||
__esimd_flat_read<uint32_t, VL>(v_addr.data(), 0, pred.data()); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <32 x i32> @llvm.genx.svm.gather.v32i32.v32i1.v32i64(<32 x i1> %{{[0-9a-zA-Z_.]+}}, i32 0, <32 x i64> %{{[0-9a-zA-Z_.]+}}, <32 x i32> undef) | ||
|
||
__esimd_flat_write<uint32_t, VL>(v_addr.data(), v01.data(), 0, pred.data()); | ||
// CHECK: call void @llvm.genx.svm.scatter.v32i1.v32i64.v32i32(<32 x i1> %{{[0-9a-zA-Z_.]+}}, i32 0, <32 x i64> %{{[0-9a-zA-Z_.]+}}, <32 x i32> %{{[0-9a-zA-Z_.]+}}) | ||
|
||
simd<short, 16> mina(0, 1); | ||
simd<short, 16> minc(5); | ||
minc = __esimd_smin<short, 16>(mina.data(), minc.data()); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <16 x i16> @llvm.genx.smin.v16i16.v16i16(<16 x i16> %{{[0-9a-zA-Z_.]+}}, <16 x i16> %{{[0-9a-zA-Z_.]+}}) | ||
|
||
simd<float, 1> diva(2.f); | ||
simd<float, 1> divb(1.f); | ||
diva = __esimd_div_ieee<1>(diva.data(), divb.data()); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <1 x float> @llvm.genx.ieee.div.v1f32(<1 x float> %{{[0-9a-zA-Z_.]+}}, <1 x float> %{{[0-9a-zA-Z_.]+}}) | ||
|
||
simd<float, 16> a(0.1f); | ||
simd<float, 8> b = __esimd_rdregion<float, 16, 8, 0, 8, 1>(a.data(), 0); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <8 x float> @llvm.genx.rdregionf.v8f32.v16f32.i16(<16 x float> %{{[0-9a-zA-Z_.]+}}, i32 0, i32 8, i32 1, i16 0, i32 0) | ||
|
||
simd<float, 16> c(0.0f); | ||
|
||
using PH = cl::sycl::access::placeholder; | ||
|
||
cl::sycl::accessor<cl::sycl::cl_int4, 2, cl::sycl::access::mode::read, | ||
cl::sycl::access::target::image, PH::false_t> | ||
pA; | ||
cl::sycl::accessor<cl::sycl::cl_int4, 2, cl::sycl::access::mode::write, | ||
cl::sycl::access::target::image, PH::false_t> | ||
pB; | ||
|
||
auto d = __esimd_wrregion<float, 16 /*ret size*/, 8 /*write size*/, | ||
0 /*vstride*/, 8 /*row width*/, 1 /*hstride*/>( | ||
c.data() /*dst*/, b.data() /*src*/, 0 /*offset*/); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <16 x float> @llvm.genx.wrregionf.v16f32.v8f32.i16.v8i1(<16 x float> %{{[0-9a-zA-Z_.]+}}, <8 x float> %{{[0-9a-zA-Z_.]+}}, i32 0, i32 8, i32 1, i16 0, i32 0, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>) | ||
|
||
simd<int, 32> va; | ||
va = media_block_load<int, 4, 8>(pA, x, y); | ||
// CHECK: %[[SI0:[0-9a-zA-Z_.]+]] = ptrtoint %opencl.image2d_ro_t addrspace(1)* %{{[0-9a-zA-Z_.]+}} to i32 | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <32 x i32> @llvm.genx.media.ld.v32i32(i32 0, i32 %[[SI0]], i32 0, i32 32, i32 %{{[0-9a-zA-Z_.]+}}, i32 %{{[0-9a-zA-Z_.]+}}) | ||
|
||
simd<int, 32> vb = va + 1; | ||
media_block_store<int, 4, 8>(pB, x, y, vb); | ||
// CHECK: %[[SI2:[0-9a-zA-Z_.]+]] = ptrtoint %opencl.image2d_wo_t addrspace(1)* %{{[0-9a-zA-Z_.]+}} to i32 | ||
// CHECK: call void @llvm.genx.media.st.v32i32(i32 0, i32 %[[SI2]], i32 0, i32 32, i32 %{{[0-9a-zA-Z_.]+}}, i32 %{{[0-9a-zA-Z_.]+}}, <32 x i32> %{{[0-9a-zA-Z_.]+}}) | ||
|
||
auto ee = __esimd_vload<int, 16>((vector_type_t<int, 16> *)(&vg)); | ||
// CHECK: %{{[0-9a-zA-Z_.]+}} = call <16 x i32> @llvm.genx.vload.v16i32.p4v16i32(<16 x i32> addrspace(4)* {{.*}}) | ||
__esimd_vstore<int, 32>(&vc, va.data()); | ||
// CHECK: store <32 x i32> %{{[0-9a-zA-Z_.]+}}, <32 x i32> addrspace(4)* {{.*}} | ||
|
||
return d; | ||
} |
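The region parameters passed to `__esimd_rdregion` and `__esimd_wrregion` in the test (vstride, row width, hstride, offset) can be made concrete with a small host-side emulation. This is a sketch under stated assumptions, not the intrinsic's definition: `rdregion_emulated` and `wrregion_emulated` are hypothetical names, the offset is assumed to be in bytes, and element `k` of the region is assumed to map to row `k / Width` and column `k % Width`.

```cpp
#include <array>
#include <cstddef>

// Index in the full vector of region element k (assumed addressing scheme).
template <typename T, std::size_t VStride, std::size_t Width,
          std::size_t HStride>
constexpr std::size_t region_index(std::size_t k, std::size_t offsetBytes) {
  return offsetBytes / sizeof(T) + (k / Width) * VStride + (k % Width) * HStride;
}

// Reads an M-element region out of an N-element vector.
template <typename T, std::size_t N, std::size_t M, std::size_t VStride,
          std::size_t Width, std::size_t HStride>
std::array<T, M> rdregion_emulated(const std::array<T, N> &src,
                                   std::size_t offsetBytes) {
  std::array<T, M> out{};
  for (std::size_t k = 0; k < M; ++k)
    out[k] = src.at(region_index<T, VStride, Width, HStride>(k, offsetBytes));
  return out;
}

// Returns a copy of dst with an M-element region overwritten by src.
template <typename T, std::size_t N, std::size_t M, std::size_t VStride,
          std::size_t Width, std::size_t HStride>
std::array<T, N> wrregion_emulated(std::array<T, N> dst,
                                   const std::array<T, M> &src,
                                   std::size_t offsetBytes) {
  for (std::size_t k = 0; k < M; ++k)
    dst.at(region_index<T, VStride, Width, HStride>(k, offsetBytes)) = src[k];
  return dst;
}
```

With the template arguments used in the test, `<float, 16, 8, 0, 8, 1>` and offset 0, the read returns the first eight elements of the 16-element vector and the write replaces the first eight elements of the destination.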
Please consider following the existing design for lowering functions such as SPIR-V built-ins to device-specific intrinsics: link the corresponding device library that implements the SPIR-V functions (see the NVPTX implementation for details).
This will allow you to remove most of the LowerESIMD pass (including the buggy part).

The problem with the NVPTX approach, as well as with the existing clang built-in infrastructure, is that it is hard to support the "C++" intrinsics mechanism with either of them. Once C++ intrinsics are supported in the clang built-in infrastructure, yes, this pass will become simpler.
Which buggy part do you refer to?

I'm not sure why this mechanism is needed; translateSpirvIntrinsic doesn't require it.
BTW, why do we need this in the first place? As I understand it, these SPIR-V built-ins are lowered to valid SPIR-V, which should be supported by the back-end.
The one fixed by this patch.

I see. You refer to the SPIRV translation part. Agree - this could be done similarly to NVPTX.
Not quite. The vector BE does not support SPIRV intrinsics, as it is not a SIMT BE. But with the subgroup size = 1 restriction it is really doable. I thought about this as a future step.

To be precise, these functions are represented in SPIR-V as a variable with a decoration, and I think that must be supported by any back-end. AFAIK, a lot of ESIMD-specific patches have landed in the SPIR-V translator, and it seems to be the right place to lower this functionality.
A SPIR-V consumer is by design supposed to lower standard SPIR-V instructions to HW-specific instructions/intrinsics.
If I understand it correctly, the ESIMD extension implementation includes these two parts:
As I understand it, these are not required if we move the lowering of standard SPIR-V instructions to the SPIR-V consumer.
NOTE: I'm talking about standard SPIR-V functionality only.

That is true. But the ESIMD back-end wasn't designed as a "SPIRV consumer", and glue passes are still required to feed it the LLVM IR resulting from SPIRV translation. Also, the ESIMD BE can't consume arbitrary SPIRV with the Kernel capability; there are a number of restrictions. E.g., the ESIMD BE has no concept of a work-item, which is central in the SIMT world. My point is that there are still design options that need to be discussed with the ESIMD BE devs.

If SPIR-V consumption is not part of the design, what is the point of using SPIR-V then?
It looks like using LLVM IR as the exchange format in ESIMD mode would be much easier and wouldn't require so many hacks.

IR stability.
There can be parts which can be improved, but I don't see how this justifies calling it hacks.