LLVM and SPIRV-LLVM-Translator pulldown (WW100) #53

DoyleLi · 2022-03-02T02:24:59Z

LLVM: llvm/llvm-project@55639c2
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@f797de2

This is the next step towards supporting bitcode auto upgrade with opaque pointers. The ValueList now stores the Value* together with its associated type ID, which allows inspecting the original pointer element type of arbitrary values. This is a largely mechanical change threading the type ID through various places. I've left TODOTypeID placeholders in a number of places where determining the type ID is either non-trivial or requires allocating a new type ID not present in the original bitcode. For this reason, the new type IDs are also not used for anything yet (apart from propagation). They will get used once the TODOs are resolved. Differential Revision: https://reviews.llvm.org/D119821

There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address spaces differ in the IR but they end up as the same class of instructions after selection. For example, a divergent load from constant address space ends up being the same global_load as a load from global address space. TODO: merge global stores. TODO: handle SADDR forms. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120279

…on (NFC) The eq sub zero fold currently has an artificial one-use limitation, causing us to miss this fold.

This adds handling of the _SADDR forms to the GLOBAL_LOAD combining. TODO: merge global stores. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120285

D118623 added code to fold not-of-compare into a compare with the inverted predicate, if the compare had no other uses. This relies on accurate use lists in the IR but it was run before setPhiValues, when some phi inputs are still stored in a data structure on the side, instead of being real uses in the IR. The effect was that a phi that should be using the original compare result would now get an inverted result instead. Fix this by moving simplifyConditions after setPhiValues. Differential Revision: https://reviews.llvm.org/D120312

This adds very basic support for hashing MachineBasicBlock and MachineFunction, for use in MachineFunctionPass to detect passes that modify the MachineFunction wrongly. Differential Revision: https://reviews.llvm.org/D120122
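One way such a hash can be used is sketched below with toy types, not LLVM's actual `MachineFunction` API: hash the function before and after a pass, and flag the pass if it reports no change while the hash differs. `ToyFunction`, `hashFunction`, and `passLiedAboutChange` are hypothetical names for illustration, and the hash is a plain FNV-1a, which is an assumption and not necessarily what LLVM uses.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Toy stand-in for a machine function: each block is a list of opcode numbers.
struct ToyFunction {
  std::vector<std::vector<std::uint32_t>> blocks;
};

// One FNV-1a step per byte of a 32-bit word (illustrative hash only).
inline std::uint64_t mixWord(std::uint64_t h, std::uint32_t word) {
  for (int i = 0; i < 4; ++i) {
    h ^= (word >> (8 * i)) & 0xffu;
    h *= 1099511628211ull;
  }
  return h;
}

// Hash every block, mixing in the block size so block boundaries matter.
inline std::uint64_t hashFunction(const ToyFunction &f) {
  std::uint64_t h = 14695981039346656037ull;
  for (const auto &block : f.blocks) {
    h = mixWord(h, static_cast<std::uint32_t>(block.size()));
    for (std::uint32_t op : block)
      h = mixWord(h, op);
  }
  return h;
}

// Returns true when the pass claimed "no change" but the hash moved.
inline bool passLiedAboutChange(
    ToyFunction &f, const std::function<bool(ToyFunction &)> &pass) {
  const std::uint64_t before = hashFunction(f);
  const bool reportedChange = pass(f);
  return !reportedChange && hashFunction(f) != before;
}
```

A pass that mutates a block and still returns `false` is flagged; a pass that mutates and returns `true`, or leaves the function alone, is not.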

This patch adds the codegen support for `atomic compare` in clang. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D118632

https://discourse.llvm.org/t/parallel-input-file-parsing/60164 To decouple symbol initialization and section initialization, `Defined::section` assignment should be postponed after input file parsing. To avoid spurious duplicate definition error due to two definitions in COMDAT groups of the same signature, we should postpone the duplicate symbol check. The function is called postScan instead of a more specific name like checkDuplicateSymbols, because we may merge Symbol::mergeProperties into postScan. It is placed after compileBitcodeFiles to apply to ET_REL files produced by LTO. This causes minor diagnostic regression for skipLinkedOutput configurations: ld.lld --thinlto-index-only a.bc b.o (bitcode definition prevails) won't detect duplicate symbol error. I think this is an acceptable compromise. The important cases where (a) both files are bitcode or (b) --thinlto-index-only is unused are still detected. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D119908

This patch updates the PFTBuilder to be able to lower the construct present in semantics. This is a building block for other lowering patches that will be posted soon. This patch is part of the upstreaming effort from fir-dev branch. Reviewed By: PeteSteinfeld, schweitz Differential Revision: https://reviews.llvm.org/D120336 Co-authored-by: Jean Perier <[email protected]> Co-authored-by: V Donaldson <[email protected]>

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

    704357 SLP - Number of calcDeps actions
    699021 SLP - Number of schedule calls
    5598 SLP - Number of ReSchedule actions
    59 SLP - Number of ReScheduleOnFail actions
    10084 SLP - Number of schedule resets
    8523 SLP - Number of vector instructions generated

After this patch:

    102895 SLP - Number of calcDeps actions
    161916 SLP - Number of schedule calls
    5637 SLP - Number of ReSchedule actions
    55 SLP - Number of ReScheduleOnFail actions
    10083 SLP - Number of schedule resets
    8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538

This trait results in PDL ops being erroneously CSE'd. These ops are side-effect free in the rewriter but not in the matcher (where unused values aren't allowed anyways). These ops should have a more nuanced side-effect modeling, this is fixing a bug introduced by a previous change. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D120222

With 1c1e2cc a new swift5 reflection section for multi-payload enum mask information was added, which is called mpenum. This change simply adds a check to make sure dsymutil can dump out information in that section into the dSYM bundle. Differential Revision: https://reviews.llvm.org/D120291

…dvectors" This reverts commit 9fc1a0d. We have bisected a compiler crash to this revision and will provide a test case soon.

This reverts commit 63eb963. Breaks MLIR build.

This reverts commit 9865c3f. Looks like our commits raced and Jeff fixed the build issue.

Darwin otool implements this flag as a one-stop solution for displaying bind and rebase info. As I am working on upstreaming chained fixup support this command will be useful to write testcases. Differential Revision: https://reviews.llvm.org/D113573

This is part of a series of patches to upstream support for Mach-O chained fixups. This patch adds support for parsing the chained fixup load command and parsing the chained fixups header. It also puts into place the abstract interface that will be used to iterate over the fixups. Differential Revision: https://reviews.llvm.org/D113630

This is the driver part of D91605 <https://reviews.llvm.org/D91605>, a workaround to allow direct calls to `__tls_get_addr` on Solaris/amd64. Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`. Differential Revision: https://reviews.llvm.org/D119829

This is a restricted alternative to D91605 <https://reviews.llvm.org/D91605> which only works on Solaris 11.4 SRU 10+, but would break the build on Solaris 11.3 and Illumos which lack `dlpi_tls_modid`. Apart from that, the patch is trivial. One caveat is that the `sanitizer_common` and `asan` tests need to be linked explicitly with `ld -z relax=transtls` on Solaris/amd64 since the archives with calls to `__tls_get_addr` are linked in directly. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D120048

AIX's libc generates "Error -1 occurred" instead of the "Unknown Error" expected by these test cases. Add this as expected output for AIX only. Reviewed By: daltenty, #powerpc, #libc, zibi, Quuxplusone Differential Revision: https://reviews.llvm.org/D119982

Add support for va intrinsics for the XPLINK ABI. Only the extended vararg variant, which uses a pointer to next argument, is supported. The standard variant will build on this. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D120148

…handle both values of select This is a pre-commit of test cases relevant for D119643. CorrelatedValuePropagation should handle inverted select condition, but it does not yet.

… of select The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example:

```
loop:
  %phi = phi i32* [ %sel, %loop ], [ %x, %entry ]
  %f = tail call i32* @f(i32* %phi)
  %cmp1 = icmp ne i32* %f, %y
  %sel = select i1 %cmp1, i32* %f, i32* null
  %cmp2 = icmp eq i32* %sel, null
  br i1 %cmp2, label %return, label %loop
```

But the select condition can be inverted:

```
  %cmp1 = icmp eq i32* %f, %y
  %sel = select i1 %cmp1, i32* null, i32* %f
```

The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D119643

…EV after 32b73bc" This reverts commit 5086cff. 32b73bc relanded in 1592d88.

Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D120112

Fix UBSan-reported issue in MCPlusBuilder::encodeAnnotationImm (left shift of a negative value).

Test Plan:

```
ninja check-bolt
...
PASS: BOLT-Unit :: Core/./CoreTests/AArch64/MCPlusBuilderTester.Annotation/0 (1 of 140)
PASS: BOLT-Unit :: Core/./CoreTests/X86/MCPlusBuilderTester.Annotation/0 (131 of 134)
```

Reviewed By: maksfb, yota9 Differential Revision: https://reviews.llvm.org/D120260
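For readers unfamiliar with this class of bug, here is a minimal sketch of the usual remedy: do the shifting and masking on unsigned integers and convert back to signed only at the end. The 8-bit index / 56-bit value layout and the names `encodeImm`, `decodeIndex`, and `decodeValue` are assumptions made up for illustration; this is not the BOLT implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: top 8 bits hold an index, low 56 bits a signed value.
constexpr unsigned kValueBits = 56;
constexpr std::uint64_t kValueMask = (std::uint64_t{1} << kValueBits) - 1;

// Pack index and value without ever shifting a negative signed integer.
inline std::int64_t encodeImm(std::uint8_t index, std::int64_t value) {
  // The value must fit into 56 signed bits.
  assert(value >= -(std::int64_t{1} << (kValueBits - 1)) &&
         value < (std::int64_t{1} << (kValueBits - 1)));
  const std::uint64_t bits = (std::uint64_t{index} << kValueBits) |
                             (static_cast<std::uint64_t>(value) & kValueMask);
  // Unsigned-to-signed conversion is not undefined behavior (it is
  // implementation-defined before C++20 and modular from C++20 on).
  return static_cast<std::int64_t>(bits);
}

inline std::uint8_t decodeIndex(std::int64_t imm) {
  return static_cast<std::uint8_t>(static_cast<std::uint64_t>(imm) >> kValueBits);
}

// Sign-extend the low 56 bits using unsigned arithmetic only.
inline std::int64_t decodeValue(std::int64_t imm) {
  const std::uint64_t low = static_cast<std::uint64_t>(imm) & kValueMask;
  const std::uint64_t sign = std::uint64_t{1} << (kValueBits - 1);
  return static_cast<std::int64_t>((low ^ sign) - sign);
}
```

Round-tripping a negative value such as `-1` through `encodeImm` and `decodeValue` returns `-1` again, and no step shifts a negative operand.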

…-sections for ELF -fdata-sections decides whether global variables go into different sections. This is orthogonal to whether we place their metadata (`.data` or `asan_globals`) into different sections. With -fno-data-sections, `-fsanitize-address-globals-dead-stripping` can still:

* deduplicate COMDAT `asan.module_ctor` and `asan.module_dtor`
* (with ld --gc-sections): for a data section (e.g. `.data`), if all global variables defined relative to it are unreferenced, discard them and associated `asan_globals` sections (rare but no need to exclude this case)

Similar to c7b9094 for PE/COFF. Reviewed By: #sanitizers, kstoimenov, vitalybuka Differential Revision: https://reviews.llvm.org/D120394

Clean up BasicBlockUtilsTest using C++ raw string literals, remove duplicated block functions, and make smaller style changes. Differential Revision: https://reviews.llvm.org/D120095

…gardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096

…stener This patch is a follow-up of D120100 to address some feedback from @labath. This should mainly fix the race issue with the event listener by moving the listener setup to the main thread. This also changes the SBDebugger::GetProgressFromEvent SWIG binding arguments to be output only, so the user doesn't have to provide them. Finally, this updates the test to check that the out arguments are returned in a tuple and re-enables the test on all platforms. Differential Revision: https://reviews.llvm.org/D120284 Signed-off-by: Med Ismail Bennani <[email protected]>

Now that sparse tensor types are first-class citizens and the sparse compiler is taking shape, it is time to make sure other compiler optimizations compose well with sparse tensors. Mostly, this should be completely transparent (i.e., dense and sparse take the same path). However, in some cases, optimizations only make sense in the context of sparse tensors. This is a first example of such an optimization, where fusing a sampled elt-wise multiplication only makes sense when the resulting kernel has a potential lower asymptotic complexity due to the sparsity. As an extreme example, running SDDMM with 1024x1024 matrices and a sparse sampling matrix with only two elements runs in 463.55ms in the unfused case but just 0.032ms in the fused case, with a speedup of 14485x that is only possible in the exciting world of sparse computations! Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D120429
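The complexity argument can be illustrated with a small standalone sketch, which is not the MLIR sparse compiler's generated code. The unfused version materializes the whole dense product before sampling, while the fused version computes a dot product only where the sampling matrix has an entry. The `Coo` layout and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Dense = std::vector<std::vector<double>>;

// Sparse sampling matrix S in coordinate form (hypothetical minimal layout).
struct Coo {
  std::vector<std::size_t> row, col;
  std::vector<double> val;
};

// Unfused: build the full dense product A*B, then sample it with S.
// Costs O(m * n * k) no matter how sparse S is.
inline std::vector<double> sddmmUnfused(const Coo &s, const Dense &a,
                                        const Dense &b) {
  const std::size_t m = a.size(), k = b.size(), n = k ? b[0].size() : 0;
  Dense prod(m, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t p = 0; p < k; ++p)
        prod[i][j] += a[i][p] * b[p][j];
  std::vector<double> out(s.val.size());
  for (std::size_t e = 0; e < s.val.size(); ++e)
    out[e] = s.val[e] * prod[s.row[e]][s.col[e]];
  return out;
}

// Fused: one dot product per stored entry of S, O(nnz(S) * k) in total.
inline std::vector<double> sddmmFused(const Coo &s, const Dense &a,
                                      const Dense &b) {
  const std::size_t k = b.size();
  std::vector<double> out(s.val.size());
  for (std::size_t e = 0; e < s.val.size(); ++e) {
    double dot = 0.0;
    for (std::size_t p = 0; p < k; ++p)
      dot += a[s.row[e]][p] * b[p][s.col[e]];
    out[e] = s.val[e] * dot;
  }
  return out;
}
```

Both functions return the sampled values in the order of the stored entries of S, so their results can be compared directly. With only two stored entries, the fused loop does two dot products regardless of the matrix dimensions.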

…oops D118090 causes a pretty significant (19%) regression in some Eigen benchmarks. Investigating is a bit time consuming as the compilation unit where this occurs is large. Rather than revert, this patch adds a flag controlling that behavior (enabled by default).

Symbol.h depends on InputFiles.h. This change moves us toward dropping the weird dependency. The call sites will become slightly uglier (`cast<SharedFile>(s->file)`), but the compromise is acceptable.

Previously, we only supported float64. We now support float32 and float64. When constructing a tensor without providing a data type, the default is float32. Fix the tests for data type consistency. All PyTACO application tests now use float32 to match the default data type of TACO. Other tests may use float32 or float64. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D120356

… NFC

Introduce -fgpu-default-stream={legacy|per-thread} option to support per-thread default stream for HIP runtime. When -fgpu-default-stream=per-thread, HIP kernels are launched through hipLaunchKernel_spt instead of hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1 is defined by the preprocessor to enable other per-thread stream API's. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D120298
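As a small sketch of how source code might observe the option, the helper below tests the `HIP_API_PER_THREAD_DEFAULT_STREAM` macro named in the description. The function name `usesPerThreadDefaultStream` is hypothetical, and the sketch assumes the macro is simply absent when the option is not set to `per-thread`.

```cpp
// Reports whether this translation unit was compiled with the per-thread
// default stream macro set to a non-zero value.
constexpr bool usesPerThreadDefaultStream() {
#if defined(HIP_API_PER_THREAD_DEFAULT_STREAM) && HIP_API_PER_THREAD_DEFAULT_STREAM
  return true;
#else
  return false;
#endif
}
```

In a build that does not define the macro, the function returns `false`.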

CONFLICT (content): Merge conflict in clang/test/Preprocessor/predefined-macros.c CONFLICT (content): Merge conflict in clang/include/clang/Driver/Options.td CONFLICT (content): Merge conflict in clang/include/clang/Basic/LangOptions.h

…pulldown

SPIRVLowerBool pass silently discards line information, producing IR without !dbg nodes attached. This commit addresses this issue and adds a test. Original commit: KhronosGroup/SPIRV-LLVM-Translator@ee391ea

… sw stack Original commit: KhronosGroup/SPIRV-LLVM-Translator@25fc23d

LLVM: llvm/llvm-project@9d899d8f01872
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@e95ae260

Clarify the design doc to note that the C++ attribute `[[sycl_detail::uses_aspects()]]` is only needed for the device compiler and should be protected via `#ifdef`.

Local kernel arguments must be aligned to the type size, simply using `std::vector<char>` doesn't always provide the correct alignment. So this patch adds extra padding to the vector and ensures that the pointer returned for the accessor is actually aligned to the type size. This issue was exposed by: intel/llvm-test-suite#608, which was a follow up to fixing local accessor alignment for the CUDA plugin.
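The padding approach can be sketched as follows. This is an illustration under assumptions, not the plugin's code: `allocateAligned` is a hypothetical helper, and it assumes the required alignment (the type size) is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resizes `storage` with enough slack that an aligned region of `bytes`
// bytes fits inside it, and returns a pointer to the start of that region.
// Assumption: `alignment` is a non-zero power of two.
inline char *allocateAligned(std::vector<char> &storage, std::size_t bytes,
                             std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         "alignment must be a power of two");
  // Worst case, the vector's buffer is alignment - 1 bytes short of aligned.
  storage.assign(bytes + alignment - 1, 0);
  const auto addr = reinterpret_cast<std::uintptr_t>(storage.data());
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  const std::uintptr_t aligned = (addr + mask) & ~mask;
  return storage.data() + (aligned - addr);
}
```

The returned pointer stays valid only as long as `storage` is not resized or destroyed, so the vector has to outlive every use of the accessor pointer.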

Add a template document to use when creating new SYCL extension specifications. We will change our existing specifications to follow this template over time. Also add a README describing the process we follow to create, modify, and maintain these specifications.

This old extension for specialization constants is marked deprecated in the headers, so make the specification deprecated too.

Add validation rules to the SPIR-V extension SPV_INTEL_global_variable_decorations.

These changes add the definition of the SYCL_EXT_ONEAPI_PROPERTIES feature macro and move the corresponding proposal to experimental support. Signed-off-by: Steffen Larsen <[email protected]>

The change was suggested by @premanandrao during code review of the implementation.

We decided that the OpenCL API Specification should not contain detailed information about how the APIs interact with SPIR-V. Instead, we want this information in the OpenCL SPIR-V Environment Specification. Change this extension specification accordingly.

* [SYCL][ESIMD] Add support for lsc mem access APIs Signed-off-by: Sergey Dmitriev <[email protected]>
* Removed XeHP_SDV from the list of supported platforms
* Removed DG2 from the list of supported platforms for 2d intrinsics
* Removed cache hints from user-visible lsc SLM APIs
* Replaced "flat-address" with "USM pointer"
* Removed Transposed and Transformed params from lsc_store2d template
* Removed L1 cache hint from atomic operations
* Removed NElts from atomic operations
* Reordered parameters for lsc atomic templates to make them consistent with regular atomics
* Added static asserts to check data sizes for Transformed and Transposed messages
* Added checks for allowed cache hints
* Add special handling for u8 and u16 data types
* Remove 'Transposed' and 'Transformed' parameters from prefetch 2d

Signed-off-by: Xiaodong Li <[email protected]>

nikic and others added 30 commits February 22, 2022 17:27

[InstCombine] Add test for missed select fold due to one use limitati…

f4e9df2

…on (NFC) The eq sub zero fold currently has an artificial one-use limitation, causing us to miss this fold.

[MIPS] Add -no-pie option to the clang driver's tests depend on it

cedc23b

[StableHashing] Hash machine basic blocks and functions

b47e2dc

This adds very basic support for hashing MachineBasicBlock and MachineFunction, for use in MachineFunctionPass to detect passes that modify the MachineFunction wrongly. Differential Revision: https://reviews.llvm.org/D120122

Fix the Sphinx build after f8cedc6

16994a2

[Clang][OpenMP] Add the codegen support for atomic compare

104d9a6

This patch adds the codegen support for `atomic compare` in clang. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D118632

[WebAssembly] Allow .data shorthand for .section .data,"",@

d657c68

[SLP] Use isInSchedulingRegion consistently [NFC]

8612b11

[mlir][pdl] NFC re-add NoSideEffect to Result and Results Op

63eb963

Revert "[AArch64] Alter mull shuffle(ext(..)) combine to work on buil…

ecb2700

…dvectors" This reverts commit 9fc1a0d. We have bisected a compiler crash to this revision and will provide a test case soon.

Revert "[mlir][pdl] NFC re-add NoSideEffect to Result and Results Op"

9865c3f

This reverts commit 63eb963. Breaks MLIR build.

[mlir][pdl] NFC fix missing include

ef7b982

Reland "[mlir][pdl] NFC re-add NoSideEffect to Result and Results Op"

de2cc2a

This reverts commit 9865c3f. Looks like our commits raced and Jeff fixed the build issue.

[Transforms] Pre-commit test cases for CorrelatedValuePropagation to …

0b302be

…handle both values of select This is a pre-commit of test cases relevant for D119643. CorrelatedValuePropagation should handle inverted select condition, but it does not yet.

Remove dead code.

a23f7c0

nico and others added 29 commits February 23, 2022 18:41

Reland "unbreak Modules/cxx20-export-import.cpp with LLVM_APPEND_VC_R…

34285bc

…EV after 32b73bc" This reverts commit 5086cff. 32b73bc relanded in 1592d88.

Teach the AArch64 backend to instruction select the BCAX instruction.

d7105e7

Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D120112

Simplify/cleanup BasicBlockUtilsTest

d7a3073

Clean up BasicBlockUtilsTest using C++ raw string literals, remove duplicated block functions, and make smaller style changes. Differential Revision: https://reviews.llvm.org/D120095

[ELF] Remove SharedSymbol::getFile. NFC

47d18be

Symbol.h depends on InputFiles.h. This change moves us toward dropping the weird dependency. The call sites will become slightly uglier (`cast<SharedFile>(s->file)`), but the compromise is acceptable.

[ELF] Don't rely on Symbols.h's transitive inclusion of InputFiles.h.…

b01430a

… NFC

Merge from 'sycl' to 'sycl-web'

eae2c8f

Merge from '"main"' to '"sycl-web"' (2 commits)

a92bdea

CONFLICT (content): Merge conflict in clang/test/Preprocessor/predefined-macros.c CONFLICT (content): Merge conflict in clang/include/clang/Driver/Options.td CONFLICT (content): Merge conflict in clang/include/clang/Basic/LangOptions.h

Merge remote-tracking branch 'otcshare_llvm/sycl-web' into llvmspirv_…

076101e

…pulldown

fixed discarding of debug info metadata by SPIRVLowerBool pass

84204f1

SPIRVLowerBool pass silently discards line information, producing IR without !dbg nodes attached. This commit addresses this issue and adds a test. Original commit: KhronosGroup/SPIRV-LLVM-Translator@ee391ea

disable spirv-val since it seems that it does work well with existing…

381bb84

… sw stack Original commit: KhronosGroup/SPIRV-LLVM-Translator@25fc23d

LLVM and SPIRV-LLVM-Translator pulldown (WW10)

2f5e990

LLVM: llvm/llvm-project@9d899d8f01872
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@e95ae260

[SYCL][DOC] Clarify "[[uses_aspects()]]" in design (intel#5594)

27cc930

Clarify the design doc to note that the C++ attribute `[[sycl_detail::uses_aspects()]]` is only needed for the device compiler and should be protected via `#ifdef`.

[SYCL][DOC] Deprecate old spec constant extension (intel#5676)

b185269

This old extension for specialization constants is marked deprecated in the headers, so make the specification deprecated too.

[SYCL][DOC] Add validation rules to SPIR-V ext (intel#5687)

dfaa070

Add validation rules to the SPIR-V extension SPV_INTEL_global_variable_decorations.

[SYCL][DOC] Enable SYCL_EXT_ONEAPI_PROPERTIES extension (intel#5693)

1984e74

These changes add the definition of the SYCL_EXT_ONEAPI_PROPERTIES feature macro and move the corresponding proposal to experimental support. Signed-off-by: Steffen Larsen <[email protected]>

[SYCL][DOC] Update spelling in device global design doc (intel#5654)

0e44f1b

The change was suggested by @premanandrao during code review of the implementation.

Debug auto merge function

70cd710

Signed-off-by: Xiaodong Li <[email protected]>

DoyleLi force-pushed the sycl branch from 4bd50e7 to f810047 Compare March 2, 2022 02:37
