forked from intel/llvm
LLVM and SPIRV-LLVM-Translator pulldown (WW100) #53
Open: DoyleLi wants to merge 764 commits into sycl from private/DoyleLi/test_auto_merge
This is the next step towards supporting bitcode auto upgrade with opaque pointers. The ValueList now stores the Value* together with its associated type ID, which allows inspecting the original pointer element type of arbitrary values. This is a largely mechanical change threading the type ID through various places. I've left TODOTypeID placeholders in a number of places where determining the type ID is either non-trivial or requires allocating a new type ID not present in the original bitcode. For this reason, the new type IDs are also not used for anything yet (apart from propagation). They will get used once the TODOs are resolved. Differential Revision: https://reviews.llvm.org/D119821
There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address spaces differ in the IR but they end up as the same class of instructions after selection. For example, a divergent load from the constant address space ends up being the same global_load as a load from the global address space. TODO: merge global stores. TODO: handle SADDR forms. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120279
…on (NFC) The eq sub zero fold currently has an artificial one-use limitation, causing us to miss this fold.
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining. TODO: merge global stores. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120285
D118623 added code to fold not-of-compare into a compare with the inverted predicate, if the compare had no other uses. This relies on accurate use lists in the IR but it was run before setPhiValues, when some phi inputs are still stored in a data structure on the side, instead of being real uses in the IR. The effect was that a phi that should be using the original compare result would now get an inverted result instead. Fix this by moving simplifyConditions after setPhiValues. Differential Revision: https://reviews.llvm.org/D120312
This adds very basic support for hashing MachineBasicBlock and MachineFunction, for use in MachineFunctionPass to detect passes that modify the MachineFunction wrongly. Differential Revision: https://reviews.llvm.org/D120122
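One way such a check could work is to hash the function before and after a pass and compare the result with what the pass reports. The sketch below uses a toy model (blocks of instruction strings) rather than `MachineFunction`, and it assumes that "wrongly" means "modified without reporting a change"; `ToyFunction`, `hashFunction`, and `passReportIsConsistent` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a machine function: blocks of instruction strings.
using ToyBlock = std::vector<std::string>;
using ToyFunction = std::vector<ToyBlock>;

// Simple order-sensitive hash mixing step.
std::size_t hashCombine(std::size_t Seed, std::size_t Value) {
  return Seed ^ (Value + 0x9e3779b9u + (Seed << 6) + (Seed >> 2));
}

// Hash every block and every instruction, in order.
std::size_t hashFunction(const ToyFunction &F) {
  std::size_t H = 0;
  for (const ToyBlock &BB : F) {
    H = hashCombine(H, BB.size()); // keeps block boundaries significant
    for (const std::string &Inst : BB)
      H = hashCombine(H, std::hash<std::string>{}(Inst));
  }
  return H;
}

// Returns false when the pass reported "no change" but the hash moved.
bool passReportIsConsistent(ToyFunction &F,
                            const std::function<bool(ToyFunction &)> &Pass) {
  std::size_t Before = hashFunction(F);
  bool ReportedChange = Pass(F);
  std::size_t After = hashFunction(F);
  return ReportedChange || Before == After;
}
```

A hash comparison can only flag unreported changes; equal hashes do not prove that nothing changed, since collisions are possible.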
This patch adds the codegen support for `atomic compare` in clang. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D118632
https://discourse.llvm.org/t/parallel-input-file-parsing/60164 To decouple symbol initialization and section initialization, `Defined::section` assignment should be postponed after input file parsing. To avoid spurious duplicate definition error due to two definitions in COMDAT groups of the same signature, we should postpone the duplicate symbol check. The function is called postScan instead of a more specific name like checkDuplicateSymbols, because we may merge Symbol::mergeProperties into postScan. It is placed after compileBitcodeFiles to apply to ET_REL files produced by LTO. This causes minor diagnostic regression for skipLinkedOutput configurations: ld.lld --thinlto-index-only a.bc b.o (bitcode definition prevails) won't detect duplicate symbol error. I think this is an acceptable compromise. The important cases where (a) both files are bitcode or (b) --thinlto-index-only is unused are still detected. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D119908
This patch updates the PFTBuilder to be able to lower the construct present in semantics. This is a building block for other lowering patches that will be posted soon. This patch is part of the upstreaming effort from fir-dev branch. Reviewed By: PeteSteinfeld, schweitz Differential Revision: https://reviews.llvm.org/D120336 Co-authored-by: Jean Perier <[email protected]> Co-authored-by: V Donaldson <[email protected]>
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples.

To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:
  704357 SLP - Number of calcDeps actions
  699021 SLP - Number of schedule calls
  5598 SLP - Number of ReSchedule actions
  59 SLP - Number of ReScheduleOnFail actions
  10084 SLP - Number of schedule resets
  8523 SLP - Number of vector instructions generated

After this patch:
  102895 SLP - Number of calcDeps actions
  161916 SLP - Number of schedule calls
  5637 SLP - Number of ReSchedule actions
  55 SLP - Number of ReScheduleOnFail actions
  10083 SLP - Number of schedule resets
  8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in the number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore it.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.
For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538
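The sub-graph described in this commit message (the instructions being vectorized plus their transitive users) can be illustrated with a small worklist traversal. This is a minimal sketch on a toy def-use graph, not the SLPVectorizer implementation; `UserMap` and `collectSchedulingSubgraph` are hypothetical names, and instructions are modelled as plain integers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy def-use graph: Users[I] lists the instructions that use instruction I.
using UserMap = std::map<int, std::vector<int>>;

// Collect the seed instructions plus everything reachable through user edges.
std::set<int> collectSchedulingSubgraph(const UserMap &Users,
                                        const std::vector<int> &Seeds) {
  std::set<int> Visited(Seeds.begin(), Seeds.end());
  std::vector<int> Worklist(Seeds.begin(), Seeds.end());
  while (!Worklist.empty()) {
    int Inst = Worklist.back();
    Worklist.pop_back();
    auto It = Users.find(Inst);
    if (It == Users.end())
      continue; // no users recorded for this instruction
    for (int User : It->second)
      if (Visited.insert(User).second) // first time we see this user
        Worklist.push_back(User);
  }
  return Visited;
}
```

Instructions that are neither seeds nor transitive users never enter the set, which is the source of the reduced work in large blocks with many unrelated instructions.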
This trait results in PDL ops being erroneously CSE'd. These ops are side-effect free in the rewriter but not in the matcher (where unused values aren't allowed anyways). These ops should have a more nuanced side-effect modeling; this fixes a bug introduced by a previous change. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D120222
With 1c1e2cc a new swift5 reflection section for multi-payload enum mask information was added, which is called mpenum. This change simply adds a check to make sure dsymutil can dump out information in that section into the dSYM bundle. Differential Revision: https://reviews.llvm.org/D120291
…dvectors" This reverts commit 9fc1a0d. We have bisected a compiler crash to this revision and will provide a test case soon.
This reverts commit 63eb963. Breaks MLIR build.
This reverts commit 9865c3f. Looks like our commits raced and Jeff fixed the build issue.
Darwin otool implements this flag as a one-stop solution for displaying bind and rebase info. As I am working on upstreaming chained fixup support this command will be useful to write testcases. Differential Revision: https://reviews.llvm.org/D113573
This is part of a series of patches to upstream support for Mach-O chained fixups. This patch adds support for parsing the chained fixup load command and parsing the chained fixups header. It also puts into place the abstract interface that will be used to iterate over the fixups. Differential Revision: https://reviews.llvm.org/D113630
This is the driver part of D91605 <https://reviews.llvm.org/D91605>, a workaround to allow direct calls to `__tls_get_addr` on Solaris/amd64. Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`. Differential Revision: https://reviews.llvm.org/D119829
This is a restricted alternative to D91605 <https://reviews.llvm.org/D91605> which only works on Solaris 11.4 SRU 10+, but would break the build on Solaris 11.3 and Illumos which lack `dlpi_tls_modid`. Apart from that, the patch is trivial. One caveat is that the `sanitizer_common` and `asan` tests need to be linked explicitly with `ld -z relax=transtls` on Solaris/amd64 since the archives with calls to `__tls_get_addr` are linked in directly. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D120048
AIX's libc generates "Error -1 occurred" instead of the "Unknown Error" expected by these test cases. Add this as expected output for AIX only. Reviewed By: daltenty, #powerpc, #libc, zibi, Quuxplusone Differential Revision: https://reviews.llvm.org/D119982
Add support for va intrinsics for the XPLINK ABI. Only the extended vararg variant, which uses a pointer to next argument, is supported. The standard variant will build on this. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D120148
…handle both values of select This is a pre-commit of test cases relevant for D119643. CorrelatedValuePropagation should handle inverted select condition, but it does not yet.
… of select The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example:
```
loop:
  %phi = phi i32* [ %sel, %loop ], [ %x, %entry ]
  %f = tail call i32* @f(i32* %phi)
  %cmp1 = icmp ne i32* %f, %y
  %sel = select i1 %cmp1, i32* %f, i32* null
  %cmp2 = icmp eq i32* %sel, null
  br i1 %cmp2, label %return, label %loop
```
But the select condition can be inverted:
```
  %cmp1 = icmp eq i32* %f, %y
  %sel = select i1 %cmp1, i32* null, i32* %f
```
The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D119643
Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D120112
Fix UBSan-reported issue in MCPlusBuilder::encodeAnnotationImm (left shift of a negative value). Test Plan:
```
ninja check-bolt
...
PASS: BOLT-Unit :: Core/./CoreTests/AArch64/MCPlusBuilderTester.Annotation/0 (1 of 140)
PASS: BOLT-Unit :: Core/./CoreTests/X86/MCPlusBuilderTester.Annotation/0 (131 of 134)
```
Reviewed By: maksfb, yota9 Differential Revision: https://reviews.llvm.org/D120260
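For background, left-shifting a negative signed integer is undefined behavior before C++20, which is what UBSan flags. A common remedy is to do the bit manipulation on unsigned operands. The sketch below shows that general technique only; the 8-bit index / 56-bit value layout and the names `packAnnotation`, `unpackIndex`, and `unpackValue` are assumptions for illustration and are not taken from BOLT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: top 8 bits hold an index, low 56 bits hold a signed value.
constexpr uint64_t ValueMask = (uint64_t{1} << 56) - 1;

int64_t packAnnotation(uint8_t Index, int64_t Value) {
  // All shifts and masks operate on unsigned operands, which is well defined.
  uint64_t Bits = (uint64_t{Index} << 56) |
                  (static_cast<uint64_t>(Value) & ValueMask);
  return static_cast<int64_t>(Bits);
}

uint8_t unpackIndex(int64_t Packed) {
  return static_cast<uint8_t>(static_cast<uint64_t>(Packed) >> 56);
}

int64_t unpackValue(int64_t Packed) {
  // Sign-extend the low 56 bits without shifting a signed value.
  uint64_t Low = static_cast<uint64_t>(Packed) & ValueMask;
  const uint64_t SignBit = uint64_t{1} << 55;
  return static_cast<int64_t>((Low ^ SignBit) - SignBit);
}
```

The final conversions from `uint64_t` back to `int64_t` are modular in C++20 and implementation-defined (modular on mainstream compilers) in earlier standards.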
…-sections for ELF -fdata-sections decides whether global variables go into different sections. This is orthogonal to whether we place their metadata (`.data` or `asan_globals`) into different sections. With -fno-data-sections, `-fsanitize-address-globals-dead-stripping` can still:
* deduplicate COMDAT `asan.module_ctor` and `asan.module_dtor`
* (with ld --gc-sections): for a data section (e.g. `.data`), if all global variables defined relative to it are unreferenced, discard them and associated `asan_globals` sections (rare but no need to exclude this case)

Similar to c7b9094 for PE/COFF. Reviewed By: #sanitizers, kstoimenov, vitalybuka Differential Revision: https://reviews.llvm.org/D120394
Clean up BasicBlockUtilsTest using C++ raw string literals, remove duplicated block functions, and make smaller style changes. Differential Revision: https://reviews.llvm.org/D120095
…gardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong results in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096
…stener This patch is a follow-up of D120100 to address some feedback from @labath. This should mainly fix the race issue with the event listener by moving the listener setup to the main thread. This also changes the SBDebugger::GetProgressFromEvent SWIG binding arguments to be output only, so the user doesn't have to provide them. Finally, this updates the test to check that the out arguments are returned in a tuple and re-enables the test on all platforms. Differential Revision: https://reviews.llvm.org/D120284 Signed-off-by: Med Ismail Bennani <[email protected]>
Now that sparse tensor types are first-class citizens and the sparse compiler is taking shape, it is time to make sure other compiler optimizations compose well with sparse tensors. Mostly, this should be completely transparent (i.e., dense and sparse take the same path). However, in some cases, optimizations only make sense in the context of sparse tensors. This is a first example of such an optimization, where fusing a sampled elt-wise multiplication only makes sense when the resulting kernel has a potentially lower asymptotic complexity due to the sparsity. As an extreme example, running SDDMM with 1024x1024 matrices and a sparse sampling matrix with only two elements runs in 463.55ms in the unfused case but just 0.032ms in the fused case, with a speedup of 14485x that is only possible in the exciting world of sparse computations! Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D120429
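To see where the asymptotic gain comes from, SDDMM computes S ∘ (A·B) for a sparse sampling matrix S. The unfused form materializes the full dense product first; the fused form evaluates a dot product only at the nonzero positions of S. The following is a hand-written sketch of that difference using a coordinate list for S. It is not the code the MLIR sparse compiler generates, and `Entry`, `sddmmUnfused`, and `sddmmFused` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Entry { std::size_t Row, Col; double Val; }; // one nonzero of S
using Dense = std::vector<std::vector<double>>;

// Unfused: compute the full product A*B, then sample it. O(N*M*K) work.
std::vector<Entry> sddmmUnfused(const std::vector<Entry> &S, const Dense &A,
                                const Dense &B) {
  std::size_t N = A.size(), K = B.size(), M = B.empty() ? 0 : B[0].size();
  Dense P(N, std::vector<double>(M, 0.0));
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J = 0; J < M; ++J)
      for (std::size_t L = 0; L < K; ++L)
        P[I][J] += A[I][L] * B[L][J];
  std::vector<Entry> Out;
  for (const Entry &E : S)
    Out.push_back({E.Row, E.Col, E.Val * P[E.Row][E.Col]});
  return Out;
}

// Fused: one dot product per nonzero of S. O(nnz(S)*K) work.
std::vector<Entry> sddmmFused(const std::vector<Entry> &S, const Dense &A,
                              const Dense &B) {
  std::vector<Entry> Out;
  for (const Entry &E : S) {
    double Dot = 0.0;
    for (std::size_t L = 0; L < B.size(); ++L)
      Dot += A[E.Row][L] * B[L][E.Col];
    Out.push_back({E.Row, E.Col, E.Val * Dot});
  }
  return Out;
}
```

With only two nonzeros in S, the fused loop performs two dot products regardless of the matrix dimensions, which is consistent with the large gap reported in the commit message.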
…oops D118090 causes a pretty significant (19%) regression in some Eigen benchmarks. Investigating is a bit time consuming as the compilation unit where this occurs is large. Rather than revert, this patch adds a flag controlling that behavior (enabled by default).
Symbol.h depends on InputFiles.h. This change moves us toward dropping the weird dependency. The call sites will become slightly uglier (`cast<SharedFile>(s->file)`), but the compromise is acceptable.
Previously, we only supported float64. We now support float32 and float64. When constructing a tensor without providing a data type, the default is float32. Fix the tests for data type consistency. All PyTACO application tests now use float32 to match the default data type of TACO. Other tests may use float32 or float64. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D120356
Introduce the -fgpu-default-stream={legacy|per-thread} option to support a per-thread default stream for the HIP runtime. With -fgpu-default-stream=per-thread, HIP kernels are launched through hipLaunchKernel_spt instead of hipLaunchKernel. Also, HIP_API_PER_THREAD_DEFAULT_STREAM=1 is defined by the preprocessor to enable other per-thread stream APIs. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D120298
CONFLICT (content): Merge conflict in clang/test/Preprocessor/predefined-macros.c
CONFLICT (content): Merge conflict in clang/include/clang/Driver/Options.td
CONFLICT (content): Merge conflict in clang/include/clang/Basic/LangOptions.h
The SPIRVLowerBool pass silently discards line information, producing IR without !dbg nodes attached. This commit addresses this issue and adds a test. Original commit: KhronosGroup/SPIRV-LLVM-Translator@ee391ea
… sw stack Original commit: KhronosGroup/SPIRV-LLVM-Translator@25fc23d
LLVM: llvm/llvm-project@9d899d8f01872 SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@e95ae260
Clarify the design doc to note that the C++ attribute `[[sycl_detail::uses_aspects()]]` is only needed for the device compiler and should be protected via `#ifdef`.
Local kernel arguments must be aligned to the type size; simply using `std::vector<char>` doesn't always provide the correct alignment. So this patch adds extra padding to the vector and ensures that the pointer returned for the accessor is actually aligned to the type size. This issue was exposed by intel/llvm-test-suite#608, which was a follow-up to fixing local accessor alignment for the CUDA plugin.
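The padding approach can be sketched as follows: over-allocate the byte vector by `Alignment - 1` bytes, then move the pointer up to the next aligned address. This is an illustration of the general technique under the assumption that the alignment is a power of two; `alignedStorage` is a hypothetical helper and not the plugin's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: make Buf large enough to hold Size bytes at the given
// power-of-two Alignment and return an aligned pointer into Buf.
void *alignedStorage(std::vector<char> &Buf, std::size_t Size,
                     std::size_t Alignment) {
  // Extra padding guarantees that an aligned address exists inside the buffer.
  Buf.resize(Size + Alignment - 1);
  void *Ptr = Buf.data();
  std::size_t Space = Buf.size();
  // std::align advances Ptr to the next aligned address, or returns nullptr
  // if Size bytes no longer fit in the remaining space.
  return std::align(Alignment, Size, Ptr, Space);
}
```

The returned pointer is only valid until the vector is resized or destroyed, so the vector has to outlive every use of the pointer.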
Add a template document to use when creating new SYCL extension specifications. We will change our existing specifications to follow this template over time. Also add a README describing the process we follow to create, modify, and maintain these specifications.
This old extension for specialization constants is marked deprecated in the headers, so make the specification deprecated too.
Add validation rules to the SPIR-V extension SPV_INTEL_global_variable_decorations.
These changes add the definition of the SYCL_EXT_ONEAPI_PROPERTIES feature macro and move the corresponding proposal to experimental support. Signed-off-by: Steffen Larsen <[email protected]>
The change was suggested by @premanandrao during code review of the implementation.
We decided that the OpenCL API Specification should not contain detailed information about how the APIs interact with SPIR-V. Instead, we want this information in the OpenCL SPIR-V Environment Specification. Change this extension specification accordingly.
* [SYCL][ESIMD] Add support for lsc mem access APIs Signed-off-by: Sergey Dmitriev <[email protected]>
* Removed XeHP_SDV from the list of supported platforms
* Removed DG2 from the list of supported platforms for 2d intrinsics
* Removed cache hints from user-visible lsc SLM APIs
* Replaced "flat-address" with "USM pointer"
* Removed Transposed and Transformed params from lsc_store2d template
* Removed L1 cache hint from atomic operations
* Removed NElts from atomic operations
* Reordered parameters for lsc atomic templates to make them consistent with regular atomics
* Added static asserts to check data sizes for Transformed and Transposed messages
* Added checks for allowed cache hints
* Add special handling for u8 and u16 data types
* Remove 'Transposed' and 'Transformed' parameters from prefetch 2d
Signed-off-by: Xiaodong Li <[email protected]>
LLVM: llvm/llvm-project@55639c2
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@f797de2