Skip to content

LLVM and SPIRV-LLVM-Translator pulldown (WW100) #53

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 764 commits into
base: sycl
Choose a base branch
from

Conversation

DoyleLi
Copy link
Owner

@DoyleLi DoyleLi commented Mar 2, 2022

nikic and others added 30 commits February 22, 2022 17:27
This is the next step towards supporting bitcode auto upgrade with
opaque pointers. The ValueList now stores the Value* together with
its associated type ID, which allows inspecting the original pointer
element type of arbitrary values.

This is a largely mechanical change threading the type ID through
various places. I've left TODOTypeID placeholders in a number of
places where determining the type ID is either non-trivial or
requires allocating a new type ID not present in the original
bitcode. For this reason, the new type IDs are also not used for
anything yet (apart from propagation). They will get used once the
TODOs are resolved.

Differential Revision: https://reviews.llvm.org/D119821
There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address space
differ in the IR but they end up the same class instructions after
selection. For example a divergent load from constant address space
ends up being the same global_load as a load from global address space.

TODO: merge global stores.
TODO: handle SADDR forms.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120279
…on (NFC)

The eq sub zero fold currently has an artificial one-use limitation,
causing us to miss this fold.
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.

TODO: merge global stores.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120285
D118623 added code to fold not-of-compare into a compare
with the inverted predicate, if the compare had no other
uses. This relies on accurate use lists in the IR but it
was run before setPhiValues, when some phi inputs are still
stored in a data structure on the side, instead of being
real uses in the IR. The effect was that a phi that should
be using the original compare result would now get an
inverted result instead.

Fix this by moving simplifyConditions after setPhiValues.

Differential Revision: https://reviews.llvm.org/D120312
This adds very basic support for hashing MachineBasicBlock
and MachineFunction, for use in MachineFunctionPass to
detect passes that modify the MachineFunction wrongly.

Differential Revision: https://reviews.llvm.org/D120122
This patch adds the codegen support for `atomic compare` in clang.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D118632
https://discourse.llvm.org/t/parallel-input-file-parsing/60164

To decouple symbol initialization and section initialization, `Defined::section`
assignment should be postponed after input file parsing. To avoid spurious
duplicate definition error due to two definitions in COMDAT groups of the same
signature, we should postpone the duplicate symbol check.

The function is called postScan instead of a more specific name like
checkDuplicateSymbols, because we may merge Symbol::mergeProperties into
postScan. It is placed after compileBitcodeFiles to apply to ET_REL files
produced by LTO. This causes minor diagnostic regression
for skipLinkedOutput configurations: ld.lld --thinlto-index-only a.bc b.o
(bitcode definition prevails) won't detect duplicate symbol error. I think this
is an acceptable compromise. The important cases where (a) both files are
bitcode or (b) --thinlto-index-only is unused are still detected.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D119908
This patch update the PFTBuilder to be able to lower
the construct present in semantics.

This is a building block for other lowering patches that will be posted soon.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: PeteSteinfeld, schweitz

Differential Revision: https://reviews.llvm.org/D120336

Co-authored-by: Jean Perier <[email protected]>
Co-authored-by: V Donaldson <[email protected]>
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP                          - Number of calcDeps actions
 699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
     59 SLP                          - Number of ReScheduleOnFail actions
  10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

After this patch:

102895 SLP                          - Number of calcDeps actions
 161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
     55 SLP                          - Number of ReScheduleOnFail actions
  10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
This trait results in PDL ops being erroneously CSE'd. These ops are side-effect free in the rewriter but not in the matcher (where unused values aren't allowed anyways). These ops should have a more nuanced side-effect modeling, this is fixing a bug introduced by a previous change.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D120222
With 1c1e2cc a new swift5 reflection section for multi-payload enum mask information was added, which is called mpenum. This change simply adds a check to make sure dsymutil can dump out information in that section into the dSYM bundle.

Differential Revision: https://reviews.llvm.org/D120291
…dvectors"

This reverts commit 9fc1a0d.

We have bisected a compiler crash to this revision and will provide a
test case soon.
This reverts commit 9865c3f.

Looks like our commits raced and Jeff fixed the build issue.
Darwin otool implements this flag as a one-stop solution for
displaying bind and rebase info. As I am working on upstreaming
chained fixup support this command will be useful to write testcases.

Differential Revision: https://reviews.llvm.org/D113573
This is part of a series of patches to upstream support for Mach-O chained fixups.

This patch adds support for parsing the chained fixup load command and
parsing the chained fixups header. It also puts into place the
abstract interface that will be used to iterate over the fixups.

Differential Revision: https://reviews.llvm.org/D113630
This is the driver part of D91605 <https://reviews.llvm.org/D91605>, a
workaround to allow direct calls to `__tls_get_addr` on Solaris/amd64.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D119829
This is a restricted alternative to D91605
<https://reviews.llvm.org/D91605> which only works on Solaris 11.4 SRU 10+,
but would break the build on Solaris 11.3 and Illumos which lack
`dlpi_tls_modid`.

Apart from that, the patch is trivial.  One caveat is that the
`sanitizer_common` and `asan` tests need to be linked explicitly with `ld
-z relax=transtls` on Solaris/amd64 since the archives with calls to
`__tls_get_addr` are linked in directly.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D120048
AIX's libc generates "Error -1 occurred" instead of the "Unknown Error"
expected by these test cases. Add this as expected output for AIX only.

Reviewed By: daltenty, #powerpc, #libc, zibi, Quuxplusone

Differential Revision: https://reviews.llvm.org/D119982
Add support for va intrinsics for the XPLINK ABI.
Only the extended vararg variant, which uses a pointer to next
argument, is supported. The standard variant will build on this.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D120148
…handle both values of select

This is a pre-commit of test cases relevant for D119643.
CorrelatedValuePropagation should handle inverted select condition, but it does not yet.
… of select

The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example:
```
loop:
	%phi = phi i32* [ %sel, %loop ], [ %x, %entry ]
	%f = tail call i32* @f(i32* %phi)
	%cmp1 = icmp ne i32* %f, %y
	%sel = select i1 %cmp1, i32* %f, i32* null
	%cmp2 = icmp eq i32* %sel, null
	br i1 %cmp2, label %return, label %loop
```
But the select condition can be inverted:
```
	%cmp1 = icmp eq i32* %f, %y
	%sel = select i1 %cmp1, i32* null, i32* %f
```
The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D119643
nico and others added 29 commits February 23, 2022 18:41
Fix UBSan-reported issue in MCPlusBuilder::encodeAnnotationImm (left shift of a
negative value).

Test Plan:
```
ninja check-bolt
...
PASS: BOLT-Unit :: Core/./CoreTests/AArch64/MCPlusBuilderTester.Annotation/0 (1 of 140)
PASS: BOLT-Unit :: Core/./CoreTests/X86/MCPlusBuilderTester.Annotation/0 (131 of 134)
```

Reviewed By: maksfb, yota9

Differential Revision: https://reviews.llvm.org/D120260
…-sections for ELF

-fdata-sections decides whether global variables go into different sections.
This is orthogonal to whether we place their metadata (`.data` or `asan_globals`) into different sections.

With -fno-data-sections, `-fsanitize-address-globals-dead-stripping` can still:

* deduplicate COMDAT `asan.module_ctor` and `asan.module_dtor`
* (with ld --gc-sections): for a data section (e.g. `.data`), if all global variables defined relative to it are unreferenced, discard them and associated `asan_globals` sections (rare but no need to exclude this case)

Similar to c7b9094 for PE/COFF.

Reviewed By: #sanitizers, kstoimenov, vitalybuka

Differential Revision: https://reviews.llvm.org/D120394
Cleanup BasicBolckUtilsTest using C++ raw string literals, remove
duplicated block functions and smaller style changes.

Differential Revision: https://reviews.llvm.org/D120095
…gardless of PHIs

The `SplitIndirectBrCriticalEdges` function was originally designed for
`CodeGenPrepare` and skipped splitting of edges when the destination
block didn't contain any `PHI` instructions. This only makes sense when
reducing COPYs like `CodeGenPrepare`. In the case of
`PGOInstrumentation` or `GCOVProfiling` it would result in missed
counters and wrong result in functions with computed goto.

Differential Revision: https://reviews.llvm.org/D120096
…stener

This patch is a follow-up of D120100 to address some feedbacks from
@labath.

This should mainly fix the race issue with the even listener by moving
the listener setup to the main thread.

This also changes the SBDebugger::GetProgressFromEvent SWIG binding
arguments to be output only, so the user don't have to provide them.

Finally, this updates the test to check it the out arguments are returned
in a tuple and re-enables the test on all platforms.

Differential Revision: https://reviews.llvm.org/D120284

Signed-off-by: Med Ismail Bennani <[email protected]>
Now that sparse tensor types are first-class citizens and the sparse compiler
is taking shape, it is time to make sure other compiler optimizations compose
well with sparse tensors. Mostly, this should be completely transparent (i.e.,
dense and sparse take the same path). However, in some cases, optimizations
only make sense in the context of sparse tensors. This is a first example of
such an optimization, where fusing a sampled elt-wise multiplication only makes
sense when the resulting kernel has a potential lower asymptotic complexity due
to the sparsity.

As an extreme example, running SDDMM with 1024x1024 matrices and a sparse
sampling matrix with only two elements runs in 463.55ms in the unfused
case but just 0.032ms in the fused case, with a speedup of 14485x that
is only possible in the exciting world of sparse computations!

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D120429
…oops

D118090 causes a pretty significant (19%) regression in some Eigen
benchmarks. Investigating is a bit time consuming as the compilation
unit where this occurs is large. Rather than revert, this patch adds a
flag controlling that behavior (enabled by default).
Symbol.h depends on InputFiles.h. This change moves us toward dropping the
weird dependency.

The call sites will become slightly uglier (`cast<SharedFile>(s->file)`), but
the compromise is acceptable.
Previously, we only support float64. We now support float32 and float64. When
constructing a tensor without providing a data type, the default is float32.

Fix the tests to data type consistency. All PyTACO application tests now use
float32 to match the default data type of TACO. Other tests may use float32 or
float64.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D120356
Introduce -fgpu-default-stream={legacy|per-thread} option to
support per-thread default stream for HIP runtime.

When -fgpu-default-stream=per-thread, HIP kernels are
launched through hipLaunchKernel_spt instead of
hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1
is defined by the preprocessor to enable other per-thread stream
API's.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D120298
  CONFLICT (content): Merge conflict in clang/test/Preprocessor/predefined-macros.c
  CONFLICT (content): Merge conflict in clang/include/clang/Driver/Options.td
  CONFLICT (content): Merge conflict in clang/include/clang/Basic/LangOptions.h
SPIRVLowerBool pass silently discards line information
producing IR without !dbg nodes arrached.
This commit addresses this issue and adds test

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@ee391ea
Clarify the design doc to note that the C++ attribute
`[[sycl_detail::uses_aspects()]]` is only needed for the device
compiler and should be protected via `#ifdef`.
Local kernel arguments must be aligned to the type size, simply using `std::vector<char>` doesn't always provide the correct alignment. So this patch adds extra padding to the vector and ensures that the pointer returned for the accessor is actually aligned to the type size.

This issue was exposed by: intel/llvm-test-suite#608, which was a follow up to fixing local accessor alignment for the CUDA plugin.
Add a template document to use when creating new SYCL extension
specifications.  We will change our existing specifications to follow
this template over time.  Also add a README describing the process we
follow to create, modify, and maintain these specifications.
This old extension for specialization constants is marked deprecated
in the headers, so make the specification deprecated too.
Add validation rules to the SPIR-V extension
SPV_INTEL_global_variable_decorations.
These changes adds the definition of the SYCL_EXT_ONEAPI_PROPERTIES
feature macro and moves the corresponding proposal to experimental
support.

Signed-off-by: Steffen Larsen <[email protected]>
The change was suggested by @premanandrao during code review of the
implementation.
We decided that the OpenCL API Specification should not contain
detailed information about how the APIs interact with SPIR-V. Instead,
we want this information in the OpenCL SPIR-V Environment
Specification.

Change this extension specification accordingly.
* [SYCL][ESIMD] Add support for lsc mem access APIs

Signed-off-by: Sergey Dmitriev <[email protected]>

* Removed XeHP_SDV from the list of supported platforms
* Removed DG2 from the list of supported platforms for 2d intrinsics
* Removed cache hints from user-visible lsc SLM APIs
* Replaced "flat-address" with "USM pointer"
* Removed Transposed and Transformed params from lsc_store2d template
* Removed L1 cache hint from atomic operations
* Removed NElts from atomic operations
* Reordered parameters for lsc atomic templates to make them consistent with regular atomics
* Added static asserts to check data sizes for Transformed and Transposed messages
* Added checks for allowed cache hints
* Add special handling for u8 and u16 data types
* Remove 'Transposed' and 'Transformed' perameters from prefetch 2d
Signed-off-by: Xiaodong Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.