[AMDGPU] Fix test failures when expensive checks are enabled #130644
@llvm/pr-subscribers-backend-amdgpu Author: Shilei Tian (shiltian) Changes[MLIR][Affine] Fix crash in loop unswitching/hoistAffineIfOp (#130401) Fix obvious crash as a result of missing affine.parallel handling. Also, Fixes: #62323 [clang][bytecode] Implement _builtin{memchr,strchr,char_memchr} (#130420) llvm has recently started to use [mlir]Add a check to ensure bailing out when reducing to a scalar (#129694) Fixes issue #64075 Minimal example crashing :
[X86] combineINSERT_SUBVECTOR - attempt to recursively shuffle combine if both base/sub-vectors are already shuffles (#130304) [lldb] Remove progress report coalescing (#130329) Remove support for coalescing progress reports in LLDB. This rdar://146425487 [gn build] Port a1b14db [clang][bytecode][NFC] Check conditional op condition for ConstantExprs (#130425) Same thing we now do in if statements. Check the condition of a [RISCV] Added test for dag spill fix [TableGen] Use Register in FastISel output. NFC [clang][bytecode] Surround bcp condition with Start/EndSpeculation (#130427) This is similar to what the current interpreter is doing - the [X86] combineConcatVectorOps - convert (V)SHUFPS concatenation to use combineConcatVectorOps recursion (#130426) Only concatenate X86ISD::SHUFP nodes if at least one operand is [SandboxVec] Add region-from-bbs helper pass (#130153) RegionFromBBs is a helper Sandbox IR function pass that builds a region [llvm-profdata] Fix typo in llvm-profdata (#114675) Signed-off-by: Peter Jung <[email protected]> [llvm][NFC]Fix a few typos (#110844) [gn build] Port 2fb1f03 [X86] Fix typo in X86ISD::SHUFP concatenation [Clang] Fix typo 'dereferencable' to 'dereferenceable' (#116761) This patch corrects the typo 'dereferencable' to 'dereferenceable' in [NFC][YAML][IR] Output CfiFunction sorted (#130379) As-is it's NFC, as internally We are changing internals of Sorting by name is unnecessary but good for [libc++][NFC] Fixed bad link in 21.rst (#130428) Re-land "[mlir][ODS] Add a generated builder that takes the Properties struct" (#130117) (#130373) This reverts commit 32f5437. Investigations showed that the unit test utilities were calling erase(), Revert "[gold] Fix compilation (#130334)" This reverts commit b0baa1d. Reverting follow-up commit to ce9e1d3 since the original commit test is flaky. Revert "[clangd] fix warning by adding missing parens" This reverts commit df79000. 
Reverting follow-up commit to ce9e1d3 since the original commit test is flaky. Revert "Modify the localCache API to require an explicit commit on CachedFile… (#115331)" This reverts commit ce9e1d3. The unittest added in this commit seems to be flaky causing random failure on buildbots:
[NFC][YAML] Switch Follow up to #130379. [gn build] Port 1d763f3 [clangd] Add This option, under The default value is Fixes clangd/clangd#2074 [AArch64] Use Register in AArch64FastISel.cpp. NFC [NFC][IR] De-duplicate CFI related code (#130450) [msan] Handle Arm NEON pairwise min/max instructions (#129824) Change the handling of:
Updates the tests from #129760 Adds a note that maybeHandleSimpleNomemIntrinsic may incorrectly match [NFC][YAML] Replace iterators with simple getter (#130449) To simplify #130382. [libc++][NFC] Comment cleanup for
[msan] Apply handleVectorReduceIntrinsic to max/min vector instructions (#129819) Changes the handling of:
Also adds a parameter to handleVectorReduceIntrinsic to specify whether Updates the tests from #129741, [libc] Added type-generic macros for fixed-point functions (#129371) Adds macros [clang][bytecode][NFC] Bail out on non constant evaluated builtins (#130431) If the ASTContext says so, don't bother trying to constant evaluate the [ARM] Use Register in FastISel. NFC [ARM] Remove unused argument. NFC [AArch64] Remove unused DenseMap variable. NFC [ARM] Change FastISel Address from a struct to a class. NFC This allows us to use Register in the interface, but store an [clang][analyzer][NFC] Fix typos in comments (#130456) [ExecutionEngine] Avoid repeated map lookups (NFC) (#130461) [IPO] Avoid repeated hash lookups (NFC) (#130462) [Scalar] Avoid repeated hash lookups (NFC) (#130463) [Utils] Avoid repeated hash lookups (NFC) (#130464) [llvm-jitlink] Avoid repeated hash lookups (NFC) (#130465) [llvm-profgen] Avoid repeated hash lookups (NFC) (#130466) [lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36 (#123576) Instructions with relocation
[InstCombine] Add handling for (or (zext x), (shl (zext (ashr x, bw/2-1))), bw/2) -> (sext x) fold (#130316) Minor tweak to #129363 which handled all the cases where there was a sext for the original source value, but not for cases where the source is already half the size of the destination type Another regression noticed in #76524 [clang][bytecode] Fix getting pointer element type in __builtin_memcmp (#130485) When such a pointer is heap allocated, the type we get is a pointer [X86] combineConcatVectorOps - use all_of to check for matching PSHUFD/PSHUFLW/PSHUFHW shuffle mask. Prep work before adding 512-bit support. [clang-tidy] Fix invalid fixit from modernize-use-ranges for nullptr used with std::unique_ptr (#127162) This PR fixes issue #124815 by correcting the handling of Updated the logic to suppress warnings for [X86] Combine bitcast(v1Ty insert_vector_elt(X, Y, 0)) to Y (#130475) Though it only happens in v1i1 when we generate llvm.masked.load/store https://godbolt.org/z/vjsrofsqx [ValueTracking] Bail out on x86_fp80 when computing fpclass with knownbits (#130477) In #97762, we assume the Closes #130408. Revert "[lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36 (#123576)" This reverts commit 6fbe491. [VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419) Create an empty VPlan first, then let the HCFG builder create a plain This simplifies the HCFG builder (which should probably be renamed) and As follow-up, I plan to also preserve the exit branches in the initial The conversion from plain CFG with potentially multiple exits to a This is needed to enable VPlan-based predication. Currently early exit Another follow-up is updating the outer loop handling to also introduce [1] PR: #128419 [gn build] Port fd26708 [NFC][Cloning] Make ClonedModule case more obvious in CollectDebugInfoForCloning (#129143) Summary: Test Plan: [libc++] Protect more code against -Wdeprecated. 
(#130419) This seems needed when updating the CI Docker image. [libc++][CI] Update action runner base image. (#130433) Updates to the latest release. The side effect of this change is [HLSL] Disallow virtual inheritance and functions (#127346) This PR disallows virtual inheritance and virtual functions in HLSL. [NFC][Cloning] Simplify the flow in FindDebugInfoToIdentityMap (#129144) Summary: Test Plan: [Sanitizers][Darwin] Correct iterating of MachO load commands (#130161) The condition to stop iterating so far was to look for load command cmd Correcting this by limiting the number of iterations to the count rdar://143903403 Co-authored-by: Mariusz Borsa <[email protected]> [AArch64] Improve vector funnel shift by constant costs. (#130044) We now have better codegen, and can have better costs to match. The llvm-project/llvm/test/CodeGen/AArch64/fsh.ll Line 3941 in 7e5821b
[LV] Add outer loop test with different successor orders in inner latch. [NFC][Cloning] Add a helper to collect debug info from instructions (#129145) Summary: Test Plan: Revert "[ARM] Change FastISel Address from a struct to a class. NFC" This reverts commit d47bc6f. I forgot to commit clang-format cleanup before I pushed this. Recommit "[ARM] Change FastISel Address from a struct to a class. NFC" With clang-format this time. Original message: [lldb] Add missing converstion to optional [X86] Use Register in FastISel. NFC Replace 'Reg == 0' with '!Reg' [HLSL] select scalar overloads for vector conditions (#129396) This PR adds scalar/vector overloads for vector conditions to the Fixes #126570 [ADT] Use This is to make sure that ADT helpers consistently use argument This was a part of #87936 but reverted due to buildbot failures. Now [gn build] Port e85e29c [alpha.webkit.UnretainedLambdaCapturesChecker] Add a WebKit checker for lambda capturing NS or CF types. (#128651) Add a new WebKit checker for checking that lambda captures of CF types [Clang] use constant evaluation context for constexpr if conditions (#123667) Fixes #123524 This PR addresses the issue of immediate function expressions not [Xtensa] Implement Xtensa MAC16 Option. (#130004) [RISCV] Fix incorrect mask of shuffle vector in the test. (NFC) (#130244) The mask of shuffle vector should be <u, u, 4, 6, 8, 10, 12, 14>, not And the mask of shuffle vector with an undef initial element has been [clang] Fix typos in options text. (#130129) [clang] Reject constexpr-unknown values as constant expressions more consistently (#129952) Perform the check for constexpr-unknown values in the same place we While I'm here, also fix a rejects-valid with a reference that doesn't The existing behavior with -fexperimental-new-constant-interpreter seems Followup to #128409. Fixes #129844. Fixes #129845.
[llvm-objdump][ELF]Fix crash when reading strings from .dynstr (#125679) This change introduces a check for the strtab offset to prevent Fixes: #86612 Co-authored-by: James Henderson <[email protected]> [APFloat] Fix Fixes #63895 Before this PR, Additionally, (Note to reviewer: I don't have commit access) [Clang][CodeGen] Fix demangler invariant comment assertion (#130522) This patch makes the assertion (that is currently in a comment) that Reland [lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36 Instructions with relocation
This patch fixes the buildbot failure of lld tests. InstCombine: Fix a crash in When constructing a PHI node in Fixes: SWDEV-516420 [AMDGPU] Add GFX12 S_ALLOC_VGPR instruction (#130018) This patch only adds the instruction for disassembly support. We neither have an intrinsic nor codegen support, and it is unclear For now, it will be generated only by the backend in very specific Co-authored-by: Jannik Silvanus <[email protected]> [RISCV] Remove Predicates from classes in RISCVInstrInfoXTHead.td. NFC All of the instantiations of these classes also specify Predicates Also move the DecoderNamespace to the instantiations for consistency [AArch64][CostModel] Alter sdiv/srem cost where the divisor is constant (#123552) This patch revises the cost model for sdiv/srem and draws its inspiration from the udiv/urem patch #122236 The typical codegen for the different scenarios has been mentioned as notes/comments in the code itself (this is done because there are too many scenarios to list them here in the patch description). [AArch64] Avoid repeated hash lookups (NFC) (#130542) [CodeGen] Avoid repeated hash lookups (NFC) (#130543) [alpha.webkit.NoUnretainedMemberChecker] Add a new WebKit checker for unretained member variables and ivars. (#128641) Add a new WebKit checker for member variables and instance variables of [mlir] Apply ClangTidy finding (NFC) loop variable is copied but only used as const reference; consider making it a const reference [clang][NFC] Clean up Expr::EvaluateAsConstantExpr (#130498) The Info.EnableNewConstInterp case is already handled above.
[Clang] Fix segmentation fault caused by This happens when using Similarly to #111701 [mlir][CAPI][python] bind CallSiteLoc, FileLineColRange, FusedLoc, NameLoc (#129351) This PR extends the python bindings for CallSiteLoc, FileLineColRange, I also did some "spring cleaning" here ( [libunwind][RISCV] Make asm statement volatile (#130286) Compiling with [ADT/Support] Add includes to fix module build Current Clang complains that 'size_t' / 'reference_wrapper' "must be [Clang][AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (2/3) (#125688) Add support for the new SHF_AARCH64_PURECODE ELF section flag: The general implementation follows the existing one for ARM targets. Related PRs:
Revert "[clang] Implement instantiation context note for checking template parameters (#126088)" This reverts commit a24523a. This is causing significant compile-time regressions for C++ code, see: [X86] checkBitcastSrcVectorSize - early return when reach to MaxRecursionDepth. (#130226) [readobj][Arm][AArch64] Refactor Build Attributes parsing under ELFAtributeParser and add support for AArch64 Build Attributes (#128727) Refactor readobj to integrate AArch64 Build Attributes under
Add support for parsing AArch64 Build Attributes. [MCA] Adding missing instructions in AArch64 Neoverse V1 tests (#128892) Added missing instructions for LLVM Opcodes coverage. It will help to Follow up of MR ##126703 No more asm instruction comments to maintain. [gn build] Port b1ebfac [X86] Add test case showing its not always beneficial to fold concat(palignr(),palignr()) -> palignr(concat(),concat()) [lldb] Add more ARM checks in TestLldbGdbServer.py (#130277) When #130034 enabled RISC-V ARM only has 4 argument registers, which matches Arm's ABI for it: The ABI defines a link register LR, and I assume that's what becomes Tested on ARM and AArch64 Linux. [lldb] Clean up UnwindAssemblyInstEmulation (#129030) My main motivation was trying to understand how the function and whether If we delay the construction of the unwind plan to the end of the I've also noticed that a large portion of the function is devoted to [MLIR][py] Add PyThreadPool as wrapper around MlirLlvmThreadPool in MLIR python bindings (#130109) In some projects like JAX ir.Context are used with disabled multi-threading to avoid However, when context has enabled multithreading it also uses locks on [X86] Add test case showing its not always beneficial to fold concat(pack(),pack()) -> pack(concat(),concat()) [mlir] Refactor ConvertVectorToLLVMPass options (#128219) The This PR expands the Additionally, I have changed some interfaces to only take these specific Finally, I have added a simple lit test that just prints the pass Fixes #129046 [IR] Fix assertion error in User new/delete edge case (#129914) Fixes #129900 If [MergeFunc] Check full IR and comdat keys in comdat.ll. 
Spelling in lit.cfg.py [X86] combineConcatVectorOps - convert X86ISD::PALIGNR concatenation to use combineConcatVectorOps recursion (#130572) Only concatenate X86ISD::PALIGNR nodes if at least one operand is beneficial to concatenate [flang] Move parser invocations into ParserActions (#130309) FrontendActions.cpp is currently one of the biggest compilation units in User time (seconds): 139.21 This commit separates out explicit invocations of the parser into a User time (seconds): 70.08 While the ones for the newly created ParserActions.cpp as follows: User time (seconds): 104.33 Signed-off-by: Kajetan Puchalski <[email protected]> [TailDuplicator] Do not restrict the computed gotos (#114990) Fixes #106846. This is what I learned from GCC. I found that GCC does not duplicate the > Duplicate the blocks containing computed gotos. This basically Revert "[lldb][asan] Add temporary logging to ReportRetriever" This reverts commit 39a4da2. We skipped the failing tests in [lldb] Remove an extraneous This was missed in review but is showing up in lldb-dap output. [Clang][AArch64] Fix typo with colon-separated syntax for system registers (#105608) The range for Op0 was set to 1 instead of 3. The description of e493f17 visually llvm-project/llvm/lib/Target/AArch64/AArch64SystemOperands.td Lines 658 to 674 in 796787d
Godbolt: https://godbolt.org/z/WK9PqPvGE Co-authored-by: v01dxyz <[email protected]> [AMDGPU][NewPM] Port AMDGPUReserveWWMRegs to NPM (#123722) [X86] Add test case showing its not always beneficial to fold concat(pshufb(),pshufb()) -> pshufb(concat(),concat()) [X86] Improve test coverage for concat(pmaddubsw(),pmaddubsw()) -> pmaddubsw(concat(),concat()) Ensure we have tests for both beneficial/non-beneficial concatenation cases [AArch64][ELF Parser] Fix out-of-scope variable usage (#130576) Return a reference to a persistent variable instead of a temporary copy. [LLVM][SVE] Add isel for scalable vector bfloat copysign operations. (#130098) [clang] NNS: don't print trailing scope resolution operator in diagnostics (#130529) This clears up the printing of a NestedNameSpecifier so a trailing '::' This fixes a bunch of diagnostics where the trailing :: was awkward. There is a drive-by improvement to error recovery, where now we print This will clear up further uses of NNS printing in further patches. AMDGPU: Move enqueued block handling into clang (#128519) The previous implementation wasn't maintaining a faithful IR This now avoids using a function attribute on kernels and avoids using I couldn't figure out how to get rename-with-external-symbol behavior We could move towards initializing the runtime handle in the https://reviews.llvm.org/D141700 [RISCV][test] Add test case showing case where machine copy propagation leaves behind a no-op reg move Pre-commit for #129889. [AArch64][ELF Parser] Fix out-of-scope variable usage (#130594) Return a reference to a persistent variable instead of a temporary copy. [DAG] fold AVGFLOORS to AVGFLOORU for non-negative operand (#84746) (#129678) Fold ISD::AVGFLOORS to ISD::AVGFLOORU for non-negative operand. Cover test is modified for uhadd with zero extension.
Fixes #84746 Revert "[clang] Fix missing diagnostic of declaration use when accessing TypeDecls through typename access (#129681)" This caused incorrect -Wunguarded-availability warnings. See comment on > We were missing a call to DiagnoseUseOfDecl when performing typename This reverts commit 4c4fd6b. [X86] combineConcatVectorOps - convert X86ISD::HADD/SUB concatenation to use combineConcatVectorOps recursion (#130579) Only concatenate X86ISD::HADD/SUB nodes if at least one operand is beneficial to concatenate [X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr (#130488) https://godbolt.org/z/rWYdqnjjx [clang] fix matching of nested template template parameters (#130447) When checking the template template parameters of template template This also has a few drive-by fixes, such as checking the template Fixes #130362 [Clang] Force expressions with UO_Not to not be non-negative (#126846) This PR addresses the bug of not throwing warnings for the following int test13(unsigned a, int *b) {
return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
} However, in the original issue, a comment mentioned that negation, Fixes #18878 [flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568) The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an When entering a target region, Some Fortran objects use descriptors in their in-memory representation. Co-authored-by: Sergio Afonso <[email protected]> [flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122) Accommodate it in OmpDirectiveSpecification, which may become the [MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses (#129889) This change removes 189 static instances of no-op reg-reg moves (i.e. [gn build] Port 0d2c55c [X86] combineConcatVectorOps - convert PSHUFB/PSADBW/VPMADDUBSW/VPMADDUBSW concatenation to use combineConcatVectorOps recursion (#130592) Only concatenate nodes if at least one operand is beneficial to concatenate [clang][test] Don't require specific alignment in test case (#130589) #129952 / https://lab.llvm.org/buildbot/#/builders/154/builds/13059
The other test does not check alignment, so I'm assuming that it is not [SLP]Reduce number of alternate instruction, where possible Previous version was reviewed here #123360 Patch tries to remove wide alternate operations.
i.e. half of the results are just unused. This leads to increased Patch introduces SplitVectorize mode, where it splits the operations by
It allows to improve the performance by reducing number of ops. Also, it -O3+LTO, AVX512
Olden/tsp - small variations -O3+LTO, mcpu=sifive-p470 Metric: size..text
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/ieee/GCC-C-execute-ieee-pr50310.test 886.00 608.00 -31.4% CINT2006/464.h264ref - extra v16i32 reduction CINT2006/464.h264ref - extra vector code in find_sad_16x16 Reviewers: hiraditya Reviewed By: hiraditya Pull Request: #128907 Revert "[libc++] Don't try to wait on a thread that hasn't started in std::async (#125433)" This reverts commit 11766a4. [ARM] Fix HW thread pointer functionality (#130027)
reference:https://reviews.llvm.org/D114116 [X86] combineConcatVectorOps - add missing VT/Subtarget checks for MOV*DUP concatenation folds. [clang][SPIR-V] Use the SPIR-V backend by default (#129545) The SPIR-V backend is now a supported backend, and we believe it is Some IR generated by Clang today, such as those requiring SPIR-V target Enable it by default, but keep some of the code as it is still called by Signed-off-by: Sarnie, Nick <[email protected]> [OpenMP] Mark Failing OpenMP Tests as XFAIL on Windows (#129040) This patch marks specific OpenMP runtime tests as XFAIL on Windows due [LLD][COFF] Add /noexp for link.exe compatibility (#128814) See #107346 [mlir] Fix bazel build after f3dcc0f [mlir][TOSA] Fix linalg lowering of depthwise conv2d (#130293) Current lowering for tosa.depthwise_conv2d assumes if both zero points [OpenACC] Implement 'bind' ast/sema for 'routine' directive The 'bind' clause allows the renaming of a function during code Note there are additional rules to this in the implicit-function routine [clang][bytecode] Fix builtin_memcmp buffer sizes for pointers (#130570) Don't use the pointer size, but the number of elements multiplied by the [libc++][docs] Remove mis-added entry for P2513R4 (#130581) P2513R4 neither touched library wording nor required library [ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
FPSCR is present on all targets with a VFP, but the FPEXC register is DPRCS1 will sum up all previous bytes that were saved, and will emit Avoid annotating the saving of FPSCR and FPEXC for functions marked Co-authored-by: Jake Vossen <[email protected]> Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute" This reverts commit 1f05703. [HLSL][Driver] Use temporary files correctly (#130436) This updates the DXV and Metal Converter actions to properly use AMDGPU: Rename variable from undef to poison (#130460) StructurizeCFG: Use poison instead of undef (#130459) There are a surprising number of codegen changes from this. Revert "Reland "[clang] Lower modf builtin using This broke modff calls on 32-bit x86 Windows. See comment on the PR. > This updates the existing modf[f|l] builtin to be lowered via the This reverts commit cd1d9a8. [libc] Add Relates to Forbid co_await and co_yield in invalid expr contexts (#130455) Fix #78426 C++26 introduced braced initializer lists as template arguments. Co-authored-by: cor3ntin <[email protected]> [X86] Add test cases showing its not always beneficial to fold concat(add/mul(),add/mul()) -> add/mul(concat(),concat()) [lldb] fix set SBLineEntryColumn (#130435) Calling the public API This probably should be backported as it has been since version 3.4. Co-authored-by: Jonas Devlieghere <[email protected]> [ADT] Use This is to make sure that ADT helpers consistently use argument This was a part of #87936 but Also fix potential issue with double-move on the input range. 
[Libc] Turn implicit to explicit conversion (#130615) This fixes a build issue on the AMDGPU libc bot after Co-authored-by: Joseph Huber <[email protected]> [X86] combineConcatVectorOps - convert ADD/SUB/MUL concatenation to use combineConcatVectorOps recursion Only concatenate ADD/SUB/MUL nodes if at least one operand is beneficial to concatenate [mlir] Fix bazel build after f3dcc0f TD files [mlir][SparseTensor][NFC] Migrate to OpAsmAttrInterface for ASM alias generation (#130483) After the introduction of [libc] Fix implicit conversion warnings. (#130635) [flang][OpenMP] Parse cancel-directive-name as clause (#130146) The cancellable construct names on CANCEL or CANCELLATION POINT Instead of parsing them into a custom structure, parse them as a clause, [lldb-dap] Migrating terminated statistics to the event body. (#130454) Per the DAP spec, the event 'body' field should contain any additional
This allows us to more uniformly handle event messages.
[AMDGPU] Fix test failures when expensive checks are enabled
This PR fixes test failures introduced in #127353 when expensive checks are enabled.
Patch is 122.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130644.diff
3 Files Affected:
diff --git a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
index 4ca00f2daf97a..4b5a7c207055a 100644
--- a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
+++ b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
@@ -12,7 +12,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX10_1-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
; GFX10_1: ; %bb.0:
; GFX10_1-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_1-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT: buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT: s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT: s_mov_b32 exec_lo, s4
; GFX10_1-NEXT: v_lshrrev_b32_e64 v0, 5, s32
+; GFX10_1-NEXT: v_writelane_b32 v1, s55, 0
; GFX10_1-NEXT: s_and_b32 s4, 0, exec_lo
; GFX10_1-NEXT: v_add_nc_u32_e32 v0, 64, v0
; GFX10_1-NEXT: ;;#ASMSTART
@@ -20,16 +26,28 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX10_1-NEXT: ;;#ASMEND
; GFX10_1-NEXT: v_lshrrev_b32_e64 v0, 5, s32
; GFX10_1-NEXT: v_add_nc_u32_e32 v0, 0x4040, v0
-; GFX10_1-NEXT: v_readfirstlane_b32 s59, v0
+; GFX10_1-NEXT: v_readfirstlane_b32 s55, v0
; GFX10_1-NEXT: ;;#ASMSTART
-; GFX10_1-NEXT: ; use s59, scc
+; GFX10_1-NEXT: ; use s55, scc
; GFX10_1-NEXT: ;;#ASMEND
+; GFX10_1-NEXT: v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT: buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT: s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT: s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT: s_waitcnt vmcnt(0)
; GFX10_1-NEXT: s_setpc_b64 s[30:31]
;
; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
; GFX10_3: ; %bb.0:
; GFX10_3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_3-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT: buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT: s_mov_b32 exec_lo, s4
; GFX10_3-NEXT: v_lshrrev_b32_e64 v0, 5, s32
+; GFX10_3-NEXT: v_writelane_b32 v1, s55, 0
; GFX10_3-NEXT: s_and_b32 s4, 0, exec_lo
; GFX10_3-NEXT: v_add_nc_u32_e32 v0, 64, v0
; GFX10_3-NEXT: ;;#ASMSTART
@@ -37,17 +55,27 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX10_3-NEXT: ;;#ASMEND
; GFX10_3-NEXT: v_lshrrev_b32_e64 v0, 5, s32
; GFX10_3-NEXT: v_add_nc_u32_e32 v0, 0x4040, v0
-; GFX10_3-NEXT: v_readfirstlane_b32 s59, v0
+; GFX10_3-NEXT: v_readfirstlane_b32 s55, v0
; GFX10_3-NEXT: ;;#ASMSTART
-; GFX10_3-NEXT: ; use s59, scc
+; GFX10_3-NEXT: ; use s55, scc
; GFX10_3-NEXT: ;;#ASMEND
+; GFX10_3-NEXT: v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT: buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT: s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT: s_waitcnt vmcnt(0)
; GFX10_3-NEXT: s_setpc_b64 s[30:31]
;
; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT: s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT: scratch_store_b32 off, v1, s1 ; 4-byte Folded Spill
+; GFX11-NEXT: s_mov_b32 exec_lo, s0
; GFX11-NEXT: s_add_i32 s0, s32, 64
-; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT: v_writelane_b32 v1, s55, 0
; GFX11-NEXT: v_mov_b32_e32 v0, s0
; GFX11-NEXT: s_and_b32 s0, 0, exec_lo
; GFX11-NEXT: s_addc_u32 s0, s32, 0x4040
@@ -57,10 +85,16 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX11-NEXT: s_bitcmp1_b32 s0, 0
; GFX11-NEXT: s_bitset0_b32 s0, 0
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
-; GFX11-NEXT: s_mov_b32 s59, s0
+; GFX11-NEXT: s_mov_b32 s55, s0
; GFX11-NEXT: ;;#ASMSTART
-; GFX11-NEXT: ; use s59, scc
+; GFX11-NEXT: ; use s55, scc
; GFX11-NEXT: ;;#ASMEND
+; GFX11-NEXT: v_readlane_b32 s55, v1, 0
+; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT: s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT: scratch_load_b32 v1, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT: s_mov_b32 exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]
;
; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
@@ -70,7 +104,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX12-NEXT: s_wait_samplecnt 0x0
; GFX12-NEXT: s_wait_bvhcnt 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
+; GFX12-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT: scratch_store_b32 off, v1, s32 offset:16388 ; 4-byte Folded Spill
+; GFX12-NEXT: s_wait_alu 0xfffe
+; GFX12-NEXT: s_mov_b32 exec_lo, s0
+; GFX12-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX12-NEXT: s_and_b32 s0, 0, exec_lo
+; GFX12-NEXT: v_writelane_b32 v1, s55, 0
; GFX12-NEXT: s_add_co_ci_u32 s0, s32, 0x4000
; GFX12-NEXT: v_mov_b32_e32 v0, s32
; GFX12-NEXT: s_wait_alu 0xfffe
@@ -80,34 +120,54 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX12-NEXT: ; use alloca0 v0
; GFX12-NEXT: ;;#ASMEND
; GFX12-NEXT: s_wait_alu 0xfffe
-; GFX12-NEXT: s_mov_b32 s59, s0
+; GFX12-NEXT: s_mov_b32 s55, s0
; GFX12-NEXT: ;;#ASMSTART
-; GFX12-NEXT: ; use s59, scc
+; GFX12-NEXT: ; use s55, scc
; GFX12-NEXT: ;;#ASMEND
+; GFX12-NEXT: v_readlane_b32 s55, v1, 0
+; GFX12-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT: scratch_load_b32 v1, off, s32 offset:16388 ; 4-byte Folded Reload
; GFX12-NEXT: s_wait_alu 0xfffe
+; GFX12-NEXT: s_mov_b32 exec_lo, s0
+; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT: buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_lshrrev_b32_e64 v0, 6, s32
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 64, v0
+; GFX8-NEXT: v_writelane_b32 v1, s55, 0
; GFX8-NEXT: ;;#ASMSTART
; GFX8-NEXT: ; use alloca0 v0
; GFX8-NEXT: ;;#ASMEND
; GFX8-NEXT: v_lshrrev_b32_e64 v0, 6, s32
-; GFX8-NEXT: s_movk_i32 s59, 0x4040
-; GFX8-NEXT: v_add_u32_e32 v0, vcc, s59, v0
+; GFX8-NEXT: s_movk_i32 s55, 0x4040
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, s55, v0
+; GFX8-NEXT: v_readfirstlane_b32 s55, v0
; GFX8-NEXT: s_and_b64 s[4:5], 0, exec
-; GFX8-NEXT: v_readfirstlane_b32 s59, v0
; GFX8-NEXT: ;;#ASMSTART
-; GFX8-NEXT: ; use s59, scc
+; GFX8-NEXT: ; use s55, scc
; GFX8-NEXT: ;;#ASMEND
+; GFX8-NEXT: v_readlane_b32 s55, v1, 0
+; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT: buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT: s_mov_b64 exec, s[4:5]
+; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
; GFX900: ; %bb.0:
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT: s_mov_b64 exec, s[4:5]
; GFX900-NEXT: v_lshrrev_b32_e64 v0, 6, s32
; GFX900-NEXT: v_add_u32_e32 v0, 64, v0
; GFX900-NEXT: ;;#ASMSTART
@@ -115,34 +175,52 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
; GFX900-NEXT: ;;#ASMEND
; GFX900-NEXT: v_lshrrev_b32_e64 v0, 6, s32
; GFX900-NEXT: v_add_u32_e32 v0, 0x4040, v0
+; GFX900-NEXT: v_writelane_b32 v1, s55, 0
+; GFX900-NEXT: v_readfirstlane_b32 s55, v0
; GFX900-NEXT: s_and_b64 s[4:5], 0, exec
-; GFX900-NEXT: v_readfirstlane_b32 s59, v0
; GFX900-NEXT: ;;#ASMSTART
-; GFX900-NEXT: ; use s59, scc
+; GFX900-NEXT: ; use s55, scc
; GFX900-NEXT: ;;#ASMEND
+; GFX900-NEXT: v_readlane_b32 s55, v1, 0
+; GFX900-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT: buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT: s_mov_b64 exec, s[4:5]
+; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]
;
; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
; GFX942: ; %bb.0:
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT: s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT: s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT: scratch_store_dword off, v1, s2 ; 4-byte Folded Spill
+; GFX942-NEXT: s_mov_b64 exec, s[0:1]
; GFX942-NEXT: s_add_i32 s0, s32, 64
; GFX942-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NEXT: s_and_b64 s[0:1], 0, exec
; GFX942-NEXT: s_addc_u32 s0, s32, 0x4040
; GFX942-NEXT: s_bitcmp1_b32 s0, 0
; GFX942-NEXT: s_bitset0_b32 s0, 0
+; GFX942-NEXT: v_writelane_b32 v1, s55, 0
+; GFX942-NEXT: s_mov_b32 s55, s0
; GFX942-NEXT: ;;#ASMSTART
; GFX942-NEXT: ; use alloca0 v0
; GFX942-NEXT: ;;#ASMEND
-; GFX942-NEXT: s_mov_b32 s59, s0
; GFX942-NEXT: ;;#ASMSTART
-; GFX942-NEXT: ; use s59, scc
+; GFX942-NEXT: ; use s55, scc
; GFX942-NEXT: ;;#ASMEND
+; GFX942-NEXT: v_readlane_b32 s55, v1, 0
+; GFX942-NEXT: s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT: s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT: scratch_load_dword v1, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT: s_mov_b64 exec, s[0:1]
+; GFX942-NEXT: s_waitcnt vmcnt(0)
; GFX942-NEXT: s_setpc_b64 s[30:31]
%alloca0 = alloca [4096 x i32], align 64, addrspace(5)
%alloca1 = alloca i32, align 4, addrspace(5)
call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
- call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca1, i32 0)
+ call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca1, i32 0)
ret void
}
@@ -152,36 +230,65 @@ define void @scalar_mov_materializes_frame_index_dead_scc() #0 {
; GFX10_1-LABEL: scalar_mov_materializes_frame_index_dead_scc:
; GFX10_1: ; %bb.0:
; GFX10_1-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_1-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT: buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT: s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT: s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT: v_writelane_b32 v1, s55, 0
; GFX10_1-NEXT: v_lshrrev_b32_e64 v0, 5, s32
-; GFX10_1-NEXT: s_lshr_b32 s59, s32, 5
-; GFX10_1-NEXT: s_addk_i32 s59, 0x4040
+; GFX10_1-NEXT: s_lshr_b32 s55, s32, 5
+; GFX10_1-NEXT: s_addk_i32 s55, 0x4040
; GFX10_1-NEXT: v_add_nc_u32_e32 v0, 64, v0
; GFX10_1-NEXT: ;;#ASMSTART
; GFX10_1-NEXT: ; use alloca0 v0
; GFX10_1-NEXT: ;;#ASMEND
; GFX10_1-NEXT: ;;#ASMSTART
-; GFX10_1-NEXT: ; use s59
+; GFX10_1-NEXT: ; use s55
; GFX10_1-NEXT: ;;#ASMEND
+; GFX10_1-NEXT: v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT: buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT: s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT: s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT: s_waitcnt vmcnt(0)
; GFX10_1-NEXT: s_setpc_b64 s[30:31]
;
; GFX10_3-LABEL: scalar_mov_materializes_frame_index_dead_scc:
; GFX10_3: ; %bb.0:
; GFX10_3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_3-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT: buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT: s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT: v_writelane_b32 v1, s55, 0
; GFX10_3-NEXT: v_lshrrev_b32_e64 v0, 5, s32
-; GFX10_3-NEXT: s_lshr_b32 s59, s32, 5
-; GFX10_3-NEXT: s_addk_i32 s59, 0x4040
+; GFX10_3-NEXT: s_lshr_b32 s55, s32, 5
+; GFX10_3-NEXT: s_addk_i32 s55, 0x4040
; GFX10_3-NEXT: v_add_nc_u32_e32 v0, 64, v0
; GFX10_3-NEXT: ;;#ASMSTART
; GFX10_3-NEXT: ; use alloca0 v0
; GFX10_3-NEXT: ;;#ASMEND
; GFX10_3-NEXT: ;;#ASMSTART
-; GFX10_3-NEXT: ; use s59
+; GFX10_3-NEXT: ; use s55
; GFX10_3-NEXT: ;;#ASMEND
+; GFX10_3-NEXT: v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT: s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT: buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT: s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT: s_waitcnt vmcnt(0)
; GFX10_3-NEXT: s_setpc_b64 s[30:31]
;
; GFX11-LABEL: scalar_mov_materializes_frame_index_dead_scc:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT: s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT: scratch_store_b32 off, v1, s1 ; 4-byte Folded Spill
+; GFX11-NEXT: s_mov_b32 exec_lo, s0
+; GFX11-NEXT: v_writelane_b32 v1, s55, 0
; GFX11-NEXT: s_add_i32 s0, s32, 64
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX11-NEXT: v_mov_b32_e32 v0, s0
@@ -189,10 +296,16 @@ define void @scalar_mov_materializes_frame_index_dead_scc() #0 {
; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use alloca0 v0
; GFX11-NEXT: ;;#ASMEND
-; GFX11-NEXT: s_mov_b32 s59, s0
+; GFX11-NEXT: s_mov_b32 s55, s0
; GFX11-NEXT: ;;#ASMSTART
-; GFX11-NEXT: ; use s59
+; GFX11-NEXT: ; use s55
; GFX11-NEXT: ;;#ASMEND
+; GFX11-NEXT: v_readlane_b32 s55, v1, 0
+; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT: s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT: scratch_load_b32 v1, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT: s_mov_b32 exec_lo, s0
+; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]
;
; GFX12-LABEL: scalar_mov_materializes_frame_index_dead_scc:
@@ -202,67 +315,110 @@ define void @scalar_mov_materializes_frame_index_dead_scc() #0 {
; GFX12-NEXT: s_wait_samplecnt 0x0
; GFX12-NEXT: s_wait_bvhcnt 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
+; GFX12-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT: scratch_store_b32 off, v1, s32 offset:16388 ; 4-byte Folded Spill
+; GFX12-NEXT: s_wait_alu 0xfffe
+; GFX12-NEXT: s_mov_b32 exec_lo, s0
+; GFX12-NEXT: v_writelane_b32 v1, s55, 0
; GFX12-NEXT: s_add_co_i32 s0, s32, 0x4000
; GFX12-NEXT: v_mov_b32_e32 v0, s32
+; GFX12-NEXT: s_wait_alu 0xfffe
+; GFX12-NEXT: s_mov_b32 s55, s0
; GFX12-NEXT: ;;#ASMSTART
; GFX12-NEXT: ; use alloca0 v0
; GFX12-NEXT: ;;#ASMEND
-; GFX12-NEXT: s_wait_alu 0xfffe
-; GFX12-NEXT: s_mov_b32 s59, s0
; GFX12-NEXT: ;;#ASMSTART
-; GFX12-NEXT: ; use s59
+; GFX12-NEXT: ; use s55
; GFX12-NEXT: ;;#ASMEND
+; GFX12-NEXT: v_readlane_b32 s55, v1, 0
+; GFX12-NEXT: s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT: scratch_load_b32 v1, off, s32 offset:16388 ; 4-byte Folded Reload
; GFX12-NEXT: s_wait_alu 0xfffe
+; GFX12-NEXT: s_mov_b32 exec_lo, s0
+; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: scalar_mov_materializes_frame_index_dead_scc:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT: buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT: s_mov_b64 exec, s[4:5]
+; GFX8-NEXT: v_writelane_b32 v1, s55, 0
+; GFX8-NEXT: s_lshr_b32 s55, s32, 6
; GFX8-NEXT: v_lshrrev_b32_e64 v0, 6, s32
-; GFX8-NEXT: s_lshr_b32 s59, s32, 6
+; GFX8-NEXT: s_addk_i32 s55, 0x4040
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 64, v0
; GFX8-NEXT: ;;#ASMSTART
; GFX8-NEXT: ; use alloca0 v0
; GFX8-NEXT: ;;#ASMEND
-; GFX8-NEXT: s_addk_i32 s59, 0x4040
; GFX8-NEXT: ;;#ASMSTART
-; GFX8-NEXT: ; use s59
+; GFX8-NEXT: ; use s55
; GFX8-NEXT: ;;#ASMEND
+; GFX8-NEXT: v_readlane_b32 s55, v1, 0
+; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT: buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT: s_mov_b64 exec, s[4:5]
+; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX900-LABEL: scalar_mov_materializes_frame_index_dead_scc:
; GFX900: ; %bb.0:
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT: s_mov_b64 exec, s[4:5]
+; GFX900-NEXT: v_writelane_b32 v1, s55, 0
+; GFX900-NEXT: s_lshr_b32 s55, s32, 6
; GFX900-NEXT: v_lshrrev_b32_e64 v0, 6, s32
-; GFX900-NEXT: s_lshr_b32 s59, s32, 6
+; GFX900-NEXT: s_addk_i32 s55, 0x4040
; GFX900-NEXT: v_add_u32_e32 v0, 64, v0
; GFX900-NEXT: ;;#ASMSTART
; GFX900-NEXT: ; use alloca0 v0
; GFX900-NEXT: ;;#ASMEND
-; GFX900-NEXT: s_addk_i32 s59, 0x4040
; GFX900-NEXT: ;;#ASMSTART
-; GFX900-NEXT: ; use s59
+; GFX900-NEXT: ; use s55
; GFX900-NEXT: ;;#ASMEND
+; GFX900-NEXT: v_readlane_b32 s55, v1, 0
+; GFX900-NEXT: s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT: s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT: buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT: s_mov_b64 exec, s[4:5]
+; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]
;
; GFX942-LABEL: scalar_mov_materializes_frame_index_dead_scc:
; GFX942: ; %bb.0:
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT: s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT: s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT: scratch_store_dword off, v1, s2 ; 4-byte Folded Spill
+; GFX942-NEXT: s_mov_b64 exec, s[0:1]
; GFX942-NEXT: s_add_i32 s0, s32, 64
; GFX942-NEXT: v_mov_b32_e32 v0, s0
; GFX942-NEXT: s_add_i32 s0, s32, 0x4040
+; GFX942-NEXT: v_writelane_b32 v1, s55, 0
+; GFX942-NEXT: s_mov_b32 s55, s0
; GFX942-NEXT: ;;#ASMSTART
; GFX942-NEXT: ; use alloca0 v0
; GFX942-NEXT: ;;#ASMEND
-; GFX942-NEXT: s_mov_b32 s59, s0
; GFX942-NEXT: ;;#ASMSTART
-; GFX942-NEXT: ; use s59
+; GFX942-NEXT: ; use s55
; GFX942-NEXT: ;;#ASMEND
+; GFX942-NEXT: v_readlane_b32 s55, v1, 0
+; GFX942-NEXT: s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT: s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT: scratch_load_dword v1, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT: s_mov_b64 exec, s[0:1]
+; GFX942-NEXT: s_waitcnt vmcnt(0)
; GFX942-NEXT: s_setpc_b64 s[30:31]
%alloca0 = alloca [4096 x i32], align 64, addrspace(5)
%alloca1 = alloca i32, align 4, addrspace(5)
call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
- call void asm sideeffect "; use $0", "{s59}"(ptr addrspace(5) %alloca1)
+ call void asm sideeffect "; use $0", "{s55}"(ptr addrspace(5) %alloca1)
ret void
}
@@ -272,8 +428,14 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
; GFX10_1-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10_1-NEXT: s_mov_b32 s5, s33
; GFX10_1-NEXT: s_mov_b32 s33, s32
-; GFX10_1-NEXT: s_add_i32 s32, s32, 0x81000
+; GFX10_1-NEXT: s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT: s_add_i32 s6, s33, 0x80880
+; GFX10_1-NEXT: buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_1-NEXT: s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT: s_mov_b32 exec_lo, s4
; GFX10_1-NEXT: v_lshrrev_b32_e64 v0, 5, s33
+; GFX10_1-NEXT: v_writelane_b32 v1, s55, 0
+; GFX10_1-NEXT: s_add_i32 s32, s32, 0x81000
; GFX10_1-NEXT: s_and_b32 s4, 0, exec_lo
; GFX10_1-NEXT: s_mov_b32 s32, s33
; GFX10_1-NEXT: v_add_nc_u32_e32 v0, 64, v0
@@ -281,12 +443,19 @@ define void @scalar_mov_materializes_frame_index_u...
[truncated]
This PR fixes test failures introduced in #127353 when expensive checks are enabled.
dd09b8d to af30b17
; GFX942-NEXT: s_setpc_b64 s[30:31]
%alloca0 = alloca [4096 x i32], align 64, addrspace(5)
%alloca1 = alloca i32, align 4, addrspace(5)
call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
- call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca1, i32 0)
+ call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca1, i32 0)
`s59` is no longer in live-ins because it is caller saved. Switch to `s55` here, but I'm not sure why so much spilling is generated.
Can you mention this in the commit message? I wasn't sure how this only touched tests.
Done.
They only fail with expensive checks enabled.
#130644)"
As suggested on 5ec884e#commitcomment-153707488 this seems to fix the following tests when building with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON:
LLVM :: CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
LLVM :: CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
LLVM :: CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll
> This PR fixes test failures introduced in #127353 when expensive checks are enabled.
>
> For `llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll` and `llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll`, `s59` is no longer in live-ins because it is caller saved. Switch to `s55` in this PR.
This PR fixes test failures introduced in #127353 when expensive checks are enabled.
For `llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll` and `llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll`, `s59` is no longer in live-ins because it is caller saved. Switch to `s55` in this PR.
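
Since the failures only show up with expensive checks enabled, a local reproduction might look like the sketch below. This is not the author's exact setup. Only `-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON` and the three test paths come from the discussion; the build directory name `build-expensive`, the `RUN` switch, the Ninja generator, and the other CMake options are assumptions. By default the script only prints the commands. Set `RUN=1` and run it from the root of an llvm-project checkout to execute them.

```shell
#!/bin/sh
# Sketch: reproduce the expensive-checks failures for the AMDGPU tests above.
# Assumptions (not from the PR): build directory name, Ninja generator,
# Release+assertions build type, and the RUN switch.
set -eu

BUILD_DIR="${BUILD_DIR:-build-expensive}"
RUN="${RUN:-0}"   # 0 = print commands only, 1 = execute them

# Test files named in the commit message.
TESTS="llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll"

# Print a command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "$RUN" = "1" ]; then
    "$@"
  fi
}

# Configure an AMDGPU-only build with expensive checks turned on.
run cmake -S llvm -B "$BUILD_DIR" -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_TARGETS_TO_BUILD=AMDGPU \
  -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON

# Build the tools the lit tests depend on (llc, FileCheck, and so on).
run ninja -C "$BUILD_DIR" llvm-test-depends

# $TESTS is intentionally unquoted so each path becomes its own argument.
run "$BUILD_DIR/bin/llvm-lit" -v $TESTS
```

Running only the three named tests keeps the turnaround short; an expensive-checks build is considerably slower than a regular one, so running the whole AMDGPU suite this way can take a long time.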