-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[AMDGPU] Remove the AnnotateKernelFeatures pass #130198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Jun Wang (jwanggit86) ChangesPreviously the AnnotateKernelFeatures pass infers two attributes: amdgpu-calls and amdgpu-stack-objects, which are used to help determine if flat scratch init is allowed. PR #118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass. Patch is 2.25 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130198.diff 116 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 57297288eecb4..c30c1cd3c8fb0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -95,11 +95,8 @@ void initializeAMDGPUDAGToDAGISelLegacyPass(PassRegistry &);
void initializeAMDGPUAlwaysInlinePass(PassRegistry&);
-Pass *createAMDGPUAnnotateKernelFeaturesPass();
Pass *createAMDGPUAttributorLegacyPass();
void initializeAMDGPUAttributorLegacyPass(PassRegistry &);
-void initializeAMDGPUAnnotateKernelFeaturesPass(PassRegistry &);
-extern char &AMDGPUAnnotateKernelFeaturesID;
// DPP/Iterative option enables the atomic optimizer with given strategy
// whereas None disables the atomic optimizer.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
index a9bd41382c255..9c9fa5c6e2f0f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
@@ -52,11 +52,6 @@ class AMDGPUAnnotateKernelFeatures : public CallGraphSCCPass {
char AMDGPUAnnotateKernelFeatures::ID = 0;
-char &llvm::AMDGPUAnnotateKernelFeaturesID = AMDGPUAnnotateKernelFeatures::ID;
-
-INITIALIZE_PASS(AMDGPUAnnotateKernelFeatures, DEBUG_TYPE,
- "Add AMDGPU function attributes", false, false)
-
bool AMDGPUAnnotateKernelFeatures::addFeatureAttributes(Function &F) {
bool HaveStackObjects = false;
bool Changed = false;
@@ -131,7 +126,3 @@ bool AMDGPUAnnotateKernelFeatures::doInitialization(CallGraph &CG) {
TM = &TPC->getTM<TargetMachine>();
return false;
}
-
-Pass *llvm::createAMDGPUAnnotateKernelFeaturesPass() {
- return new AMDGPUAnnotateKernelFeatures();
-}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index ce3dcd920bce3..bb139e2c185c4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -510,7 +510,6 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeAMDGPUAlwaysInlinePass(*PR);
initializeAMDGPUSwLowerLDSLegacyPass(*PR);
initializeAMDGPUAttributorLegacyPass(*PR);
- initializeAMDGPUAnnotateKernelFeaturesPass(*PR);
initializeAMDGPUAnnotateUniformValuesLegacyPass(*PR);
initializeAMDGPUArgumentUsageInfoPass(*PR);
initializeAMDGPUAtomicOptimizerPass(*PR);
@@ -1294,12 +1293,6 @@ void AMDGPUPassConfig::addIRPasses() {
}
void AMDGPUPassConfig::addCodeGenPrepare() {
- if (TM->getTargetTriple().getArch() == Triple::amdgcn) {
- // FIXME: This pass adds 2 hacky attributes that can be replaced with an
- // analysis, and should be removed.
- addPass(createAMDGPUAnnotateKernelFeaturesPass());
- }
-
if (TM->getTargetTriple().getArch() == Triple::amdgcn &&
EnableLowerKernelArguments)
addPass(createAMDGPULowerKernelArgumentsPass());
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 55af5826e90d0..c812837c29a46 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -601,12 +601,6 @@ GCNUserSGPRUsageInfo::GCNUserSGPRUsageInfo(const Function &F,
const CallingConv::ID CC = F.getCallingConv();
const bool IsKernel =
CC == CallingConv::AMDGPU_KERNEL || CC == CallingConv::SPIR_KERNEL;
- // FIXME: Should have analysis or something rather than attribute to detect
- // calls.
- const bool HasCalls = F.hasFnAttribute("amdgpu-calls");
- // FIXME: This attribute is a hack, we just need an analysis on the function
- // to look for allocas.
- const bool HasStackObjects = F.hasFnAttribute("amdgpu-stack-objects");
if (IsKernel && (!F.arg_empty() || ST.getImplicitArgNumBytes(F) != 0))
KernargSegmentPtr = true;
@@ -629,12 +623,14 @@ GCNUserSGPRUsageInfo::GCNUserSGPRUsageInfo(const Function &F,
DispatchID = true;
}
- // TODO: This could be refined a lot. The attribute is a poor way of
- // detecting calls or stack objects that may require it before argument
- // lowering.
+ const bool IsNoFlatScratchInitSet = F.hasFnAttribute("amdgpu-no-flat-scratch-init");
+
if (ST.hasFlatAddressSpace() && AMDGPU::isEntryFunctionCC(CC) &&
(IsAmdHsaOrMesa || ST.enableFlatScratch()) &&
- (HasCalls || HasStackObjects || ST.enableFlatScratch()) &&
+ // The line below: If enableFlatScratch() is true, whether
+ // no-flat-scratch-init is set is not important. If enableFlatScratch()
+ // is false, FlatScratchInit cannot be true for graphics CC.
+ (ST.enableFlatScratch() || (!IsNoFlatScratchInitSet && !AMDGPU::isGraphics(CC))) &&
!ST.flatScratchIsArchitected()) {
FlatScratchInit = true;
}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
index b96fc71be057e..6c2272c389a61 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
@@ -20,11 +20,14 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
; CI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
; CI-NEXT: v_mov_b32_e32 v0, 42
; CI-NEXT: s_mov_b32 m0, -1
+; CI-NEXT: s_add_i32 s12, s12, s17
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s2
; CI-NEXT: ds_dec_rtn_u32 v2, v1, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s0
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: v_mov_b32_e32 v1, s1
; CI-NEXT: flat_store_dword v[0:1], v2
; CI-NEXT: s_endpgm
@@ -35,11 +38,14 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
; VI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
; VI-NEXT: v_mov_b32_e32 v0, 42
; VI-NEXT: s_mov_b32 m0, -1
+; VI-NEXT: s_add_i32 s12, s12, s17
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s2
; VI-NEXT: ds_dec_rtn_u32 v2, v1, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s0
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: v_mov_b32_e32 v1, s1
; VI-NEXT: flat_store_dword v[0:1], v2
; VI-NEXT: s_endpgm
@@ -97,11 +103,14 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
; CI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
; CI-NEXT: v_mov_b32_e32 v0, 42
; CI-NEXT: s_mov_b32 m0, -1
+; CI-NEXT: s_add_i32 s12, s12, s17
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s2
; CI-NEXT: ds_dec_rtn_u32 v2, v1, v0 offset:16
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s0
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: v_mov_b32_e32 v1, s1
; CI-NEXT: flat_store_dword v[0:1], v2
; CI-NEXT: s_endpgm
@@ -112,11 +121,14 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
; VI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
; VI-NEXT: v_mov_b32_e32 v0, 42
; VI-NEXT: s_mov_b32 m0, -1
+; VI-NEXT: s_add_i32 s12, s12, s17
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s2
; VI-NEXT: ds_dec_rtn_u32 v2, v1, v0 offset:16
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s0
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: v_mov_b32_e32 v1, s1
; VI-NEXT: flat_store_dword v[0:1], v2
; VI-NEXT: s_endpgm
@@ -287,6 +299,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
; CI-LABEL: global_atomic_dec_ret_i32:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s2
@@ -302,6 +317,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
; VI-LABEL: global_atomic_dec_ret_i32:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s2
@@ -359,6 +377,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
; CI-LABEL: global_atomic_dec_ret_i32_offset:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s2, s2, 16
@@ -376,6 +397,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
; VI-LABEL: global_atomic_dec_ret_i32_offset:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s2, s2, 16
@@ -436,6 +460,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace
; CI-LABEL: global_atomic_dec_ret_i32_offset_system:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s2, s2, 16
@@ -453,6 +480,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace
; VI-LABEL: global_atomic_dec_ret_i32_offset_system:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s2, s2, 16
@@ -513,6 +543,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32(ptr addrspace(1) %ptr) #1
; CI-LABEL: global_atomic_dec_noret_i32:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s0
@@ -525,6 +558,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32(ptr addrspace(1) %ptr) #1
; VI-LABEL: global_atomic_dec_noret_i32:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s0
@@ -575,6 +611,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset(ptr addrspace(1) %
; CI-LABEL: global_atomic_dec_noret_i32_offset:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s0, s0, 16
@@ -589,6 +628,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset(ptr addrspace(1) %
; VI-LABEL: global_atomic_dec_noret_i32_offset:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s0, s0, 16
@@ -642,6 +684,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset_system(ptr addrspa
; CI-LABEL: global_atomic_dec_noret_i32_offset_system:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s0, s0, 16
@@ -656,6 +701,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset_system(ptr addrspa
; VI-LABEL: global_atomic_dec_noret_i32_offset_system:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s0, s0, 16
@@ -710,7 +758,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_addr64(ptr addrspace
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
-; CI-NEXT: v_mov_b32_e32 v3, 42
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s2
; CI-NEXT: v_mov_b32_e32 v1, s3
@@ -718,6 +768,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_addr64(ptr addrspace
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: v_add_i32_e32 v0, vcc, 20, v0
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; CI-NEXT: v_mov_b32_e32 v3, 42
; CI-NEXT: flat_atomic_dec v3, v[0:1], v3 glc
; CI-NEXT: s_waitcnt vmcnt(0)
; CI-NEXT: buffer_wbinvl1_vol
@@ -732,7 +783,9 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_addr64(ptr addrspace
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
-; VI-NEXT: v_mov_b32_e32 v3, 42
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s2
; VI-NEXT: v_mov_b32_e32 v1, s3
@@ -740,6 +793,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_addr64(ptr addrspace
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: v_add_u32_e32 v0, vcc, 20, v0
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; VI-NEXT: v_mov_b32_e32 v3, 42
; VI-NEXT: flat_atomic_dec v3, v[0:1], v3 glc
; VI-NEXT: s_waitcnt vmcnt(0)
; VI-NEXT: buffer_wbinvl1_vol
@@ -802,6 +856,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset_addr64(ptr addrspa
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s0
; CI-NEXT: v_mov_b32_e32 v1, s1
@@ -819,6 +876,9 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset_addr64(ptr addrspa
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx2 s[0:1], s[8:9], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s0
; VI-NEXT: v_mov_b32_e32 v1, s1
@@ -878,6 +938,9 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32(ptr %out, ptr %ptr) #1 {
; CI-LABEL: flat_atomic_dec_ret_i32:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s2
@@ -893,6 +956,9 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32(ptr %out, ptr %ptr) #1 {
; VI-LABEL: flat_atomic_dec_ret_i32:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s2
@@ -908,6 +974,8 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32(ptr %out, ptr %ptr) #1 {
; GFX9-LABEL: flat_atomic_dec_ret_i32:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; GFX9-NEXT: s_add_u32 flat_scratch_lo, s12, s17
+; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
; GFX9-NEXT: v_mov_b32_e32 v2, 42
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s2
@@ -922,6 +990,10 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32(ptr %out, ptr %ptr) #1 {
;
; GFX10-LABEL: flat_atomic_dec_ret_i32:
; GFX10: ; %bb.0:
+; GFX10-NEXT: s_add_u32 s12, s12, s17
+; GFX10-NEXT: s_addc_u32 s13, s13, 0
+; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s12
+; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s13
; GFX10-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
; GFX10-NEXT: v_mov_b32_e32 v2, 42
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -958,6 +1030,9 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset(ptr %out, ptr %ptr) #1
; CI-LABEL: flat_atomic_dec_ret_i32_offset:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s2, s2, 16
@@ -975,6 +1050,9 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset(ptr %out, ptr %ptr) #1
; VI-LABEL: flat_atomic_dec_ret_i32_offset:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT: s_mov_b32 flat_scratch_lo, s13
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s2, s2, 16
@@ -992,6 +1070,8 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset(ptr %out, ptr %ptr) #1
; GFX9-LABEL: flat_atomic_dec_ret_i32_offset:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; GFX9-NEXT: s_add_u32 flat_scratch_lo, s12, s17
+; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
; GFX9-NEXT: v_mov_b32_e32 v2, 42
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s2
@@ -1006,6 +1086,10 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset(ptr %out, ptr %ptr) #1
;
; GFX10-LABEL: flat_atomic_dec_ret_i32_offset:
; GFX10: ; %bb.0:
+; GFX10-NEXT: s_add_u32 s12, s12, s17
+; GFX10-NEXT: s_addc_u32 s13, s13, 0
+; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s12
+; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s13
; GFX10-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
; GFX10-NEXT: v_mov_b32_e32 v2, 42
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -1045,6 +1129,9 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset_system(ptr %out, ptr %
; CI-LABEL: flat_atomic_dec_ret_i32_offset_system:
; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; CI-NEXT: s_add_i32 s12, s12, s17
+; CI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT: s_mov_b32 flat_scratch_lo, s13
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s2, s2, 16
@@ -1062,6 +1149,9 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset_system(ptr %out, ptr %
; VI-LABEL: flat_atomic_dec_ret_i32_offset_system:
; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[8:9], 0x0
+; VI-NEXT: s_add_i32 s12, s12, s17
+; VI-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT: s_mov_b32 flat_scratch_lo, ...
[truncated]
|
; CI-NEXT: s_waitcnt lgkmcnt(0) | ||
; CI-NEXT: v_mov_b32_e32 v1, s2 | ||
; CI-NEXT: ds_dec_rtn_u32 v2, v1, v0 | ||
; CI-NEXT: s_waitcnt lgkmcnt(0) | ||
; CI-NEXT: v_mov_b32_e32 v0, s0 | ||
; CI-NEXT: s_mov_b32 flat_scratch_lo, s13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the attribute to those test cases to avoid updating these check lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the attribute to those test cases to avoid updating these check lines
Do you mean something like this:
attributes #0 = { nounwind speculatable willreturn memory(none) }
-attributes #1 = { nounwind }
+attributes #1 = { nounwind "amdgpu-no-flat-scratch-init"}
attributes #2 = { nounwind memory(none) }
!0 = !{i32 5, i32 6}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also add a test for what happens on undefined behavior, and flat_scratch does end up really getting used. We shouldn't have any verifier errors or anything
@@ -35,15 +35,15 @@ define amdgpu_kernel void @stack_object_addrspacecast_in_kernel_no_calls() { | |||
; RO-FLAT: scratch_store_dword | |||
; RW-FLAT: .amdhsa_user_sgpr_private_segment_buffer 1 | |||
; RO-FLAT-NOT: .amdhsa_user_sgpr_private_segment_buffer | |||
; RW-FLAT: .amdhsa_user_sgpr_flat_scratch_init 1 | |||
; RW-FLAT: .amdhsa_user_sgpr_flat_scratch_init 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks doubly broken. For the intent of the test, this should be 1. It's also missing the attribute, so it definitely shouldn't be 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previouly the AnnotateKernelFeatures pass checks for calls and allocas. With the change to AMDGPUAttributor, alloca is no longer being checked. For this particular case, opt would set the no-flat-scratch-init
attribute. So after llc, flat_scratch_init in line 38 would be 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should fix the TODO above
Please elaborate on "undefined behavior". |
Could you also delete the entry for this pass from |
The contract of amdgpu-no-flat-scratch-init is violated and flat_scr requires initialization. i.e. addrspacecast of a private address to flat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the test changes should just regenerate to use the new output, and not add the attribute. The attribute should only be added in cases where it's directly relevant to the test (like the kernel metadata tests)
@@ -183,15 +177,12 @@ define amdgpu_kernel void @llvm_amdgcn_is_shared(ptr %ptr) { | |||
ret void | |||
} | |||
|
|||
define amdgpu_kernel void @llvm_amdgcn_is_private(ptr %ptr) { | |||
define amdgpu_kernel void @llvm_amdgcn_is_private(ptr %ptr) #0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is interesting because we could use the attribute to just fold this to a constant false. I think this should not get annotated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want "#0" here removed?
@@ -4022,7 +4019,7 @@ entry: | |||
ret float %ext | |||
} | |||
|
|||
define amdgpu_kernel void @dyn_extract_v4f32_s_s_s(ptr addrspace(1) %out, i32 %sel) { | |||
define amdgpu_kernel void @dyn_extract_v4f32_s_s_s(ptr addrspace(1) %out, i32 %sel) #0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute isn't relevant enough to this test, I would just regenerate this with the new reference output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,550 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test should be renamed this isn't testing attributor. How about amdgpu-no-flat-scratch-init-undefined-behavior.ll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,550 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a gfx8 run line. It's more relevant than testing gfx10 since it has a different addrspacecast lowering. I would test gfx803, gfx900, and gfx942 for architected flat scratch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -3319,7 +3319,7 @@ define amdgpu_kernel void @atomic_dec_shl_base_lds_0_i64(ptr addrspace(1) %out, | |||
} | |||
|
|||
attributes #0 = { nounwind speculatable willreturn memory(none) } | |||
attributes #1 = { nounwind } | |||
attributes #1 = { nounwind "amdgpu-no-flat-scratch-init"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant to this test, just regenerate the results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -257,4 +257,4 @@ define amdgpu_kernel void @v_insert_v64i32_varidx(ptr addrspace(1) %out.ptr, ptr | |||
ret void | |||
} | |||
|
|||
attributes #0 = { "amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-per-eu"="1,10" } | |||
attributes #0 = { "amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-per-eu"="1,10" "amdgpu-no-flat-scratch-init" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a relevant test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
; These functions all should have the attribute amdgpu-no-flat-scratch-init set if the AMDGPUAttributor | ||
; pass is run. Therefore the purpose is to test llc when the attribute is incorrectly missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is not running the attributor, only codegen. This is also the opposite of the test I wanted. I wanted to test the case where the functions do have the attribute, but violate the contract. That is, every function should already have the attribute set. And then use the casts and flat accesses which would require flat_scr to be initialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is not running the attributor, only codegen. This is also the opposite of the test I wanted. I wanted to test the case where the functions do have the attribute, but violate the contract. That is, every function should already have the attribute set. And then use the casts and flat accesses which would require flat_scr to be initialized
Now I have two files, one to run opt (attributor-flatscratchinit-invalid.ll) and one to run llc (attributor-flatscratchinit-invalid2.ll). One question about opt. If a function is not supposed to have the attribute but somehow has it, should attributor remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question about opt. If a function is not supposed to have the attribute but somehow has it, should attributor remove it?
@arsenm What's your thought on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not have it at the first place, so the attributor should not remove it and we need to fix where it is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to run the attributor twice, which allowed some attributes to be added after the first run. Some of those are fine as they don't affect correctness. However, if any of the attributes added in that phase impact correctness, then that's a problem.
// The line below: If enableFlatScratch() is true, whether | ||
// no-flat-scratch-init is set is not important. If enableFlatScratch() | ||
// is false, FlatScratchInit cannot be true for graphics CC. | ||
(ST.enableFlatScratch() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The line below: If enableFlatScratch() is true, whether | |
// no-flat-scratch-init is set is not important. If enableFlatScratch() | |
// is false, FlatScratchInit cannot be true for graphics CC. | |
(ST.enableFlatScratch() || | |
// FlatScratchInit cannot be true for graphics CC. | |
(ST.enableFlatScratch() || |
This sounds broken. Flat addressing should work fine in shaders. The only difference will be in the sequence used to initialize flat_scr needs to be different. But that's a problem for later.
Also can sink the hasFnAttribute down here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as suggested. But I'm not sure I understand the comment about hasFnAttribute. It's right above the if stmt where it's used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean inline in the expression (and swap the order, CC check first). Also "cannot be true", rephrase are something about the graphics shaders not having the input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean inline in the expression (and swap the order, CC check first). Also "cannot be true", rephrase are something about the graphics shaders not having the input
What does "not having the input" mean?
Done. |
I'm not sure. Adding the attribute to existing test cases can avoid unnecessary check line changes, which makes the line history more consistent. |
8ce58d6
to
c34d81a
Compare
I'm fine either way, but now most of the manually added attributes have been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but let's wait for @arsenm for the comment about graphics CC.
@@ -0,0 +1,63 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point of this test, we don't need to check what the attributor does in these cases. The content is also duplicated with the 2 version. Should drop this version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one runs opt with the Attributor pass. The "2" version runs only llc.
baf06e5
to
046e8a3
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
// FlatScratchInit cannot be true for graphics CC if enableFlatScratch() | ||
// is false. | ||
(ST.enableFlatScratch() || | ||
(!AMDGPU::isGraphics(CC) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking on CC is moved to the front.
if (ST.hasFlatAddressSpace() && AMDGPU::isEntryFunctionCC(CC) && | ||
(IsAmdHsaOrMesa || ST.enableFlatScratch()) && | ||
(HasCalls || HasStackObjects || ST.enableFlatScratch()) && | ||
// FlatScratchInit cannot be true for graphics CC if enableFlatScratch() | ||
// is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is updated.
const bool IsNoFlatScratchInitSet = | ||
F.hasFnAttribute("amdgpu-no-flat-scratch-init"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const bool IsNoFlatScratchInitSet = | |
F.hasFnAttribute("amdgpu-no-flat-scratch-init"); |
This is now unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
Previously the AnnotateKernelFeatures pass infers two attributes: amdgpu-calls and amdgpu-stack-objects, which are used to help determine if flat scratch init is allowed. PR llvm#118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass.
As amdgpu-no-flat-scratch-init is set by opt, in most llc tests the attribute is not set. This commit corrects this.
Undo the changes in previous commit. Now the amdgpu-no-flatscratch-init attribute is only manually added if the tests are relevant to the attribute.
Renamed attributor-flatscratchinit-invalid.ll to attributor-flatscratchinit-undefined-behavior.ll. Added RUN lines.
9ee58b7
to
7246b19
Compare
Previously the AnnotateKernelFeatures pass infers two attributes: amdgpu-calls and amdgpu-stack-objects, which are used to help determine if flat scratch init is allowed. PR llvm#118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass.
Previously the AnnotateKernelFeatures pass infers two attributes: amdgpu-calls and amdgpu-stack-objects, which are used to help determine if flat scratch init is allowed. PR #118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass.