Skip to content

geohot/gpunoob

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

// dims = G*L (g*num_threads + l)
//            OpenCL        CUDA       HIP
// G cores   (get_group_id, blockIdx,  __ockl_get_group_id, threadgroup_position_in_grid)
// L threads (get_local_id, threadIdx, __ockl_get_local_id, thread_position_in_threadgroup)

// 128 cores, 1 thread each
// 64 cores, 2 threads each

// GPUs have warps. Warps are groups of threads, and all modern GPUs have them as 32 threads.
// GPUs are multicore processors with 32 threads

// On NVIDIA, cores are streaming multiprocessors.
//   AD102 (4090) has 144 SMs with 128 threads each
// On AMD, cores are compute units.
//   7900XTX has 96 CUs with with 64 threads each
// On Apple, ???
//   M3 Max has a 40 core GPU, 640 "Execution Units", 5120 "ALUs"
//   Measured. 640 EUs with 32 threads each (20480 threads total)

// SIMD - Single Instruction Multiple Data
//    vector registers
//    float<32> (1024 bits)
//    c = a + b (on vector registers, this is a single add instruction on 32 pieces of data)

// SIMT - Single Instruction Multiple Thread
//    similar to SIMD, but load/stores are different
//    you only declare "float", but behind the scenes it's float<32>
//    load stores are implicit scatter gather, whereas on SIMD it's explicit

About

Noob Lessons from Stream about how GPUs work

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages