GitHub - geohot/gpunoob: Noob Lessons from Stream about how GPUs work

geohot / gpunoob Public

Notifications You must be signed in to change notification settings
Fork 4
Star 100

Noob Lessons from Stream about how GPUs work

100 stars 4 forks Branches Tags Activity

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README		README

Repository files navigation

// dims = G*L (g*num_threads + l)
//            OpenCL        CUDA       HIP
// G cores   (get_group_id, blockIdx,  __ockl_get_group_id, threadgroup_position_in_grid)
// L threads (get_local_id, threadIdx, __ockl_get_local_id, thread_position_in_threadgroup)

// 128 cores, 1 thread each
// 64 cores, 2 threads each

// GPUs have warps. Warps are groups of threads, and all modern GPUs have them as 32 threads.
// GPUs are multicore processors with 32 threads

// On NVIDIA, cores are streaming multiprocessors.
//   AD102 (4090) has 144 SMs with 128 threads each
// On AMD, cores are compute units.
//   7900XTX has 96 CUs with with 64 threads each
// On Apple, ???
//   M3 Max has a 40 core GPU, 640 "Execution Units", 5120 "ALUs"
//   Measured. 640 EUs with 32 threads each (20480 threads total)

// SIMD - Single Instruction Multiple Data
//    vector registers
//    float<32> (1024 bits)
//    c = a + b (on vector registers, this is a single add instruction on 32 pieces of data)

// SIMT - Single Instruction Multiple Thread
//    similar to SIMD, but load/stores are different
//    you only declare "float", but behind the scenes it's float<32>
//    load stores are implicit scatter gather, whereas on SIMD it's explicit