geohot/gpunoob
// dims = G*L (global index = g*num_threads + l)
// names listed in order: OpenCL, CUDA, HIP, Metal
// G cores (get_group_id, blockIdx, __ockl_get_group_id, threadgroup_position_in_grid)
// L threads (get_local_id, threadIdx, __ockl_get_local_id, thread_position_in_threadgroup)

// e.g. dims = 128:
// 128 cores, 1 thread each
// 64 cores, 2 threads each

// GPUs have warps. Warps are groups of threads, and all modern GPUs have them as 32 threads.
// GPUs are multicore processors with 32-thread warps

// On NVIDIA, cores are streaming multiprocessors (SMs).
// The full AD102 die has 144 SMs (the 4090 enables 128 of them) with 128 threads each
// On AMD, cores are compute units (CUs).
// 7900XTX has 96 CUs with 64 threads each
// On Apple, ???
// M3 Max has a 40-core GPU, 640 "Execution Units", 5120 "ALUs"
// Measured: 640 EUs with 32 threads each (20480 threads total)

// SIMD - Single Instruction Multiple Data
// vector registers
// float<32> (1024 bits)
// c = a + b (on vector registers, this is a single add instruction on 32 pieces of data)

// SIMT - Single Instruction Multiple Thread
// similar to SIMD, but loads/stores are different
// you only declare "float", but behind the scenes it's float<32>
// loads/stores are implicit scatter/gather, whereas on SIMD they are explicit
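The notes above can be made concrete with a small CPU-side emulation in plain Python. This is only a sketch: nothing runs on a GPU, and the names `global_ids`, `simd_add`, `simt_kernel`, `run_warp` and `WARP` are hypothetical helpers invented for illustration, not part of OpenCL, CUDA, HIP or Metal. It shows how `g*num_threads + l` enumerates the launch grid, and how a SIMT kernel is written as scalar code whose load becomes a gather once 32 lanes run it.

```python
# CPU-side emulation only; all names here are illustrative assumptions.

WARP = 32  # warp width used throughout the notes


def global_ids(num_groups, num_threads):
    """List (g, l, global_id) for num_groups groups of num_threads threads."""
    return [(g, l, g * num_threads + l)
            for g in range(num_groups)
            for l in range(num_threads)]


def simd_add(a, b):
    """SIMD view: the programmer handles whole vectors explicitly.

    On real hardware, c = a + b on float<32> registers is one instruction;
    here a list comprehension stands in for that single vector add.
    """
    assert len(a) == len(b) == WARP
    return [x + y for x, y in zip(a, b)]


def simt_kernel(lane, idx, src, dst):
    """SIMT view: scalar-looking code for ONE thread (lane).

    Each lane reads its own address, so across the warp the load acts as
    an implicit gather and the store as an implicit scatter.
    """
    dst[lane] = src[idx[lane]] * 2.0


def run_warp(kernel, *args):
    """Emulate a warp by running the scalar kernel once per lane."""
    for lane in range(WARP):
        kernel(lane, *args)


# Both splits from the notes cover the same 128 global ids.
ids_a = sorted(gid for _, _, gid in global_ids(128, 1))
ids_b = sorted(gid for _, _, gid in global_ids(64, 2))
print(ids_a == ids_b == list(range(128)))  # True

# SIMD: one explicit vector add over 32 floats.
c = simd_add([1.0] * WARP, [2.0] * WARP)

# SIMT: every lane gathers from a different (strided) address.
src = [float(i) for i in range(64)]
idx = [2 * lane for lane in range(WARP)]
dst = [0.0] * WARP
run_warp(simt_kernel, idx, src, dst)
print(dst[:4])  # [0.0, 4.0, 8.0, 12.0]
```

The loop in `run_warp` is sequential only because this is an emulation; on a GPU the 32 lanes of a warp execute the same instruction together.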
Noob Lessons from Stream about how GPUs work