Jae's Tech Blog

Posts tagged "gpu"

February 17, 2026

GPU Systems 10 - Tiled Matrix Multiplication and Shared Memory

Why tiled matrix multiplication and shared memory create such a big performance difference

Lectures

February 19, 2026

GPU Systems 11 - Shared Memory Bank Conflicts

Why shared memory is not automatically fast and how bank conflicts appear

Lectures

February 21, 2026

GPU Systems 12 - Warp Shuffle and Warp-Level Primitives

Why warp-level primitives matter for reductions and lightweight cooperation between threads in a warp

Lectures

February 23, 2026

GPU Systems 13 - Reduction Kernels in Depth

Using reduction kernels to connect shared memory, warp primitives, and synchronization

Lectures

February 25, 2026

GPU Systems 14 - Why Softmax Is Such a Good Kernel Exercise

How softmax combines reductions, memory traffic, and numerical stability in one kernel

Lectures

February 27, 2026

GPU Systems 15 - LayerNorm and RMSNorm Kernel Structure

Why normalization kernels are often memory-bound and structurally important

Lectures

March 1, 2026

GPU Systems 16 - Vectorized Loads, Stores, and Alignment

How wider memory operations and alignment affect bandwidth utilization

Lectures

March 3, 2026

GPU Systems 17 - Register Pressure and Spilling

Why using more registers can improve local efficiency but still reduce total throughput

Lectures

March 5, 2026

GPU Systems 18 - Tensor Cores and Mixed Precision

How tensor cores change performance in compute-heavy kernels and why mixed precision matters

Lectures

© 2025 Jae · Notes on systems, software, and building things carefully.
