Jae's Tech Blog

February 7, 2026

GPU Systems 05 - Coalescing, Shared Memory, and Reduction Patterns

The optimization patterns that keep showing up in CUDA kernels

Lectures

February 21, 2026

Why warp-level primitives matter for reductions and lighter-weight cooperation

Lectures

February 23, 2026

Using reduction kernels to connect shared memory, warp primitives, and synchronization

Lectures

February 25, 2026

How softmax combines reductions, memory traffic, and numerical stability in one kernel

Lectures