Jae's Tech Blog

January 28, 2026

GPU Systems 00 - What You Should Know Before Starting This Series

The background knowledge that makes the GPU Systems series much easier to follow

Lectures

January 30, 2026

A practical study order from GPU architecture to CUDA, Triton, and kernel optimization

Lectures

February 1, 2026

What threads, warps, blocks, and grids mean in actual GPU execution

Lectures

February 3, 2026

How to think about the GPU memory hierarchy and bandwidth bottlenecks

Lectures

February 5, 2026

How to think about indexing and launch configuration when writing CUDA kernels

Lectures

February 7, 2026

The optimization patterns that keep showing up in CUDA kernels

Lectures

February 11, 2026

Understanding occupancy as a latency-hiding concept instead of just a percentage

Lectures

February 15, 2026

Using naive matrix multiplication to see memory reuse and traffic problems clearly

Lectures

February 17, 2026

Why tiled matrix multiplication and shared memory create such a big performance difference

Lectures