GPU Systems 00 - What You Should Know Before Starting This Series
The background knowledge that makes the GPU Systems series much easier to study properly
- A practical study order from GPU architecture to CUDA, Triton, and kernel optimization
- What threads, warps, blocks, and grids mean in actual GPU execution
- How to think about the GPU memory hierarchy and bandwidth bottlenecks
- How to think about indexing and launch configuration when writing CUDA kernels
- How Triton fits into real kernel optimization work, especially for LLM-style workloads
- Understanding occupancy as a latency-hiding concept instead of just a percentage
- A practical way to use profiling and roofline thinking to understand kernel bottlenecks
- Using naive matrix multiplication to see memory reuse and traffic problems clearly
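As a taste of the indexing and launch-configuration topic above, here is a minimal Python sketch (not GPU code) of the arithmetic a CUDA launch performs: the grid size is a ceiling division of the problem size by the block size, and each thread derives a global index from its block and thread coordinates. The function names here are illustrative, not a real CUDA API.

```python
def grid_size(n, block_dim):
    # Ceiling division: enough blocks so block_dim * grid covers all n elements.
    return (n + block_dim - 1) // block_dim

def global_index(block_idx, block_dim, thread_idx):
    # Mirrors blockIdx.x * blockDim.x + threadIdx.x inside a CUDA kernel.
    return block_idx * block_dim + thread_idx

n, block_dim = 1000, 256
blocks = grid_size(n, block_dim)          # 4 blocks launch 1024 threads total
last = global_index(blocks - 1, block_dim, block_dim - 1)  # highest index: 1023
# Threads whose global index is >= n must be guarded with `if (i < n) ...`
# in the kernel body, since the grid overshoots the problem size.
print(blocks, last)
```

The overshoot is why almost every CUDA kernel starts with a bounds check: the launch configuration rounds up, and the guard discards the excess threads.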
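The last bullet, on naive matrix multiplication, can be previewed with a tiny counting sketch. This is plain Python, not a kernel: it only tallies how often each input element of an n-by-n multiply is touched by the naive triple loop, which is the reuse problem that tiling and shared memory exist to fix.

```python
from collections import Counter

def naive_matmul_access_counts(n):
    """Count how many times each element of A and B is read by the
    naive triple-loop C[i][j] += A[i][k] * B[k][j]."""
    a_reads, b_reads = Counter(), Counter()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a_reads[(i, k)] += 1
                b_reads[(k, j)] += 1
    return a_reads, b_reads

n = 4
a_reads, b_reads = naive_matmul_access_counts(n)
# Without any caching or reuse, every element of A and B is read n times,
# so total traffic grows as 2*n^3 reads for only 2*n^2 distinct values.
print(max(a_reads.values()), max(b_reads.values()))
```

Seeing that each value is fetched n times from memory, while only being needed once per distinct value if it could be kept on chip, is the core intuition behind the memory-traffic analysis the series builds up to.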