Jae's Tech Blog

Lectures

All posts in Lectures

February 3, 2026

GPU Systems 03 - Memory Hierarchy and Bandwidth

How to think about the GPU memory hierarchy and bandwidth bottlenecks

February 5, 2026

GPU Systems 04 - Writing CUDA Kernels and Choosing Launch Configuration

How to think about indexing and launch configuration when writing CUDA kernels

February 7, 2026

GPU Systems 05 - Coalescing, Shared Memory, and Reduction Patterns

The optimization patterns that keep showing up in CUDA kernels

February 9, 2026

GPU Systems 06 - Triton and the Practical Shape of Kernel Optimization

How Triton fits into real kernel optimization work, especially for LLM-style workloads

February 11, 2026

GPU Systems 07 - Occupancy and Latency Hiding

Understanding occupancy as a latency-hiding concept instead of just a percentage

February 13, 2026

GPU Systems 08 - Profiling and the Roofline View

A practical way to use profiling and roofline thinking to understand kernel bottlenecks

February 15, 2026

GPU Systems 09 - Why Naive Matrix Multiplication Is Slow

Using naive matrix multiplication to see memory reuse and traffic problems clearly

February 17, 2026

GPU Systems 10 - Tiled Matrix Multiplication and Shared Memory

Why tiled matrix multiplication and shared memory create such a big performance difference

February 19, 2026

GPU Systems 11 - Shared Memory Bank Conflicts

Why shared memory is not automatically fast and how bank conflicts appear

© 2025 Jae · Notes on systems, software, and building things carefully.