GPU Systems 00 - What You Should Know Before Starting This Series
The background knowledge that makes the GPU Systems series much easier to study properly
- A practical study order from GPU architecture to CUDA, Triton, and kernel optimization
- What threads, warps, blocks, and grids mean in actual GPU execution
- How to think about the GPU memory hierarchy and bandwidth bottlenecks
- How to think about indexing and launch configuration when writing CUDA kernels
- How Triton fits into real kernel optimization work, especially for LLM-style workloads
- Understanding occupancy as a latency-hiding concept instead of just a percentage
- A practical way to use profiling and roofline thinking to understand kernel bottlenecks
- Using naive matrix multiplication to see memory reuse and traffic problems clearly
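As a taste of the indexing and launch-configuration topic above, here is a minimal Python sketch (not GPU code) of the arithmetic a CUDA launch performs: the grid size is a ceiling division of the problem size by the block size, and each thread derives a global index from its block and thread coordinates. The function names here are illustrative, not a real CUDA API.

```python
def grid_size(n, block_dim):
    # Ceiling division: enough blocks so block_dim * grid covers all n elements.
    return (n + block_dim - 1) // block_dim

def global_index(block_idx, block_dim, thread_idx):
    # Mirrors blockIdx.x * blockDim.x + threadIdx.x inside a CUDA kernel.
    return block_idx * block_dim + thread_idx

n, block_dim = 1000, 256
blocks = grid_size(n, block_dim)          # 4 blocks launch 1024 threads total
last = global_index(blocks - 1, block_dim, block_dim - 1)  # highest index: 1023
# Threads whose global index is >= n must be guarded with `if (i < n) ...`
# in the kernel body, since the grid overshoots the problem size.
print(blocks, last)
```

The overshoot is why almost every CUDA kernel starts with a bounds check: the launch configuration rounds up, and the guard discards the excess threads.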
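The last bullet, on naive matrix multiplication, can be previewed with a tiny counting sketch. This is plain Python, not a kernel: it only tallies how often each input element of an n-by-n multiply is touched by the naive triple loop, which is the reuse problem that tiling and shared memory exist to fix.

```python
from collections import Counter

def naive_matmul_access_counts(n):
    """Count how many times each element of A and B is read by the
    naive triple-loop C[i][j] += A[i][k] * B[k][j]."""
    a_reads, b_reads = Counter(), Counter()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a_reads[(i, k)] += 1
                b_reads[(k, j)] += 1
    return a_reads, b_reads

n = 4
a_reads, b_reads = naive_matmul_access_counts(n)
# Without any caching or reuse, every element of A and B is read n times,
# so total traffic grows as 2*n^3 reads for only 2*n^2 distinct values.
print(max(a_reads.values()), max(b_reads.values()))
```

Seeing that each value is fetched n times from memory, while only being needed once per distinct value if it could be kept on chip, is the core intuition behind the memory-traffic analysis the series builds up to.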