GPU Systems 01 - Roadmap to GPU Kernel Engineering
A practical study order from GPU architecture to CUDA, Triton, and kernel optimization
How Triton fits into real kernel optimization work, especially for LLM-style workloads
Closing the GPU Systems series by connecting profiling, Triton experimentation, and FlashAttention-style thinking