GPU Systems 01 - Roadmap to GPU Kernel Engineering
A practical study order from GPU architecture to CUDA, Triton, and kernel optimization
How Triton fits into real kernel optimization work, especially for LLM-style workloads
Closing the GPU Systems series by connecting profiling, Triton experimentation, and FlashAttention-style thinking
How to serve trained models in production and deploy them safely
Triton is not just a convenient kernel language; it is part of the modern PyTorch kernel and compilation story