GPU Systems
From GPU architecture and CUDA kernels to Triton and real kernel optimization work
Engineers who want to understand how GPUs actually execute work and eventually write and optimize their own kernels.
Thoughts on code, technology, and everything in between
Long-form posts on platform engineering, Linux, compilers, MLOps, and computer architecture, written to help you build stronger intuition instead of just memorizing terms.
A few strong entry points if you are new here.
Fresh writing, updates, and ongoing series entries.
Building reliable ML systems from data pipelines to production monitoring
ML engineers, data scientists, and backend engineers moving from model experiments to production operations.
From finite automata and formal languages to building a compiler from scratch
Readers who want both the theory behind language processing and the bridge to real compiler construction.
Closing the GPU Systems series by connecting profiling, Triton experimentation, and FlashAttention-style thinking
How asynchronous copy and double buffering help overlap memory movement with computation
How optimized intermediate representations become machine code, and a final look back from automata theory to compiler construction
Why clock speeds stopped increasing and the core concepts of modern multicore processor architecture
How tensor cores change performance in compute-heavy kernels and why mixed precision matters
How namespaces and cgroups create containers, and a wrap-up connecting all kernel concepts
How to bring all MLOps components together into a unified platform
Distributed training architecture is not about collecting fashionable techniques, but about choosing the smallest structure that matches the current bottleneck
When to start a platform team, how to staff it, and how to avoid building an ivory tower nobody uses