This page is for readers who do not want the full roadmap first.
If the Start Here page is the structured entry point, this page is the curated one. These are the posts and series that best represent what the blog is trying to do well.
Best Series To Read Straight Through
GPU Systems
One of the strongest long-form tracks on the blog right now. It connects GPU architecture, CUDA thinking, Triton, and optimization work into one technical arc instead of treating them as disconnected topics.
Best if you want:
- a path into GPU kernel engineering
- better hardware-level performance intuition
- a stronger bridge into large-model training systems
PyTorch Internals
A good bridge series between model code and systems work. It is useful if you already use PyTorch but want to understand tensors, autograd, extensions, and how custom kernels fit into real training code.
Distributed LLM Training
This is the systems-heavy LLM training track. It is less about API usage and more about understanding how memory, communication, topology, and framework structure interact.
Compiler
A good theory-to-implementation series. It is strongest when read as a way to connect formal language ideas to ASTs, IR, optimization, and code generation.
Best Individual Starting Points
GPU Systems 00 - What You Need Before Studying GPU Systems
This is a strong entry point if you want to study the GPU track seriously instead of sampling random posts.
Distributed LLM Training 01 - Why LLM Training Becomes a Distributed Systems Problem
This is one of the better “why this topic matters” posts on the blog. It sets the framing correctly before the framework names start appearing.
PyTorch Internals 01 - Why You Need to Understand the Internals
A useful bridge post if you are already working with models and want to move closer to kernels, operators, and runtime behavior.
Automata and Compilers 01 - Finite Automata
Good if you want to start the compiler track from the actual theoretical base instead of jumping into parser implementation without context.
Linux 01 - What the Kernel Is Actually Doing
Good if you want stronger systems intuition from the operating-system side first.
Best Paths By Goal
If you want to become stronger at systems work
Read: Linux, then GPU Systems.
If you want the GPU / LLM stack
Read: GPU Systems, then PyTorch Internals, then Distributed LLM Training.
If you want production-facing ML systems
Read:
If You Want One Recommendation
If you want the most representative current path on the blog, start with GPU Systems, then move to PyTorch Internals, then Distributed LLM Training.
That path captures the blog at its strongest: systems thinking, runtime detail, and serious technical sequencing.