This blog is broad enough that jumping in randomly is often the wrong way to read it.
The easier way is to pick a direction first:
- stronger systems intuition
- ML systems and large-model infrastructure
- language and compiler foundations
- developer platform and production engineering
## If You Want Stronger Systems Intuition
Start with:
- Computer Architecture
- Linux
Why this path:
These series make the rest of the blog easier to read. They build the kind of mental model that helps with performance, runtime behavior, memory, scheduling, and lower-level debugging.
Suggested order:
- Computer Architecture
- Linux
## If You Want To Study GPU / LLM Infrastructure
Start with:
- GPU Systems
Why this path:
These three series are meant to fit together. GPU Systems explains how the hardware and kernels behave. PyTorch Internals explains how those kernels meet real training code. Distributed LLM Training explains what happens once the training system spreads across many GPUs and nodes.
Suggested order:
- GPU Systems
- PyTorch Internals
- Distributed LLM Training
## If You Care About ML In Production
Start with:
- MLOps
Why this path:
These series are less about model math and more about operating real systems: pipelines, deployment, observability, self-service, platform constraints, and production tradeoffs.
Suggested order:
- MLOps
- Platform Engineering
- Distributed LLM Training, if you want to move toward large-scale training systems
## If You Want Language / Compiler Foundations
Start with:
- Compiler
Why this path:
Compiler work becomes much easier to understand when you can connect formal language ideas to runtime and hardware behavior instead of treating them as separate textbooks.
## If You Prefer A Gentler Entry Point
Start with:
These two are easier entry points than the heavier systems tracks, while still connecting well to the deeper material later.
## A Good General Reading Order
If you want one reasonable long-term path through the blog:
- Computer Architecture
- Linux
- Compiler
- GPU Systems
- PyTorch Internals
- Distributed LLM Training
- MLOps
- Platform Engineering
That is not the only order, but it is a coherent one: each series builds on the mental models of the ones before it.