Jae's Tech Blog

Lectures

All posts in Lectures

February 17, 2026

Distributed LLM Training 15 - How FSDP Differs from DDP and When It Helps

FSDP keeps parameters sharded and only gathers them when needed, making it a direct answer to parameter-replication pressure

February 20, 2026

Distributed LLM Training 16 - How Communication Overlap Hides Step Time

The goal of overlap is not to eliminate communication entirely, but to make it finish underneath useful computation

February 23, 2026

Distributed LLM Training 17 - Why Checkpointing, Resume, and Fault Tolerance Matter So Much

In long distributed runs, reliable recovery is as important as raw throughput

February 26, 2026

Distributed LLM Training 18 - Deadlocks, Timeouts, and OOMs: Debugging Distributed Training

Debugging distributed training is about narrowing down which rank, which collective, and which state transition went wrong

March 1, 2026

Distributed LLM Training 19 - How to Read Megatron-LM and DeepSpeed Structurally

Frameworks are easier to understand when you read them as bundles of parallelization and state-management choices rather than as giant feature lists

March 4, 2026

Distributed LLM Training 20 - A Practical Order for Designing an LLM Training Stack

Distributed training architecture is not about collecting fashionable techniques, but about choosing the smallest structure that matches the current bottleneck

January 28, 2026

GPU Systems 00 - What You Should Know Before Starting This Series

The background knowledge that makes the GPU Systems series much easier to study properly

January 30, 2026

GPU Systems 01 - Roadmap to GPU Kernel Engineering

A practical study order from GPU architecture to CUDA, Triton, and kernel optimization

February 1, 2026

GPU Systems 02 - The Thread, Warp, and Block Execution Model

What threads, warps, blocks, and grids mean in actual GPU execution

โ† Previous
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Next โ†’

© 2025 Jae · Notes on systems, software, and building things carefully.
