Lectures

March 4, 2026 · Distributed LLM Training 20 - A Practical Order for Designing an LLM Training Stack
March 3, 2026 · PyTorch Internals 20 - A Practical Path from Internals Knowledge to Real Engineering Work
March 1, 2026 · Distributed LLM Training 19 - How to Read Megatron-LM and DeepSpeed Structurally
February 26, 2026 · Distributed LLM Training 18 - Deadlocks, Timeouts, and OOMs: Debugging Distributed Training
February 23, 2026 · Distributed LLM Training 17 - Why Checkpointing, Resume, and Fault Tolerance Matter So Much
February 22, 2026 · PyTorch Internals 17 - What Role Triton Plays Inside the PyTorch Ecosystem
February 16, 2026 · PyTorch Internals 15 - Reading Operator Bottlenecks with PyTorch Profiling
February 11, 2026 · Distributed LLM Training 13 - Activation Checkpointing and the Cost of Recomputation
February 8, 2026 · Distributed LLM Training 12 - GPipe, 1F1B, and Interleaving: Choosing a Pipeline Schedule
February 7, 2026 · PyTorch Internals 12 - Backward Implementation Patterns and Saved-State Strategy
February 5, 2026 · Distributed LLM Training 11 - Pipeline Parallel Basics and How to Think About Stage Splits
February 2, 2026 · Distributed LLM Training 10 - Sequence Parallelism and the Cost of Long Context
February 1, 2026 · PyTorch Internals 10 - Connecting a Custom CUDA Kernel Through an Extension
January 30, 2026 · Distributed LLM Training 09 - Where Tensor Parallelism Actually Lives Inside a Transformer
January 27, 2026 · Distributed LLM Training 08 - Tensor Parallel Basics: Splitting Computation Inside the Model
January 24, 2026 · Distributed LLM Training 07 - NCCL and Topology: Why the Same GPU Count Can Behave Very Differently
January 23, 2026 · PyTorch Internals 07 - Tensor Lifetime, the CUDA Caching Allocator, and Memory Reuse
January 18, 2026 · Distributed LLM Training 05 - Global Batch Size, Gradient Accumulation, and Learning Rate Scaling
January 14, 2026 · PyTorch Internals 04 - What the Dispatcher and Operator Registry Actually Do
January 12, 2026 · Distributed LLM Training 03 - All-Reduce, Ring, and How to Read Communication Cost
January 9, 2026 · Distributed LLM Training 02 - The Real Cost of Synchronous SGD and Data Parallelism
January 6, 2026 · Distributed LLM Training 01 - Why LLM Training Becomes a Distributed Systems Problem