Distributed LLM Training 02 - The Real Cost of Synchronous SGD and Data Parallelism
Data parallelism looks simple, but it carries both gradient synchronization cost and full model-state replication cost
To reason about distributed training performance, you need a concrete mental model of what all-reduce and other collective communication operations cost
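To make the replication cost above concrete, here is a minimal sketch of the per-replica memory footprint under plain data parallelism, assuming standard mixed-precision Adam (fp16 parameters and gradients plus fp32 master weights and two fp32 optimizer moments, roughly 16 bytes per parameter); the model sizes below are illustrative, not from this post:

```python
def replica_state_bytes(num_params: int) -> int:
    """Approximate per-GPU model-state footprint for plain data parallelism
    with mixed-precision Adam. Every replica holds a full copy of:
      fp16 parameters (2 B) + fp16 gradients (2 B)
      + fp32 master weights (4 B) + fp32 momentum (4 B) + fp32 variance (4 B)
    = ~16 bytes per parameter (assumption: standard mixed-precision Adam).
    """
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return num_params * bytes_per_param

# Illustrative model sizes (hypothetical, not from the post):
for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"{name}: ~{replica_state_bytes(int(n)) / 2**30:.0f} GiB per replica")
```

Even a 7B-parameter model needs on the order of 100 GiB of model state per replica in this regime, which is why "just copy the model to every GPU" stops being free well before the largest models.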
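As a first-order mental model for the synchronization side: a bandwidth-optimal ring all-reduce over p workers sends roughly 2*(p-1)/p * N bytes per GPU for an N-byte gradient buffer, plus a latency term for its 2*(p-1) steps. Here is a minimal alpha-beta cost sketch; the bandwidth and latency defaults are illustrative assumptions, not measurements:

```python
def ring_allreduce_seconds(buffer_bytes: float, num_gpus: int,
                           link_bw_bytes_per_s: float = 150e9,
                           per_step_latency_s: float = 10e-6) -> float:
    """Alpha-beta cost model for ring all-reduce across num_gpus workers:
    2*(p-1) steps (a reduce-scatter phase then an all-gather phase), each
    sending buffer_bytes/p per GPU, i.e. ~2*(p-1)/p * buffer_bytes total
    per GPU. Default bandwidth/latency are assumptions for illustration."""
    p = num_gpus
    latency_term = 2 * (p - 1) * per_step_latency_s
    bandwidth_term = (2 * (p - 1) / p) * buffer_bytes / link_bw_bytes_per_s
    return latency_term + bandwidth_term

# Example: fp16 gradients of a 7B-parameter model (~14 GB) across 8 GPUs
# (model size and cluster shape are hypothetical):
t = ring_allreduce_seconds(14e9, 8)
print(f"~{t * 1e3:.1f} ms per all-reduce")
```

The useful takeaway from this model is that per-GPU traffic is nearly independent of p (the 2*(p-1)/p factor saturates toward 2), so gradient buffer size and link bandwidth, not worker count, dominate the bandwidth term.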