Jae's Tech Blog

Posts tagged "all-reduce"

January 9, 2026

Distributed LLM Training 02 - The Real Cost of Synchronous SGD and Data Parallelism

Data parallelism looks simple, but it carries both a gradient synchronization cost and the cost of replicating the full model state on every worker.

Lectures
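To make the replication cost in this post's teaser concrete, here is a back-of-the-envelope sketch. It assumes mixed-precision training with Adam, a common setup; the byte counts are illustrative assumptions, not figures quoted from the post.

```python
# Rough memory estimate for plain data parallelism: every worker holds a
# full copy of the model state. Assumes mixed-precision Adam (fp16
# weights/grads plus fp32 master weights and two fp32 moments).
def replicated_state_bytes(n_params: int) -> int:
    fp16_params = 2 * n_params    # model weights (fp16)
    fp16_grads = 2 * n_params     # gradients (fp16)
    fp32_master = 4 * n_params    # fp32 master copy of weights
    fp32_momentum = 4 * n_params  # Adam first moment
    fp32_variance = 4 * n_params  # Adam second moment
    return (fp16_params + fp16_grads + fp32_master
            + fp32_momentum + fp32_variance)

# 16 bytes per parameter: a 7B-parameter model needs ~112 GB of state
# on *each* data-parallel worker, before activations.
print(replicated_state_bytes(7_000_000_000) / 1e9)  # 112.0
```

This is why memory, not just communication, is a real cost of naive data parallelism: the per-worker footprint does not shrink as you add workers.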
January 12, 2026

Distributed LLM Training 03 - All-Reduce, Ring, and How to Read Communication Cost

To reason about distributed training performance, you need a concrete mental model of all-reduce and collective communication cost.

Lectures
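For the communication-cost mental model this teaser points at, a standard result (an assumption here, not quoted from the post) is that ring all-reduce of n bytes across p workers moves about 2(p-1)/p * n bytes per worker: a reduce-scatter phase followed by an all-gather phase, each of p-1 steps sending n/p bytes.

```python
def ring_all_reduce_bytes_per_worker(n_bytes: float, p: int) -> float:
    # Reduce-scatter: p-1 steps, each worker sends a chunk of n/p bytes.
    reduce_scatter = (p - 1) * n_bytes / p
    # All-gather: another p-1 steps of n/p bytes each.
    all_gather = (p - 1) * n_bytes / p
    # Total 2(p-1)/p * n, approaching 2n per worker as p grows.
    return reduce_scatter + all_gather

# 1 GB of gradients across 8 workers: 1.75 GB sent per worker.
print(ring_all_reduce_bytes_per_worker(1e9, 8) / 1e9)  # 1.75
```

The useful property is that per-worker traffic is nearly independent of p, which is what makes the ring algorithm bandwidth-optimal for large worker counts.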

© 2025 Jae · Notes on systems, software, and building things carefully.
