Jae's Tech Blog

Posts tagged "distributed-training"

March 1, 2026

Distributed LLM Training 19 - How to Read Megatron-LM and DeepSpeed Structurally

Frameworks are easier to understand when you read them as bundles of parallelization and state-management choices rather than as giant feature lists.

Lectures
March 4, 2026

Distributed LLM Training 20 - A Practical Order for Designing an LLM Training Stack

Distributed training architecture is not about collecting fashionable techniques but about choosing the smallest structure that matches the current bottleneck.

Lectures

© 2025 Jae · Notes on systems, software, and building things carefully.
