Jae's Tech Blog

Posts tagged "deepspeed"

February 14, 2026

Distributed LLM Training 14 - What ZeRO Stage 1, 2, and 3 Each Remove

ZeRO is best understood as a staged system for removing different forms of replicated training state

Lectures

March 1, 2026

Distributed LLM Training 19 - How to Read Megatron-LM and DeepSpeed Structurally

Frameworks are easier to understand when you read them as bundles of parallelization and state-management choices rather than as giant feature lists

Lectures

© 2025 Jae · Notes on systems, software, and building things carefully.
