Jae's Tech Blog

Posts tagged "optimizer-state"

January 21, 2026

Distributed LLM Training 06 - Where LLM Training Memory Actually Goes

Sizing a training run by parameter count alone leads to bad decisions; training memory is the combined footprint of parameters, gradients, optimizer state, and activations.

Lectures

© 2025 Jae · Notes on systems, software, and building things carefully.
