Jae's Tech Blog

Posts tagged "activation"

January 21, 2026

Distributed LLM Training 06 - Where LLM Training Memory Actually Goes

Looking only at parameter count leads to bad sizing decisions: training memory is really the sum of parameters, gradients, optimizer state, and activations.

Lectures
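To make the first summary concrete, here is a back-of-the-envelope sketch of the non-activation part of that budget. It assumes mixed-precision training with Adam and uses the common accounting of 2 bytes each for half-precision weights and gradients, plus 12 bytes of fp32 optimizer state (master weights, momentum, variance) per parameter. The helper name and the 7B example size are illustrative. Activations are left out on purpose because they depend on batch size, sequence length, and recomputation settings.

```python
# Hypothetical helper: static training memory (activations excluded) for
# mixed-precision Adam, using the common 2 + 2 + 12 bytes/param accounting.

BYTES_WEIGHTS_HALF = 2        # fp16/bf16 weights used in forward/backward
BYTES_GRADS_HALF = 2          # fp16/bf16 gradients
BYTES_ADAM_FP32 = 4 + 4 + 4   # fp32 master weights + Adam momentum + Adam variance


def static_training_bytes(num_params: int) -> dict[str, int]:
    """Return a per-component byte breakdown, excluding activations."""
    breakdown = {
        "parameters": num_params * BYTES_WEIGHTS_HALF,
        "gradients": num_params * BYTES_GRADS_HALF,
        "optimizer_state": num_params * BYTES_ADAM_FP32,
    }
    # Sum the three components before adding the "total" key.
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    n = 7_000_000_000  # illustrative 7B-parameter model
    for name, nbytes in static_training_bytes(n).items():
        print(f"{name:>16}: {nbytes / 1e9:8.1f} GB")
```

For a 7B-parameter model this gives 14 GB each for weights and gradients, 84 GB of optimizer state, and 112 GB in total before a single activation is stored. That is why the parameter count alone (14 GB in half precision) badly understates the footprint.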
February 2, 2026

Distributed LLM Training 10 - Sequence Parallelism and the Cost of Long Context

As context length grows, activation memory and communication patterns change again, and sequence-oriented partitioning starts to matter.

Lectures
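As a rough illustration of why long context shifts the picture, the sketch below estimates per-layer activation memory as a function of sequence length. It assumes the per-layer estimate s·b·h·(34 + 5·a·s/h) bytes (fp16 activations, no recomputation) from "Reducing Activation Recomputation in Large Transformer Models" (Korthikanti et al., 2022). It also assumes, as an idealization, that sequence parallelism splits every activation evenly across `sp` ranks. The function name and layer shape are hypothetical. The 5·a·s/h term assumes the attention score matrix is materialized. Fused kernels such as FlashAttention avoid storing it.

```python
# Hypothetical sketch: per-layer activation memory vs. context length.
# Assumed estimate: s*b*h*(34 + 5*a*s/h) bytes (fp16, no recomputation),
# idealized as split evenly over `sp` sequence-parallel ranks.

def activation_bytes_per_layer(seq_len: int, batch: int, hidden: int,
                               heads: int, sp: int = 1) -> float:
    """Approximate activation bytes one rank holds for one transformer layer."""
    linear_part = 34 * seq_len * batch * hidden   # grows linearly with s
    attn_part = 5 * heads * seq_len**2 * batch    # attention scores: grows with s^2
    return (linear_part + attn_part) / sp


if __name__ == "__main__":
    b, h, a = 1, 4096, 32  # illustrative layer shape
    for s in (4_096, 32_768, 131_072):
        full = activation_bytes_per_layer(s, b, h, a)
        split = activation_bytes_per_layer(s, b, h, a, sp=8)
        print(f"s={s:>7}: {full / 2**30:9.1f} GiB/layer, "
              f"{split / 2**30:8.1f} GiB/layer with sp=8")
```

Under these assumptions, doubling the context doubles the linear term and quadruples the attention term, while `sp=8` divides what each rank holds by 8. The communication needed to keep a sequence-split layer correct is the other side of that trade-off.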

© 2025 Jae · Notes on systems, software, and building things carefully.
