Distributed LLM Training 13 - Activation Checkpointing and the Cost of Recomputation
Saving memory by recomputing activations is not a minor option; it is often a central design choice in large-scale training