Computer Architecture 01 - Overview
Core concepts of Von Neumann architecture and how CPU, memory, and buses work together
- Looking only at parameter size leads to bad decisions; training memory is really a combination of parameters, gradients, optimizer state, and activations
- Saving memory by recomputing activations is not a minor option; it is often a central design choice in large-scale training
- How to think about the GPU memory hierarchy and bandwidth bottlenecks
- The concept of virtual memory and how the Linux kernel manages memory
- PyTorch GPU memory behavior is shaped by a caching allocator, so observed memory usage is not just a story about the tensors currently alive
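The first point can be made concrete with back-of-envelope arithmetic. The numbers below are illustrative assumptions (mixed-precision training with an Adam-style optimizer), not a universal rule:

```python
# Rough per-parameter memory accounting for mixed-precision training with Adam
# (illustrative assumptions, not a universal rule):
#   fp16 weights: 2 bytes, fp16 gradients: 2 bytes,
#   fp32 master weights: 4 bytes, Adam first moment: 4, second moment: 4
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes per parameter

def training_state_gib(n_params: float) -> float:
    """GiB needed for parameters + gradients + optimizer state.
    Activations come on top and scale with batch size and sequence length."""
    return n_params * BYTES_PER_PARAM / 2**30

# A 7B-parameter model needs ~104 GiB before a single activation is stored.
print(round(training_state_gib(7e9), 1))  # → 104.3
```

This is why parameter count alone is misleading: the weights themselves are only a fraction of the footprint, and the activation term that comes on top is exactly what recomputation (the second point) trades compute to shrink.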
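For the virtual-memory point, a minimal sketch using Python's stdlib `mmap`: an anonymous mapping reserves virtual address space immediately, while on Linux physical pages are only committed when first touched (demand paging):

```python
import mmap

# 16 MiB of anonymous (file-less) mapping: the kernel hands out virtual
# address space right away; physical backing arrives page by page on first
# write, via page faults (demand paging on Linux).
SIZE = 16 * 1024 * 1024

def touch_first_page() -> int:
    buf = mmap.mmap(-1, SIZE)  # -1: anonymous mapping, no file behind it
    buf[0] = 1                 # first write faults this page into RAM
    n = len(buf)
    buf.close()
    return n

print(touch_first_page())  # → 16777216
```

The gap between "address space reserved" and "pages actually resident" is the same distinction tools like `top` draw between virtual size and RSS.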
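The caching-allocator point can be sketched with a toy model (inspired by the behavior of PyTorch's CUDA caching allocator, not its actual code): freed blocks return to a size-keyed pool rather than to the device, so reserved memory can stay high even when no tensors are alive:

```python
from collections import defaultdict

class CachingAllocator:
    """Toy caching allocator: frees go to a pool, not back to the device."""

    def __init__(self):
        self.free_pool = defaultdict(list)  # block size -> cached block ids
        self.reserved = 0    # bytes held from the device (never shrinks here)
        self.allocated = 0   # bytes currently backing live tensors
        self._next_id = 0

    def malloc(self, size: int) -> int:
        if self.free_pool[size]:        # reuse a cached block if one fits
            block = self.free_pool[size].pop()
        else:                           # otherwise reserve new device memory
            block = self._next_id
            self._next_id += 1
            self.reserved += size
        self.allocated += size
        return block

    def free(self, block: int, size: int) -> None:
        self.allocated -= size              # the tensor is gone...
        self.free_pool[size].append(block)  # ...but the block stays cached

alloc = CachingAllocator()
b = alloc.malloc(1 << 20)   # allocate 1 MiB
alloc.free(b, 1 << 20)      # free it: allocated drops, reserved does not
print(alloc.allocated, alloc.reserved)  # → 0 1048576
```

This is the toy analogue of why `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` diverge, and why `nvidia-smi` keeps reporting high usage after tensors are deleted.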