PyTorch Internals 07 - Tensor Lifetime, the CUDA Caching Allocator, and Memory Reuse
PyTorch's GPU memory behavior is shaped by a caching allocator, so observed memory usage reflects more than the set of currently live tensor objects
Why memory behavior feels non-intuitive
Deleting a tensor does not necessarily mean its GPU memory immediately returns to the system. PyTorch's CUDA caching allocator holds on to freed blocks so that later allocations of a matching size can reuse them without another round trip to the driver.
That helps performance, but it also means:
- reserved memory (held by the allocator, reported by `torch.cuda.memory_reserved`) and allocated memory (backing live tensors, reported by `torch.cuda.memory_allocated`) can differ
- fragmentation can matter: cached blocks may be the wrong size for a new request
- varying shape patterns can create unexpectedly high peak memory
The next post connects this to stream semantics, because memory reuse and execution timing are closely coupled on the GPU.