Why memory behavior feels non-intuitive

Deleting a tensor does not necessarily return GPU memory to the driver or the operating system right away. PyTorch uses a caching allocator: freed device blocks are kept in a pool and reused for later allocations, avoiding the cost of repeated cudaMalloc/cudaFree calls.
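
You can watch this happen with PyTorch's own counters. A hedged sketch (requires PyTorch; the CUDA branch only runs if a GPU is actually present):

```python
# Sketch: observing the caching allocator via torch.cuda counters.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    torch, cuda_ok = None, False  # no PyTorch in this environment

if cuda_ok:
    x = torch.empty(1024, 1024, device="cuda")          # ~4 MiB of float32
    print("allocated:", torch.cuda.memory_allocated())  # bytes in live tensors
    print("reserved: ", torch.cuda.memory_reserved())   # bytes held by the allocator
    del x
    # allocated drops back toward zero; reserved typically stays the same,
    # because the freed block goes into the cache for reuse.
    print("after del, reserved:", torch.cuda.memory_reserved())
    torch.cuda.empty_cache()  # asks the allocator to release unused cached blocks
```

`empty_cache()` is occasionally useful when sharing a GPU with other processes, but inside a single training loop the cache is usually what you want.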

That is good for performance, but it also means:

  • reserved memory (held by the allocator) and allocated memory (in use by live tensors) can differ
  • fragmentation of the cached pool can matter
  • varying shape patterns can create unexpectedly high peaks

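All three effects fall out of one mechanism: a toy model makes it concrete. The sketch below is a deliberately simplified size-bucketed cache (hypothetical; the real PyTorch allocator splits and coalesces blocks, which this model does not):

```python
# Toy model of a size-bucketed caching allocator.
from collections import defaultdict

class CachingAllocator:
    def __init__(self):
        self.reserved = 0                # total bytes ever taken from the "device"
        self.allocated = 0               # bytes currently in use by tensors
        self.cache = defaultdict(list)   # freed blocks, bucketed by size

    def malloc(self, size):
        if self.cache[size]:             # reuse a cached block of this exact size
            self.cache[size].pop()
        else:                            # otherwise "cudaMalloc" fresh memory
            self.reserved += size
        self.allocated += size
        return size

    def free(self, size):
        self.allocated -= size
        self.cache[size].append(size)    # block stays cached, not returned to device

alloc = CachingAllocator()
a = alloc.malloc(512)
b = alloc.malloc(512)
alloc.free(a)
alloc.free(b)
# allocated is back to 0, but reserved is still 1024: the two counters differ.
alloc.malloc(1024)
# The two cached 512-byte blocks cannot serve one 1024-byte request, so
# reserved grows to 2048 even though only 1024 bytes are live: fragmentation.
```

This is also why varying shapes drive peaks up: every new size that misses the cache forces fresh device memory, even while plenty of cached bytes sit idle in the wrong buckets.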
The next post connects this to stream semantics, because memory reuse and execution timing are closely linked on the GPU.