Why memory behavior feels non-intuitive

Deleting a tensor does not necessarily return GPU memory to the driver or the operating system right away. PyTorch uses a caching allocator: freed device blocks are kept in a pool and reused for later allocations, avoiding the cost of repeated cudaMalloc/cudaFree calls.
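
You can watch this happen with PyTorch's own counters. A hedged sketch (requires PyTorch; the CUDA branch only runs if a GPU is actually present):

```python
# Sketch: observing the caching allocator via torch.cuda counters.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    torch, cuda_ok = None, False  # no PyTorch in this environment

if cuda_ok:
    x = torch.empty(1024, 1024, device="cuda")          # ~4 MiB of float32
    print("allocated:", torch.cuda.memory_allocated())  # bytes in live tensors
    print("reserved: ", torch.cuda.memory_reserved())   # bytes held by the allocator
    del x
    # allocated drops back toward zero; reserved typically stays the same,
    # because the freed block goes into the cache for reuse.
    print("after del, reserved:", torch.cuda.memory_reserved())
    torch.cuda.empty_cache()  # asks the allocator to release unused cached blocks
```

`empty_cache()` is occasionally useful when sharing a GPU with other processes, but inside a single training loop the cache is usually what you want.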

That is good for performance, but it also means:

  • reserved memory (held by the allocator) and allocated memory (in use by live tensors) can differ
  • fragmentation of the cached pool can matter
  • varying shape patterns can create unexpectedly high peaks

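All three effects fall out of one mechanism: a toy model makes it concrete. The sketch below is a deliberately simplified size-bucketed cache (hypothetical; the real PyTorch allocator splits and coalesces blocks, which this model does not):

```python
# Toy model of a size-bucketed caching allocator.
from collections import defaultdict

class CachingAllocator:
    def __init__(self):
        self.reserved = 0                # total bytes ever taken from the "device"
        self.allocated = 0               # bytes currently in use by tensors
        self.cache = defaultdict(list)   # freed blocks, bucketed by size

    def malloc(self, size):
        if self.cache[size]:             # reuse a cached block of this exact size
            self.cache[size].pop()
        else:                            # otherwise "cudaMalloc" fresh memory
            self.reserved += size
        self.allocated += size
        return size

    def free(self, size):
        self.allocated -= size
        self.cache[size].append(size)    # block stays cached, not returned to device

alloc = CachingAllocator()
a = alloc.malloc(512)
b = alloc.malloc(512)
alloc.free(a)
alloc.free(b)
# allocated is back to 0, but reserved is still 1024: the two counters differ.
alloc.malloc(1024)
# The two cached 512-byte blocks cannot serve one 1024-byte request, so
# reserved grows to 2048 even though only 1024 bytes are live: fragmentation.
```

This is also why varying shapes drive peaks up: every new size that misses the cache forces fresh device memory, even while plenty of cached bytes sit idle in the wrong buckets.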
The next post connects this to stream semantics, because memory reuse and execution timing are closely linked on the GPU.