PyTorch Internals 03 - Contiguous Layout, Memory Format, and Hidden Copies
Layout affects both which kernel runs and how fast it runs, and sometimes the most expensive step in a path is a copy you never asked for.
Why contiguous appears so often
Many PyTorch operators can accept non-contiguous tensors, but they do not all handle them the same way.
- some walk the tensor element by element using its strides
- some make an internal contiguous copy first
- some custom kernels simply assume contiguous input
That means layout affects memory use and performance, not just correctness.
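A quick way to see the layout machinery at work is to inspect strides. The sketch below (standard PyTorch API, no assumed names) shows that a transpose only swaps strides, while `.contiguous()` materializes a real copy:

```python
import torch

# A transpose returns a view with swapped strides; no data moves.
x = torch.arange(12, dtype=torch.float32).reshape(3, 4)
t = x.t()

print(x.stride())           # (4, 1)
print(t.stride())           # (1, 4)
print(t.is_contiguous())    # False

# .contiguous() allocates fresh row-major storage and copies into it.
c = t.contiguous()
print(c.stride())                     # (3, 1)
print(c.data_ptr() == t.data_ptr())   # False: new storage
```

An operator that strides through `t` touches the original storage in a scattered order; one that calls `.contiguous()` internally pays for the allocation and copy shown in the last step.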
Why hidden copies are dangerous
An invisible layout conversion can:
- raise memory usage unexpectedly
- take longer than the operator you intended to measure
- make profiling results misleading
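One simple check for a hidden conversion is to compare storage pointers before and after a call: if the result is backed by new storage, a copy happened. A minimal sketch, using only the standard PyTorch API:

```python
import torch

# On an already row-major tensor, .contiguous() is a no-op
# and returns the same storage.
a = torch.randn(64, 64)
same = a.contiguous().data_ptr() == a.data_ptr()
print(same)   # True

# On a permuted view, the same call silently allocates and copies.
p = a.t()
copied = p.contiguous().data_ptr() != p.data_ptr()
print(copied)  # True
```

The same `data_ptr()` comparison works for any op you suspect of copying under the hood, which makes it a cheap first step before reaching for a profiler.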
The next post moves into the dispatcher, the layer that decides which implementation path an operator call actually takes.