Why contiguous appears so often

Many PyTorch operators can accept non-contiguous tensors, but they do not all handle them the same way.

  • some walk the tensor element by element using its strides, with no copy
  • some silently make an internal contiguous copy first
  • some custom kernels assume contiguous input outright and fail otherwise
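A minimal sketch of all three behaviors, using a transposed view as the non-contiguous input (the tensor values here are illustrative, not from the original text):

```python
import torch

x = torch.arange(12.0).reshape(3, 4)
t = x.t()                      # transposed view: same storage, swapped strides
assert not t.is_contiguous()

# 1) Stride-aware ops walk the original storage; no copy is made.
s = t.sum()

# 2) contiguous() makes the copy explicit: new row-major storage.
c = t.contiguous()
assert c.data_ptr() != x.data_ptr()

# 3) Ops that require a compatible layout refuse the view outright:
# view() raises a RuntimeError here, because the transposed strides
# cannot be flattened without copying.
try:
    t.view(-1)
except RuntimeError:
    pass
```

`reshape(-1)` would succeed on the same tensor by falling back to a copy, which is exactly the kind of silent conversion the next section is about.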

That means layout affects memory use and performance, not just correctness.
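To make "layout" concrete: two tensors can share the exact same storage and differ only in their strides, i.e. in the order the kernel walks memory. A small sketch (the specific tensor is my own example):

```python
import torch

x = torch.arange(6.0).reshape(2, 3)
t = x.t()

# Same bytes, different traversal order:
assert x.data_ptr() == t.data_ptr()
assert x.stride() == (3, 1)   # row-major: 3 elements per row step
assert t.stride() == (1, 3)   # transposed view: strides swapped

# An element's storage offset is the dot product of index and strides,
# so t[2, 1] lives at offset 2*1 + 1*3 = 5 in the shared buffer.
assert t[2, 1].item() == 5.0
```

Whether a kernel follows those strides or demands a repacked buffer is what determines the memory and time cost.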

Why hidden copies are dangerous

An invisible layout conversion can:

  • raise memory usage unexpectedly
  • take more time than the operator you intended to measure
  • skew profiles, because the copy's cost is attributed to whatever operator triggered it
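One cheap way to catch such a copy, sketched here with `reshape` on a transposed tensor (the sizes are arbitrary; on GPU you could additionally watch `torch.cuda.memory_allocated()` around the call):

```python
import torch

big = torch.randn(1024, 1024)   # ~4 MB of float32
t = big.t()

# reshape() returns a view when strides allow it, but a transposed
# tensor cannot be flattened in place, so it silently copies here:
flat = t.reshape(-1)

# Different data pointer => a second ~4 MB buffer was allocated.
assert flat.data_ptr() != big.data_ptr()
```

Comparing `data_ptr()` before and after a suspicious call is a quick check for whether an operator returned a view or a fresh allocation.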

The next post moves into the dispatcher, the layer that decides which implementation path an operator call actually takes.