Backward is a runtime process

loss.backward() looks simple from the outside, but internally it relies on:

  • graph construction during forward
  • saved tensors and metadata
  • dependency-aware backward scheduling

That is why autograd is best thought of as an execution engine, not just a differentiation feature.
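The three ingredients above can be sketched in plain Python. This is a toy scalar engine, not PyTorch's actual implementation: the `Value` class and its `backward_fn` hooks are illustrative names, but the mechanics mirror the real engine, with forward ops recording graph edges, closures capturing the saved inputs, and backward running in reverse topological order.

```python
class Value:
    """A scalar that records how it was computed (graph construction)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents   # graph edges, recorded during forward
        self.backward_fn = None  # how to route gradients to parents

    def __mul__(self, other):
        out = Value(self.data * other.data, parents=(self, other))
        def backward_fn():
            # "saved tensors": the forward inputs are needed here
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, parents=(self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def backward(self):
        # dependency-aware scheduling: a node's gradient is only final
        # once all its consumers have run, hence reverse topological order
        order, seen = [], set()
        def topo(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    topo(p)
                order.append(v)
        topo(self)
        self.grad = 1.0
        for v in reversed(order):
            if v.backward_fn:
                v.backward_fn()

x = Value(3.0)
y = Value(4.0)
loss = x * y + x   # forward pass builds the graph
loss.backward()    # backward pass consumes it
print(x.grad, y.grad)  # d(loss)/dx = y + 1 = 5.0, d(loss)/dy = x = 3.0
```

Note that `loss.backward()` here does nothing differentiable by itself; all the information it needs was captured during the forward pass. That is the sense in which backward is a runtime process.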

Why this matters

You need this mental model to understand:

  • why gradients sometimes come back as None (the tensor was never recorded in the graph)
  • why in-place operations are dangerous (they can overwrite values saved for backward)
  • why some custom operators need explicit backward logic
  • why memory usage rises during forward (activations are kept alive until backward consumes them)
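The in-place hazard in particular follows directly from saved tensors. A minimal sketch, not using PyTorch: the backward closure captures a reference to the forward input, so mutating that input afterwards corrupts the gradient. (PyTorch detects this case with per-tensor version counters and raises a RuntimeError instead of silently returning a wrong result.)

```python
def mul_with_grad(x, w):
    """Toy forward op: returns the product and a backward closure."""
    out = x[0] * w[0]
    def backward(upstream=1.0):
        # gradient w.r.t. w needs the *forward-time* value of x,
        # but the closure only holds a reference, not a copy
        return upstream * x[0]
    return out, backward

x = [3.0]
w = [4.0]
out, backward = mul_with_grad(x, w)

x[0] = 100.0       # in-place update between forward and backward
grad_w = backward()
print(grad_w)      # 100.0, but the correct d(out)/dw is 3.0
```

Avoiding the stale reference would mean copying every saved input, which is exactly the memory cost the last bullet describes.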

The next post shows where custom autograd functions fit into this picture.