PyTorch Internals 13 - When a Fused Operator Is Actually Worth It
Fusion is valuable when it reduces memory traffic and intermediate materialization, not merely when it shrinks the number of visible ops
Why fuse at all?
Fused operators usually aim to reduce:
- kernel launch overhead
- intermediate tensor materialization
- unnecessary global memory traffic
In other words, the real win is usually memory-system efficiency: each unfused elementwise op reads its input from global memory and writes a full-sized intermediate back out, so fusing a chain of such ops replaces several round trips through memory with a single pass.
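A minimal pure-Python sketch of the idea (the function names `unfused` and `fused` are hypothetical, not PyTorch APIs): the unfused version runs the chain `x * 2 + 1` then ReLU as three separate passes, materializing an intermediate buffer after each one, while the fused version applies the whole chain per element in a single pass with no intermediates. This is the memory-traffic difference a fusing compiler like TorchInductor exploits on real tensors.

```python
def unfused(x):
    # Three passes over the data, two intermediate buffers materialized.
    a = [v * 2 for v in x]            # pass 1: read x, write a
    b = [v + 1 for v in a]            # pass 2: read a, write b
    return [max(v, 0.0) for v in b]   # pass 3: read b, write output

def fused(x):
    # One pass, no intermediates: the whole chain runs per element.
    return [max(v * 2 + 1, 0.0) for v in x]

x = [-1.5, 0.0, 2.0]
assert unfused(x) == fused(x)  # same result, ~1/3 the memory traffic
```

Both versions compute the same values; the fused one simply touches each element once and allocates a single output, which is why the benefit scales with tensor size rather than with the number of ops removed.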
The next post looks at AMP and numerical stability: a fast operator that is unstable in mixed precision is not practically useful.