Distributed LLM Training 04 - What PyTorch DDP Actually Does Internally
DDP is not just a wrapper around your model; it is a runtime that coordinates autograd hooks, gradient buckets, and synchronization timing
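
To make the "gradient buckets" idea concrete before going further, here is a minimal pure-Python sketch of how parameters might be grouped into buckets. It assumes a greedy policy that walks parameters in reverse registration order (roughly the order in which gradients become ready during the backward pass) and closes a bucket once a byte cap would be exceeded. The function name `assign_buckets`, the toy layer names, and the byte sizes are all hypothetical; the real bucketing logic lives in DDP's C++ reducer and differs in details such as the default 25 MB cap set by `bucket_cap_mb`.

```python
from typing import List, Tuple


def assign_buckets(
    param_sizes: List[Tuple[str, int]], cap_bytes: int
) -> List[List[str]]:
    """Greedily group parameters into buckets (illustrative only).

    param_sizes: (name, size_in_bytes) pairs in registration order.
    cap_bytes:   soft upper bound on the total bytes per bucket.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    # Walk in reverse: the last layers produce gradients first in backward.
    for name, nbytes in reversed(param_sizes):
        # Close the current bucket if adding this parameter would exceed the cap.
        if current and current_bytes + nbytes > cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes

    # Flush whatever is left in the final, possibly partial, bucket.
    if current:
        buckets.append(current)
    return buckets


# Hypothetical three-layer model with made-up sizes in bytes.
params = [
    ("layer1.weight", 400),
    ("layer1.bias", 40),
    ("layer2.weight", 400),
    ("layer2.bias", 40),
    ("layer3.weight", 200),
    ("layer3.bias", 20),
]

buckets = assign_buckets(params, cap_bytes=500)
for i, bucket in enumerate(buckets):
    print(i, bucket)
```

The point of the grouping is that one all-reduce per bucket can start as soon as every gradient in that bucket is ready, which is what allows communication to overlap with the rest of the backward pass.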