Distributed LLM Training 15 - How FSDP Differs from DDP and When It Helps
FSDP keeps parameters sharded and only gathers them when needed, making it a direct answer to parameter-replication pressure
Frameworks are easier to understand when you read them as bundles of parallelization and state-management choices rather than as giant feature lists
DDP and FSDP are not external magic; they depend directly on the timing of autograd hooks and on how tensor state is managed inside the runtime
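The core idea in the standfirst, that parameters stay sharded and the full tensor exists only transiently, can be sketched without any framework. This is a conceptual illustration with hypothetical helper names (`shard`, `all_gather`), not PyTorch's actual FSDP API:

```python
# Conceptual sketch of FSDP's shard-then-gather pattern.
# Each "rank" holds only ~1/world_size of a parameter; the full
# tensor is materialized just before use and dropped afterward.

def shard(param, world_size):
    """Split a flat parameter into world_size near-equal shards."""
    base, rem = divmod(len(param), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)
        shards.append(param[start:start + size])
        start += size
    return shards

def all_gather(shards):
    """Simulate an all-gather: reassemble the full parameter."""
    full = []
    for s in shards:
        full.extend(s)
    return full

if __name__ == "__main__":
    param = list(range(10))          # a flat "parameter" of 10 elements
    world_size = 4
    shards = shard(param, world_size)
    # Steady state: each rank stores only its slice
    print([len(s) for s in shards])  # -> [3, 3, 2, 2]
    # Just before a layer's forward/backward: gather the full parameter
    full = all_gather(shards)
    assert full == param             # gathered copy matches the original
    # After use, the full copy is freed; only the shards persist
```

DDP, by contrast, keeps the entire parameter replicated on every rank at all times, which is exactly the replication pressure FSDP is designed to relieve.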