Distributed LLM Training 08 - Tensor Parallel Basics: Splitting Computation Inside the Model
Once the model itself is too large for a single device, data parallelism alone is no longer enough; the computation inside each layer has to be split across devices
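The idea can be previewed with a toy sketch (assumed setup, using NumPy arrays to stand in for device shards): a linear layer's weight matrix is split column-wise across two "devices", each computes its slice of the output, and concatenating the slices reproduces the unsharded result.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 8)   # batch of activations, replicated on both devices
W = np.random.randn(8, 6)   # full weight matrix (pretend it is too big for one device)

# Column-wise sharding: device 0 holds W0, device 1 holds W1.
W0, W1 = np.split(W, 2, axis=1)

# Each device computes its shard of the output independently.
y0 = x @ W0
y1 = x @ W1

# An all-gather along the feature dimension reassembles the full output.
y = np.concatenate([y0, y1], axis=1)

# The sharded computation matches the unsharded one.
assert np.allclose(y, x @ W)
```

This column split is only one of the possible layouts; later sections of a tensor-parallel walkthrough typically pair it with a row-wise split so that communication is needed only once per pair of layers.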