Jae's Tech Blog

Posts tagged "attention"

January 30, 2026

Distributed LLM Training 09 - Where Tensor Parallelism Actually Lives Inside a Transformer

Tensor parallelism becomes concrete when you map it onto the QKV projections, the attention output projection, and the two large MLP projections inside a transformer block.
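The split the summary describes can be sketched numerically: shard the first (up) projection column-wise and the second (down) projection row-wise, so each rank computes a partial output and a single sum plays the role of the all-reduce. This is a minimal numpy sketch, not the post's actual code; all shapes and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, tp = 8, 32, 2          # toy sizes; tp = tensor-parallel degree

x = rng.standard_normal((4, d_model))  # (tokens, d_model) activations
W1 = rng.standard_normal((d_model, d_ff))   # up-projection (e.g. MLP in)
W2 = rng.standard_normal((d_ff, d_model))   # down-projection (e.g. MLP out)

# Reference: unsharded forward pass through the two projections
ref = np.maximum(x @ W1, 0) @ W2

# Tensor parallelism: W1 split column-wise, W2 split row-wise across ranks
W1_shards = np.split(W1, tp, axis=1)
W2_shards = np.split(W2, tp, axis=0)

# Each "rank" computes a partial result; summing them is the all-reduce
partials = [np.maximum(x @ a, 0) @ b for a, b in zip(W1_shards, W2_shards)]
out = sum(partials)

assert np.allclose(out, ref)  # sharded result matches the unsharded one
```

The same column-then-row pairing applies to the attention path: QKV projections are split column-wise and the output projection row-wise, so each block needs only one all-reduce per path.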

Lectures

© 2025 Jae · Notes on systems, software, and building things carefully.
