Distributed LLM Training 10 - Sequence Parallelism and the Cost of Long Context
As context length grows, activation memory and communication patterns change again, and sequence-oriented partitioning starts to matter.