Distributed LLM Training 11 - Pipeline Parallelism Basics and How to Think About Stage Splits
Once the model is split by depth into stages, idle time and stage imbalance matter just as much as memory savings.