Jaeyoung's Tech Blog

January 6, 2026

Distributed LLM Training 01 - Why LLM Training Becomes a Distributed Systems Problem

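The moment you attach multiple GPUs, training code stops being purely a compute problem and becomes a systems problem that also covers memory, communication, and failure recovery.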

Lectures

January 9, 2026

Data parallel, the most basic form of distributed training, looks simple but carries both gradient synchronization and memory replication costs.

Lectures

January 12, 2026

You need to understand all-reduce, the collective that appears most often in distributed training, to read gradient synchronization costs correctly.

Lectures

January 15, 2026

DDP is not a simple wrapper but a runtime that organizes synchronization using autograd hooks, gradient buckets, and process groups.

Lectures

January 18, 2026

Adding GPUs is not just a throughput increase; it changes what the batch means to the optimizer.

Lectures

January 21, 2026

Looking only at parameters leads to wrong decisions about distributed training. You have to account for activations, gradients, and optimizer state together.

Lectures

January 24, 2026

Distributed training performance often depends more on which links connect the GPUs than on how many GPUs there are.

Lectures

January 27, 2026

Once the model no longer fits on a single GPU, splitting the data alone is not enough and the computation itself has to be partitioned.

Lectures

January 30, 2026

Tensor parallel is not an abstract concept; it is applied at concrete points such as the attention projections, the output projection, and the MLP.

Lectures