Jae's Tech Blog

Lectures

All posts in the Lectures category

February 8, 2026

Computer Architecture 07 - Memory Hierarchy

The memory hierarchy from registers to HDD and how caches work

February 17, 2026

Computer Architecture 08 - Virtual Memory and MMU

How virtual memory enables process isolation through the MMU, page tables, and TLB

February 24, 2026

Computer Architecture 09 - I/O and DMA

How the CPU exchanges data with external devices and the principles behind efficient data transfer via DMA

March 5, 2026

Computer Architecture 10 - Multicore and Modern Processors

Why clock speeds stopped increasing and the core concepts of modern multicore processor architecture

January 6, 2026

Distributed LLM Training 01 - Why LLM Training Becomes a Distributed Systems Problem

Once LLM training leaves a single GPU, it stops being only a modeling problem and becomes a systems problem around memory, communication, and recovery

January 9, 2026

Distributed LLM Training 02 - The Real Cost of Synchronous SGD and Data Parallelism

Data parallelism looks simple, but it carries both gradient synchronization cost and full model-state replication cost

January 12, 2026

Distributed LLM Training 03 - All-Reduce, Ring, and How to Read Communication Cost

To reason about distributed training performance, you need a concrete mental model for all-reduce and collective communication cost

January 15, 2026

Distributed LLM Training 04 - What PyTorch DDP Actually Does Internally

DDP is not just a wrapper around your model; it is a runtime that coordinates autograd hooks, gradient buckets, and synchronization timing

January 18, 2026

Distributed LLM Training 05 - Global Batch Size, Gradient Accumulation, and Learning Rate Scaling

Adding more GPUs changes optimizer semantics as well as throughput, so batch size and learning rate need to be reasoned about together

โ† Previous
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Next โ†’

© 2025 Jae · Notes on systems, software, and building things carefully.
