GPU Systems
From GPU architecture and CUDA kernels to Triton and real kernel optimization work
Engineers who want to understand how GPUs actually execute work and eventually write and optimize their own kernels.
Thoughts on code, technology, and everything in between
Long-form posts on platform engineering, Linux, compilers, MLOps, and computer architecture, written to help you build stronger intuition instead of just memorizing terms.
A few strong entry points if you are new here.
Fresh writing, updates, and ongoing series entries.
Building reliable ML systems from data pipelines to production monitoring
ML engineers, data scientists, and backend engineers moving from model experiments to production operations.
From finite automata and formal languages to building a compiler from scratch
Readers who want both the theory behind language processing and the bridge to real compiler construction.
How a lexer breaks source code into tokens and where automata theory meets real implementation
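The core of that idea can be sketched in a few lines: each token class is a small regular language, and the lexer repeatedly matches the longest prefix. The token names and the regex-union approach below are illustrative assumptions, not the post's actual implementation.

```python
import re

# Token specification: each (name, regex) pair encodes one finite automaton;
# the lexer tries them as alternatives of a single combined pattern.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def lex(source):
    """Yield (kind, text) tokens, skipping whitespace."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            yield (kind, match.group())

tokens = list(lex("x = 42 + y1"))
# -> [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y1')]
```

Real lexers add line/column tracking and error tokens, but the automata-to-regex bridge is the same.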
Sizing GPU memory by parameter count alone leads to bad decisions; training memory is the sum of parameters, gradients, optimizer state, and activations
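A back-of-the-envelope version of that sum, assuming fp32 weights and an Adam-style optimizer (the byte counts and the 7B example are illustrative, not from the post):

```python
def training_memory_bytes(n_params, bytes_per_param=4, optimizer_states=2,
                          activation_bytes=0):
    """Rough training-memory estimate for fp32 training with Adam.

    weights    : n_params * bytes_per_param
    gradients  : same size as the weights
    optimizer  : Adam keeps 2 extra fp32 states (m and v) per parameter
    activations: workload-dependent, so the caller supplies it
    """
    weights = n_params * bytes_per_param
    gradients = weights
    optimizer = optimizer_states * n_params * 4  # Adam state stays fp32
    return weights + gradients + optimizer + activation_bytes

# A 7B-parameter model in fp32 with Adam, before counting any activations:
gib = training_memory_bytes(7_000_000_000) / 2**30
print(f"{gib:.0f} GiB")  # ~104 GiB -- far more than the 26 GiB of weights alone
```

Mixed precision, ZeRO-style sharding, and activation checkpointing each attack a different term of this sum, which is why they compose.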
Custom autograd functions are a practical place to define forward-backward contracts before dropping to lower-level extensions
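The forward-backward contract can be shown without any framework: forward saves what backward will need, backward turns the upstream gradient into gradients for the inputs. This is a dependency-free sketch loosely modeled on the shape of `torch.autograd.Function`, not the real API.

```python
class Square:
    """A custom op defined by its forward/backward pair. The ctx dict plays
    the role PyTorch's ctx object plays: it carries saved values from
    forward to backward. (Sketch only -- not the actual torch API.)"""

    @staticmethod
    def forward(ctx, x):
        ctx["saved"] = x           # save what backward will need
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        x = ctx["saved"]
        return grad_out * 2 * x    # d(x^2)/dx = 2x, scaled by upstream grad

ctx = {}
y = Square.forward(ctx, 3.0)       # 9.0
dx = Square.backward(ctx, 1.0)     # 6.0
```

Writing the contract at this level first makes it much easier to later port the same op to a C++ or CUDA extension, where the save/restore discipline is the part that bites.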
The concept of virtual memory and how the Linux kernel manages memory
Instruction pipelining, hazard handling, branch prediction, superscalar and out-of-order execution
Adding more GPUs changes optimizer semantics as well as throughput, so batch size and learning rate need to be reasoned about together
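One concrete way to reason about the two together is the linear scaling rule heuristic (scale the learning rate by the same factor as the global batch). The numbers below are illustrative; the rule is a common heuristic, not a law, and warmup still matters.

```python
def scaled_lr(base_lr, base_batch, per_gpu_batch, n_gpus):
    """Linear scaling rule: if the global batch grows k-fold,
    scale the learning rate by k."""
    global_batch = per_gpu_batch * n_gpus
    return base_lr * (global_batch / base_batch)

# Tuned on 1 GPU at batch 32 with lr 0.1; moving to 8 GPUs while keeping
# per-GPU batch 32 multiplies the global batch by 8, so the lr follows:
lr = scaled_lr(0.1, base_batch=32, per_gpu_batch=32, n_gpus=8)
print(lr)  # 0.8
```

The point is that data parallelism silently changed the effective batch from 32 to 256; leaving the learning rate untouched means you are training a different optimization problem.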
How to design golden paths that developers actually want to follow: principles, a step-by-step process, and handling teams that go off-road
Autograd is not just automatic differentiation; it is a graph-construction and backward-execution runtime
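A toy version makes the "graph-construction plus backward-execution" framing concrete: every op records its parents and local derivatives at forward time, and `backward()` replays those edges in reverse. This is a sketch of the idea, not PyTorch's actual implementation.

```python
class Node:
    """A value in the autograd graph. Each op records its parents and a
    local gradient per parent; backward() walks the recorded edges."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # graph edges, recorded at forward time
        self.local_grads = local_grads  # d(out)/d(parent) for each parent
        self.grad = 0.0

    def __mul__(self, other):
        return Node(self.value * other.value,
                    parents=(self, other),
                    local_grads=(other.value, self.value))

    def __add__(self, other):
        return Node(self.value + other.value,
                    parents=(self, other),
                    local_grads=(1.0, 1.0))

    def backward(self, upstream=1.0):
        # Reverse execution: accumulate, then push gradient down each edge.
        # (A real engine topologically sorts so shared nodes run once.)
        self.grad += upstream
        for parent, local in zip(self.parents, self.local_grads):
            parent.backward(upstream * local)

x = Node(2.0)
y = Node(3.0)
z = x * y + x      # dz/dx = y + 1 = 4,  dz/dy = x = 2
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Seeing the graph built eagerly during forward is what demystifies things like retained graphs and why in-place ops can invalidate saved values.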
DDP is not just a wrapper around your model; it is a runtime that coordinates autograd hooks, gradient buckets, and synchronization timing
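Two of those moving parts, gradient bucketing and the averaging allreduce, can be simulated in plain Python. Bucket sizes and gradient shapes below are made up for illustration; real DDP buckets by bytes over real tensors and launches NCCL collectives.

```python
def bucket_grads(grads, bucket_bytes, elem_bytes=4):
    """Group per-parameter gradients into size-capped buckets, as DDP does
    before issuing one allreduce per bucket instead of one per tensor."""
    cap = bucket_bytes // elem_bytes
    buckets, current, used = [], [], 0
    for g in grads:
        if used + len(g) > cap and current:
            buckets.append(current)     # bucket full: would trigger allreduce
            current, used = [], 0
        current.append(g)
        used += len(g)
    if current:
        buckets.append(current)
    return buckets

def allreduce_mean(replicas):
    """Average one flat gradient across replicas -- the effect of an
    allreduce sum followed by a 1/world_size scaling."""
    world = len(replicas)
    return [sum(vals) / world for vals in zip(*replicas)]

# Two parameters (3 and 2 elements) with a 16-byte cap = 4 floats per bucket:
grads = [[1.0, 1.0, 1.0], [2.0, 2.0]]
buckets = bucket_grads(grads, bucket_bytes=16)
print(len(buckets))                           # 2 buckets, so 2 allreduces
print(allreduce_mean([[1.0, 3.0], [3.0, 5.0]]))  # [2.0, 4.0]
```

The bucketing is why gradient-ready ordering matters: a bucket's allreduce can only launch once every gradient in it has been produced by an autograd hook.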