GPU Systems
From GPU architecture and CUDA kernels to Triton and real kernel optimization work
Engineers who want to understand how GPUs actually execute work and eventually write and optimize their own kernels.
Thoughts on code, technology, and everything in between
Long-form posts on platform engineering, Linux, compilers, MLOps, and computer architecture, written to help you build stronger intuition instead of just memorizing terms.
A few strong entry points if you are new here.
Fresh writing, updates, and ongoing series entries.
Building reliable ML systems from data pipelines to production monitoring
ML engineers, data scientists, and backend engineers moving from model experiments to production operations.
From finite automata and formal languages to building a compiler from scratch
Readers who want both the theory behind language processing and the bridge to real compiler construction.
What goes wrong when experiments aren't tracked, and the tools that solve it
A single operator name in PyTorch may map to many implementations, and the dispatcher is the runtime layer that decides which one runs
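The idea behind that dispatch layer can be sketched as a registry keyed by operator name and backend. This is a toy model written for illustration, not PyTorch's actual dispatcher code; the names `register` and `dispatch` are invented here.

```python
# Toy model of operator dispatch: one op name maps to several kernel
# implementations, and a dispatch key (here just the device string)
# decides at runtime which one runs. Illustrative sketch only.

registry = {}

def register(op, key):
    def deco(fn):
        registry[(op, key)] = fn
        return fn
    return deco

@register("add", "cpu")
def add_cpu(a, b):
    return [x + y for x, y in zip(a, b)]

@register("add", "cuda")
def add_cuda(a, b):
    # Stand-in: a real backend would launch a GPU kernel here.
    return [x + y for x, y in zip(a, b)]

def dispatch(op, key, *args):
    try:
        return registry[(op, key)](*args)
    except KeyError:
        raise NotImplementedError(f"no kernel for {op!r} on {key!r}")

print(dispatch("add", "cpu", [1, 2], [3, 4]))  # [4, 6]
```

The point of the indirection is that callers only ever name the operator; picking the implementation is deferred until the dispatch key is known.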
Why compilers are divided into multiple phases and what each phase does
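The phasing idea fits in a few lines: each pass consumes the previous pass's output and never re-reads the source. A minimal sketch for `1 + 2 * 3`, with a made-up tuple AST and stack-machine output:

```python
# Tiny illustration of compiler phasing: lexing, parsing, and code
# generation as three separate passes. Toy sketch only.
import re

def lex(src):
    # Phase 1: characters -> tokens.
    return re.findall(r"\d+|[+*]", src)

def parse(tokens):
    # Phase 2: tokens -> AST. Grammar: expr -> term ('+' term)*,
    # term -> num ('*' num)*  (so '*' binds tighter than '+').
    def term(i):
        node, i = int(tokens[i]), i + 1
        while i < len(tokens) and tokens[i] == "*":
            node, i = ("*", node, int(tokens[i + 1])), i + 2
        return node, i
    node, i = term(0)
    while i < len(tokens) and tokens[i] == "+":
        rhs, i = term(i + 1)
        node = ("+", node, rhs)
    return node

def codegen(node, out):
    # Phase 3: AST -> stack-machine program.
    if isinstance(node, int):
        out.append(("push", node))
    else:
        op, lhs, rhs = node
        codegen(lhs, out)
        codegen(rhs, out)
        out.append(("add" if op == "+" else "mul", None))
    return out

prog = codegen(parse(lex("1 + 2 * 3")), [])
print(prog)
```

Because each phase has a single, well-defined input format, you can test, replace, or reorder them independently, which is the main argument for the division.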
To reason about distributed training performance, you need a concrete mental model for all-reduce and collective communication cost
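The standard cost model for ring all-reduce can be written down in a few lines. The formula below is the usual bandwidth/latency analysis; the GPU count, buffer size, link bandwidth, and per-step latency are assumed example numbers, not measurements:

```python
# Back-of-envelope ring all-reduce cost model. Each of N ranks moves
# 2*(N-1)/N of the buffer over a link of bandwidth B, across 2*(N-1)
# communication steps that each pay a fixed latency.

def ring_allreduce_time(n_ranks, size_bytes, bw_bytes_per_s, latency_s):
    bandwidth_term = 2 * (n_ranks - 1) / n_ranks * size_bytes / bw_bytes_per_s
    latency_term = 2 * (n_ranks - 1) * latency_s
    return bandwidth_term + latency_term

# Assumed example: 8 GPUs, 1 GiB of gradients, 100 GB/s links, 10 us/step.
t = ring_allreduce_time(8, 2**30, 100e9, 10e-6)
print(f"estimated all-reduce time: {t * 1e3:.2f} ms")
```

Two things fall out of the model: the bandwidth term is nearly independent of the number of ranks (the 2*(N-1)/N factor saturates at 2), while the latency term grows linearly with N, which is why small messages on large clusters are latency-bound.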
How the Linux kernel distributes CPU time among processes and how CFS works
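CFS's core mechanism, always run the task with the smallest virtual runtime, advancing vruntime inversely to task weight, can be simulated in a few lines. The ~1.25x-per-nice-level weight rule below approximates the kernel's weight table; the simulation itself is a toy, not kernel code:

```python
import heapq

# Minimal sketch of CFS: each task accumulates "virtual runtime"
# scaled inversely by its weight, and the scheduler always picks the
# task with the smallest vruntime. Weights use the kernel's roughly
# 1.25x-per-nice-level rule as an approximation.

NICE_0_WEIGHT = 1024

def weight(nice):
    return NICE_0_WEIGHT / (1.25 ** nice)

def simulate(tasks, slice_ms, steps):
    # tasks: {name: nice}. Min-heap of (vruntime, name).
    heap = [(0.0, name) for name in tasks]
    heapq.heapify(heap)
    runtime = {name: 0.0 for name in tasks}
    for _ in range(steps):
        vrt, name = heapq.heappop(heap)
        runtime[name] += slice_ms
        # Heavier (lower-nice) tasks see their vruntime advance slower,
        # so they get picked again sooner.
        vrt += slice_ms * NICE_0_WEIGHT / weight(tasks[name])
        heapq.heappush(heap, (vrt, name))
    return runtime

cpu_time = simulate({"nice0": 0, "nice5": 5}, slice_ms=1, steps=1000)
print(cpu_time)  # the nice-0 task gets roughly 3x the CPU time
```

The ratio of CPU time between the two tasks converges to the ratio of their weights (about 3:1 for nice 0 vs nice 5), which is exactly the fairness property CFS is built around.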
Layout affects both operator selection and performance, and sometimes the most expensive thing in a path is an invisible copy
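The "invisible copy" is easiest to see with strides. Below is a toy model of a strided buffer, not any framework's internals: a transpose is just swapped strides (free), but making the transposed view contiguous forces a real copy of every element:

```python
# A 2x3 matrix stored row-major in a flat buffer. Element (i, j) lives
# at offset i*strides[0] + j*strides[1]. Toy sketch of layout/views.

data = [1, 2, 3, 4, 5, 6]

def at(strides, i, j):
    return data[i * strides[0] + j * strides[1]]

row_major = (3, 1)        # 2x3, contiguous
transposed_view = (1, 3)  # 3x2 view of the SAME buffer: no copy made

print(at(row_major, 0, 2))        # 3
print(at(transposed_view, 2, 0))  # 3: same element through the view

# Materializing the view as a contiguous buffer is the hidden cost:
# every element is touched, even though no arithmetic happens.
contiguous_t = [at(transposed_view, i, j) for i in range(3) for j in range(2)]
print(contiguous_t)  # [1, 4, 2, 5, 3, 6]
```

This is why a profile sometimes shows time in a "copy" or "contiguous" call rather than in the operator you wrote: some kernel in the path required a layout the view didn't have.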
The role of ISAs, CISC vs. RISC philosophy, and x86 vs. ARM design differences
Data parallelism looks simple, but it carries both gradient synchronization cost and full model-state replication cost
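Both costs are just arithmetic once you fix the model size. The numbers below are an assumed example (a 1B-parameter model, mixed-precision Adam, 8 GPUs), and the 2+4+4+4-byte breakdown is the common mixed-precision accounting, not a measurement of any specific setup:

```python
# Rough arithmetic for the two costs of plain data parallelism.
params = 1_000_000_000   # assumed: 1B-parameter model
n_gpus = 8               # assumed cluster size

# Cost 1: every step, the full gradient set is all-reduced.
grad_bytes = params * 2  # fp16 gradients
print(f"gradients synced per step: {grad_bytes / 2**30:.1f} GiB")

# Cost 2: every rank holds a full replica of the model state:
# fp16 weights + fp32 master weights + fp32 Adam m and v moments.
state_bytes_per_rank = params * (2 + 4 + 4 + 4)
print(f"model state per GPU: {state_bytes_per_rank / 2**30:.1f} GiB")
print(f"replicated across cluster: "
      f"{n_gpus * state_bytes_per_rank / 2**30:.1f} GiB")
```

Note that the replication cost grows with cluster size while carrying zero new information, which is the observation that motivates sharded approaches like ZeRO/FSDP.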
How raw data becomes training data, and why data quality matters more than model complexity