Jae's Tech Blog

Technical notes by Jae

Systems writing for engineers who want the deeper model.

Long-form posts on platform engineering, Linux, compilers, MLOps, and computer architecture, written to help you build stronger intuition instead of just memorizing terms.

Start here Featured series

A few strong entry points if you are new here.

Browse recent posts 119 posts

Fresh writing, updates, and ongoing series entries.

21 posts 58 min total

GPU Systems

From GPU architecture and CUDA kernels to Triton and real kernel optimization work

For engineers who want to understand how GPUs actually execute work and eventually write and optimize their own kernels.

Recommended Start

10 posts 43 min total

MLOps Fundamentals

Building reliable ML systems from data pipelines to production monitoring

For ML engineers, data scientists, and backend engineers moving from model experiments to production operations.

Recommended Start

12 posts 60 min total

Automata and Compilers

From finite automata and formal languages to building a compiler from scratch

For readers who want both the theory behind language processing and the bridge to real compiler construction.

Recommended Start

UPDATED

GPU Systems

From GPU architecture and CUDA kernels to Triton and real kernel optimization work

21 posts

Open series guide →

MLOps Fundamentals

Building reliable ML systems from data pipelines to production monitoring

10 posts

Open series guide →

Automata and Compilers

From finite automata and formal languages to building a compiler from scratch

12 posts

Open series guide →

Linux Internals

Understanding the Linux kernel from processes and memory to containers

10 posts

Open series guide →

Computer Architecture

From CPU internals and privilege levels to memory hierarchy and modern multicore processors

10 posts

Open series guide →

Platform Engineering Fundamentals

Understanding the principles behind Internal Developer Platforms, golden paths, and developer self-service

11 posts

Open series guide →

Distributed LLM Training

From data parallelism to tensor parallelism, FSDP, ZeRO, and modern LLM training frameworks

20 posts

Open series guide →

PyTorch Internals

Understanding tensors, autograd, and CUDA extensions well enough to connect custom kernels to real training code

20 posts

Open series guide →

Python Lecture Series

Comprehensive Python programming course from basics to advanced topics

5 posts

Open series guide →

NEW

March 9, 2026

GPU Systems 20 - From Nsight to Triton to FlashAttention

Closing the GPU Systems series by connecting profiling, Triton experimentation, and FlashAttention-style thinking

Lectures

Fresh

March 7, 2026