GPU Systems
From GPU architecture and CUDA kernels to Triton and real kernel optimization work
Engineers who want to understand how GPUs actually execute work and eventually write and optimize their own kernels.
Thoughts on code, technology, and everything in between
Long-form posts on platform engineering, Linux, compilers, MLOps, and computer architecture, written to help you build stronger intuition instead of just memorizing terms.
A few strong entry points if you are new here.
Fresh writing, updates, and ongoing series entries.
Building reliable ML systems from data pipelines to production monitoring
ML engineers, data scientists, and backend engineers moving from model experiments to production operations.
From finite automata and formal languages to building a compiler from scratch
Readers who want both the theory behind language processing and the bridge to real compiler construction.
How VFS, inodes, and ext4 work in a world where everything is a file
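A minimal sketch of the "everything is a file" idea from that post: every path resolves to an inode, and `os.stat` exposes the inode's metadata (inode number, link count) rather than the name itself. The file names used here are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.txt")
    with open(path, "w") as f:
        f.write("hello")

    st = os.stat(path)
    # The inode number identifies the file independently of its name.
    print("inode:", st.st_ino)
    print("hard links:", st.st_nlink)  # 1: one directory entry points here

    # A hard link adds a second name for the same inode.
    link = os.path.join(d, "alias.txt")
    os.link(path, link)
    assert os.stat(link).st_ino == st.st_ino  # same inode, two names
    assert os.stat(path).st_nlink == 2        # link count went up
```

Deleting one of the two names only decrements the link count; the inode (and the data) survives until the count reaches zero.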
The principles behind recursive descent parsers, LL(1) grammars, and the strengths and limitations of top-down parsing
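The core of recursive descent can be shown in a few lines. This is a toy sketch for the classic LL(1) expression grammar (the grammar and helper names here are illustrative, not taken from the post): one function per nonterminal, and a single token of lookahead decides which production to take, which is exactly the LL(1) property.

```python
# Grammar:
#   expr   -> term   (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'
import re

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        cur = self.peek()
        if expected is not None and cur != expected:
            raise SyntaxError(f"expected {expected!r}, got {cur!r}")
        self.pos += 1
        return cur

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):     # one-token lookahead
            op = self.eat()
            value = value + self.term() if op == "+" else value - self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            value = value * self.factor() if op == "*" else value / self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return int(self.eat())

def evaluate(src):
    return Parser(tokenize(src)).expr()

print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

Note how precedence falls out of the grammar's shape rather than any explicit table: `term` binds tighter than `expr` simply because it sits one level deeper in the recursion. The well-known limitation is left recursion, which a top-down parser like this cannot handle directly.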
Once the model itself is too large for one device, data parallelism is no longer enough and layer-internal computation has to be split
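The basic idea of splitting layer-internal computation can be sketched without any framework. Below, two hypothetical "devices" (simulated as plain Python lists) each hold a column shard of a linear layer's weight matrix, compute a partial output, and concatenating the shards reproduces the unsharded result; the numbers and the two-way split are illustrative.

```python
def matmul(x, w):
    # x: (m, k) row-major lists; w: (k, n); returns (m, n).
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def split_cols(w, parts):
    # Column-shard a (k, n) matrix into `parts` pieces of n // parts columns.
    n = len(w[0]) // parts
    return [[row[p * n:(p + 1) * n] for row in w] for p in range(parts)]

x = [[1.0, 2.0]]            # one input row, k = 2
w = [[1.0, 2.0, 3.0, 4.0],  # k x n weight matrix, n = 4
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)         # unsharded reference result

# Each "device" holds one column shard and computes its slice of the output.
shards = split_cols(w, parts=2)
partials = [matmul(x, shard) for shard in shards]

# Concatenating the per-device outputs (an all-gather in a real system)
# recovers the full result.
gathered = [sum((p[i] for p in partials), []) for i in range(len(x))]
assert gathered == full
```

In a real system each shard lives on a different GPU and the concatenation is a collective communication step, which is why interconnect bandwidth matters so much once models are split this way.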
Many PyTorch CUDA operations are asynchronous, so timing, synchronization, and dependencies have to be reasoned about explicitly
Why CPUs distinguish privilege levels, and how x86 protection rings and ARM exception levels protect the system
In distributed training, performance is often shaped more by how GPUs are connected than by the raw number of GPUs
Why model versioning differs from code versioning, and the role of a model registry
PyTorch GPU memory behavior is shaped by a caching allocator, so observed memory usage is not just a story about current tensor objects
Why IaC is the backbone of any platform, and how tools like Terraform, Pulumi, and Crossplane compare when building self-service infrastructure