Jae's Tech Blog

Technical notes by Jae

Systems writing for engineers who want the deeper model.

Long-form posts on platform engineering, Linux, compilers, MLOps, and computer architecture, written to help you build stronger intuition instead of just memorizing terms.

Start here: Featured series

A few strong entry points if you are new here.

Browse recent posts (119 posts)

Fresh writing, updates, and ongoing series entries.

21 posts, 58 min total

GPU Systems

From GPU architecture and CUDA kernels to Triton and real kernel optimization work

Engineers who want to understand how GPUs actually execute work and eventually write and optimize their own kernels.

Recommended Start

10 posts, 43 min total

MLOps Fundamentals

Building reliable ML systems from data pipelines to production monitoring

ML engineers, data scientists, and backend engineers moving from model experiments to production operations.

Recommended Start

12 posts, 60 min total

Automata and Compilers

From finite automata and formal languages to building a compiler from scratch

Readers who want both the theory behind language processing and the bridge to real compiler construction.

Recommended Start

GPU Systems

From GPU architecture and CUDA kernels to Triton and real kernel optimization work

21 posts

MLOps Fundamentals

Building reliable ML systems from data pipelines to production monitoring

10 posts

Automata and Compilers

From finite automata and formal languages to building a compiler from scratch

12 posts

Linux Internals

Understanding the Linux kernel from processes and memory to containers

10 posts

Computer Architecture

From CPU internals and privilege levels to memory hierarchy and modern multicore processors

10 posts

Platform Engineering Fundamentals

Understanding the principles behind Internal Developer Platforms, golden paths, and developer self-service

11 posts

Distributed LLM Training

From data parallelism to tensor parallelism, FSDP, ZeRO, and modern LLM training frameworks

20 posts

PyTorch Internals

Understanding tensors, autograd, and CUDA extensions well enough to connect custom kernels to real training code

20 posts

Python Lecture Series

Comprehensive Python programming course from basics to advanced topics

5 posts

February 2, 2026

Distributed LLM Training 10 - Sequence Parallelism and the Cost of Long Context

As context length grows, activation memory and communication patterns change again, and sequence-oriented partitioning starts to matter

Lectures

February 1, 2026

GPU Systems 02 - The Thread, Warp, and Block Execution Model

What threads, warps, blocks, and grids mean in actual GPU execution

Lectures

February 1, 2026

PyTorch Internals 10 - Connecting a Custom CUDA Kernel Through an Extension

A CUDA kernel becomes a real PyTorch operator only when tensor contracts, runtime semantics, and integration details are handled correctly

Lectures

January 31, 2026

Platform Engineering 06 - Developer Portals and Service Catalogs

How developer portals like Backstage bring order to the chaos of scattered docs, tools, and tribal knowledge

Lectures

January 30, 2026

Distributed LLM Training 09 - Where Tensor Parallelism Actually Lives Inside a Transformer

Tensor parallelism becomes real when you map it onto QKV projections, attention output paths, and the two large MLP projections inside a transformer block

Lectures

January 30, 2026