PyTorch Internals 01 - Why You Need to Understand the Internals
To reason well about performance, custom operators, and distributed runtime behavior, you have to understand PyTorch as a runtime, not just a Python library
If you think of a tensor only as an n-dimensional array, you will misunderstand views, layouts, and hidden copies
Layout affects both operator selection and performance, and sometimes the most expensive step in a code path is an invisible copy
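A short sketch of the view-versus-copy distinction. `.t()` produces a view over the same storage with swapped strides, while `.contiguous()` and a `reshape()` of a non-contiguous tensor both materialize copies, which is exactly the kind of hidden data movement described above:

```python
import torch

x = torch.arange(12).reshape(3, 4)

# A transpose is a view: same storage, different strides, no data movement
xt = x.t()
assert xt.data_ptr() == x.data_ptr()
assert not xt.is_contiguous()

# .contiguous() on a non-contiguous view materializes a real copy
xc = xt.contiguous()
assert xc.data_ptr() != x.data_ptr()

# reshape() may also silently copy when no view is possible -- a hidden copy
y = xt.reshape(12)
assert y.data_ptr() != x.data_ptr()
```

Checking `data_ptr()` is a quick way to confirm whether two tensors share storage when you suspect an invisible copy.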
A single operator name in PyTorch may map to many implementations, and the dispatcher is the runtime layer that decides which one runs
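One observable consequence of dispatch, as a minimal illustration: the same Python-level call selects different kernels depending on tensor properties such as layout. Here `torch.mm` runs a dense CPU kernel in one call and a sparse-dense kernel in the other:

```python
import torch

dense = torch.eye(3)
sparse = dense.to_sparse()

# Same operator name, but the dispatcher routes each call to a
# different implementation based on the input tensors' layout
out_dense = torch.mm(dense, dense)    # dense CPU kernel
out_sparse = torch.mm(sparse, dense)  # sparse-dense kernel, dense result

assert torch.equal(out_dense, out_sparse)
```

The same routing happens along other axes too: device (CPU vs. CUDA), dtype, and autograd state all feed into which implementation actually runs.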
Autograd is not just automatic differentiation; it is a graph-construction and backward-execution runtime
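The graph-construction side of autograd is directly visible from Python: each forward operation records a backward node on its output, and `backward()` is a traversal that executes those nodes. A small sketch:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x * x).sum()

# The forward pass built a graph; each result carries the node that produced it
assert "Sum" in type(y.grad_fn).__name__
assert "Mul" in type(y.grad_fn.next_functions[0][0]).__name__

# backward() walks that recorded graph and runs each node's backward rule
y.backward()
assert torch.equal(x.grad, 2 * x.detach())  # d/dx of sum(x*x) is 2x
```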
Custom autograd functions are a practical place to define forward-backward contracts before dropping to lower-level extensions
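A minimal example of such a forward-backward contract via `torch.autograd.Function`. The op here (`ClampedSquare`) is hypothetical, chosen only to show that the backward rule is something you define, not something autograd derives:

```python
import torch

class ClampedSquare(torch.autograd.Function):
    """Hypothetical op: forward computes x**2, backward clamps the gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # stash inputs needed by the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # The true gradient is grad_out * 2x; we deliberately clamp it,
        # which autograd could never have inferred from forward alone
        return (grad_out * 2 * x).clamp(-1.0, 1.0)

x = torch.tensor([0.1, 5.0], requires_grad=True)
ClampedSquare.apply(x).sum().backward()
# grad of x**2 is 2x = [0.2, 10.0]; the backward rule clamps to [0.2, 1.0]
assert torch.allclose(x.grad, torch.tensor([0.2, 1.0]))
```

Once this contract is pinned down and tested in Python, moving the forward or backward body into a C++ extension is a mechanical step rather than a design decision.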
PyTorch GPU memory behavior is shaped by a caching allocator, so observed memory usage is not just a story about current tensor objects
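The allocated-versus-reserved gap is the concrete signature of the caching allocator: freeing a tensor returns its block to PyTorch's cache, not to the driver. A small sketch (it degrades to a no-op on CPU-only builds):

```python
import torch

def allocator_gap():
    """Bytes the CUDA caching allocator holds beyond live tensors.

    Illustrative sketch; returns None on CPU-only builds.
    """
    if not torch.cuda.is_available():
        return None
    x = torch.empty(1024, 1024, device="cuda")
    del x  # frees the tensor object, but not the underlying block
    # The allocator keeps the freed block cached for reuse instead of
    # calling cudaFree, so reserved memory stays above allocated memory
    return torch.cuda.memory_reserved() - torch.cuda.memory_allocated()
```

This is why `nvidia-smi` can report far more usage than the sum of your live tensors, and why `torch.cuda.empty_cache()` changes what the driver sees without changing what your program holds.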
Many PyTorch CUDA operations are asynchronous, so timing, synchronization, and dependencies need to be reasoned about explicitly
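A sketch of timing done with that asynchrony in mind: a CUDA kernel launch returns immediately, so you must record events and synchronize before reading a clock. The helper below falls back to plain wall-clock timing on CPU-only builds:

```python
import time
import torch

def timed_matmul(n=1024):
    """Time an n x n matmul; handles CUDA's asynchronous launches."""
    if not torch.cuda.is_available():
        a = torch.randn(n, n)
        t0 = time.perf_counter()
        a @ a
        return time.perf_counter() - t0

    a = torch.randn(n, n, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    a @ a                     # launch returns immediately; work is only queued
    end.record()
    torch.cuda.synchronize()  # wait for the queued work before reading timings
    return start.elapsed_time(end) / 1000.0  # milliseconds -> seconds
```

Timing the launch alone (without events or a synchronize) measures queueing overhead, not the kernel, which is the classic benchmarking mistake this asynchrony causes.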
A C++ extension is the first practical bridge between user-defined logic and the PyTorch runtime
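The skeleton of that bridge is small. A minimal sketch with illustrative names (`my_ext.cpp`, `scaled_add`): a C++ function over `torch::Tensor`, exported to Python through the pybind11 machinery that `torch/extension.h` bundles. This is a build fragment, not a standalone program:

```cpp
// my_ext.cpp -- minimal C++ extension sketch (names are illustrative)
#include <torch/extension.h>

// A plain C++ function operating on torch::Tensor, the same tensor
// type the rest of the runtime uses
torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
  return a + alpha * b;
}

// Expose the function to Python under the extension's module name
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("scaled_add", &scaled_add, "compute a + alpha * b");
}
```

It compiles either ahead of time via a `setup.py` using `torch.utils.cpp_extension.CppExtension`, or just-in-time via `torch.utils.cpp_extension.load`, after which `scaled_add` is callable from Python like any other operator.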