Jae's Tech Blog

Lectures

All posts in the Lectures category

February 1, 2026

PyTorch Internals 10 - Connecting a Custom CUDA Kernel Through an Extension

A CUDA kernel becomes a real PyTorch operator only when tensor contracts, runtime semantics, and integration details are handled correctly

February 4, 2026

PyTorch Internals 11 - Operator Schema, Dispatch Keys, and Meta Functions

A custom operator is not complete until its schema, dispatch behavior, and meta-level shape logic are defined clearly

February 7, 2026

PyTorch Internals 12 - Backward Implementation Patterns and Saved-State Strategy

Backward design is really a question about what to save, what to recompute, and how to preserve correct semantics

February 10, 2026

PyTorch Internals 13 - When a Fused Operator Is Actually Worth It

Fusion is valuable when it reduces memory traffic and intermediate materialization, not just when it reduces the number of visible ops

February 13, 2026

PyTorch Internals 14 - AMP, Autocast, and Numerical Stability

A production-quality custom operator has to behave correctly under mixed precision, not just benchmark well in isolation

February 16, 2026

PyTorch Internals 15 - Reading Operator Bottlenecks with PyTorch Profiling

The purpose of internals knowledge is to make a performance trace interpretable enough that you can actually change it

February 19, 2026

PyTorch Internals 16 - The Big Picture of FX, torch.compile, and Inductor

PyTorch is no longer only an eager framework; compiler paths are now an important part of its optimization story

February 22, 2026

PyTorch Internals 17 - What Role Triton Plays Inside the PyTorch Ecosystem

Triton is not just a convenient kernel language; it is part of the modern PyTorch kernel and compilation story

February 25, 2026

PyTorch Internals 18 - Where Autograd Meets Distributed Runtime

DDP and FSDP are not external magic; they depend directly on autograd timing and tensor-state management inside the runtime

โ† Previous
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Next โ†’

© 2025 Jae · Notes on systems, software, and building things carefully.
