PyTorch Internals 08 - CUDA Streams, Events, and Asynchronous Execution
Many PyTorch CUDA operations are asynchronous, so timing, synchronization, and dependencies must be reasoned about explicitly
The purpose of internals knowledge is to make a performance trace interpretable enough that you can actually change it
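The deck's point that timing must be reasoned about explicitly can be made concrete with a small sketch: because kernel launches return to the host immediately, wrapping them in wall-clock timers measures only launch overhead, so device-side work is timed with CUDA events recorded into the stream. This is a minimal illustration, not code from the post; `time_matmul` is a hypothetical helper name, and the sizes and iteration count are arbitrary.

```python
import torch

def time_matmul(n: int = 1024, iters: int = 10) -> float:
    """Average device-side time (ms) of an n x n matmul, via CUDA events.

    A host-side timer around `a @ b` would return almost immediately,
    because the launch is asynchronous; events are recorded into the
    current stream and measure when the kernels actually ran.
    """
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    torch.cuda.synchronize()   # drain any pending work first
    start.record()             # enqueued on the current stream
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()   # block until `end` has executed on device
    return start.elapsed_time(end) / iters

# Guarded so the sketch is importable on CPU-only machines.
if torch.cuda.is_available():
    print(f"avg matmul time: {time_matmul():.3f} ms")
```

Note that `elapsed_time` is only valid after both events have completed, which is why the final `torch.cuda.synchronize()` (or `end.synchronize()`) is required before reading the result.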