Jae's Tech Blog

Posts tagged "matmul"

January 27, 2026

Distributed LLM Training 08 - Tensor Parallel Basics: Splitting Computation Inside the Model

Once the model itself is too large for one device, data parallelism is no longer enough, and the computation inside each layer has to be split across devices

Lectures
February 15, 2026

GPU Systems 09 - Why Naive Matrix Multiplication Is Slow

Using naive matrix multiplication to see memory reuse and traffic problems clearly

Lectures
February 17, 2026

GPU Systems 10 - Tiled Matrix Multiplication and Shared Memory

Why tiled matrix multiplication and shared memory create such a big performance difference

Lectures
March 5, 2026

GPU Systems 18 - Tensor Cores and Mixed Precision

How tensor cores change performance in compute-heavy kernels and why mixed precision matters

Lectures

© 2025 Jae · Notes on systems, software, and building things carefully.
