GPU Systems 14 - Why Softmax Is Such a Good Kernel Exercise
How softmax combines reductions, memory traffic, and numerical stability in one kernel