Scaled Dot Product Attention

Posted on March 27, 2024 by MLNerds

This video explains the motivation behind scaled dot product attention, as used in the transformer architecture, and how it is computed.
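The computation the video covers is the standard formula softmax(QK^T / sqrt(d_k)) V. As a minimal sketch (a NumPy toy example, not code from the video), it can be written as:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (n, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # dot products, scaled by sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax: each row sums to 1
    return weights @ V                              # weighted average of the value vectors

# toy example: 3 queries and 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The division by sqrt(d_k) keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with vanishing gradients.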