Multi-Head Attention: Transformer Architecture

Posted on April 10, 2024 by MLNerds

The Transformer Series: This video explains the multi-head attention mechanism of the transformer architecture. In the last few posts we looked at what skip (residual) connections are and at scaled dot-product attention. Multi-head attention is another important building block of the transformer architecture: it enables the model to attend to different aspects of the input in parallel, capturing a richer representation of the input data. Check out the video to learn more.
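
For readers who prefer code to diagrams, below is a minimal sketch of multi-head attention in PyTorch. It is not taken from the video; the class name, the dimensions, and the absence of masking and dropout are all simplifying assumptions made for illustration. The idea it shows is the one described above: project the input into queries, keys, and values, split them across several heads, run scaled dot-product attention independently in each head, then concatenate the heads and project back.

```python
# Minimal multi-head attention sketch (illustrative only; names and
# dimensions are assumptions, not the video's implementation).
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One linear projection each for queries, keys, values, and the output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape

        # Project, then split the model dimension into separate heads:
        # (batch, seq_len, d_model) -> (batch, heads, seq_len, d_head)
        def split_heads(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(x))
        k = split_heads(self.w_k(x))
        v = split_heads(self.w_v(x))

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = scores.softmax(dim=-1)
        context = weights @ v  # (batch, heads, seq_len, d_head)

        # Concatenate the heads back together and apply the output projection.
        context = context.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(context)

# Toy usage: 2 sequences of length 10 with a 64-dimensional model and 8 heads.
mha = MultiHeadAttention(d_model=64, num_heads=8)
out = mha(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because each head works on its own slice of the projected representation, the heads are free to specialize, for example one attending to nearby tokens and another to long-range dependencies, which is what gives the richer representation mentioned above.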