Multi-Head Attention: Transformer Architecture

The Transformer Series: This video explains multi-head attention in the transformer architecture.
In the last few posts we looked at skip (residual) connections and scaled dot-product attention.
Multi-head attention is another important building block of the transformer architecture. It enables the model to attend to different aspects of the input and so capture a richer representation of the input data. Check out the video to learn more.
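To make the idea concrete, here is a minimal NumPy sketch (not the code from the video): each head runs scaled dot-product attention on its own slice of the query, key, and value projections, and the head outputs are concatenated and projected back. The function and weight names (`multi_head_attention`, `w_q`, `w_k`, `w_v`, `w_o`) are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_head); scores are scaled by sqrt(d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq_len, d_model); each head works on its own slice of the
    # projections, so different heads can attend to different aspects of x.
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v                   # (seq_len, d_model)
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        heads.append(scaled_dot_product_attention(q[:, s], k[:, s], v[:, s]))
    # concatenate the heads and mix them with the output projection
    return np.concatenate(heads, axis=-1) @ w_o           # (seq_len, d_model)

# toy usage: 4 tokens, d_model = 8, 2 heads
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=2)
print(out.shape)  # (4, 8)
```

In a trained transformer the projections are learned parameters and the heads are computed in parallel with a single batched matrix multiply, but the slicing loop above shows the same per-head computation explicitly.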
