This video explains why transformer models have been more successful than their predecessors. It covers the aspects of the architecture, such as self-attention, positional encoding, and parallelization, that make transformers effective and efficient in many applications.
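
For readers who want a concrete picture of the three ideas named above, here is a minimal sketch in plain Python. It is not taken from the video. The function names (`positional_encoding`, `softmax`, `self_attention`) and the toy embeddings are hypothetical, the positional encoding assumes the common sinusoidal scheme, and the attention step assumes identity projections (queries, keys, and values are all the input itself) where a real model would use learned weight matrices.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: sine on even indices, cosine on odd indices."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption
    made for brevity; real models apply learned projections first)."""
    d = len(x[0])
    outputs, all_weights = [], []
    # Each query row is independent of the others, which is why real
    # implementations compute all rows at once as a matrix product.
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append(
            [sum(weights[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, all_weights


# Toy input: 3 tokens with 4-dimensional embeddings (made-up values).
embeddings = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
pe = positional_encoding(3, 4)
# Add position information so that token order is visible to attention.
x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pe)]
out, weights = self_attention(x)
```

Each row of `weights` sums to 1 and describes how much one token attends to every token in the sequence, itself included. The loop over queries is written out for readability. Because no row depends on another, the same computation can be done for all tokens at once, which is the parallelization the description refers to.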