Category: Deep Learning
Scaled Dot Product Attention
Skip or Residual Connections in Deep Networks
GPT Model
The BERT Score – Evaluating Text Generation
This video covers the evaluation metric BERTScore: why it is needed over existing metrics such as the BLEU score, and how it is computed and evaluated. Traditional metrics look for exact text matches. BERTScore instead measures semantic similarity, using the contextual embeddings of the words in the candidate and reference sentences.
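As a rough illustration of the core computation, here is a minimal sketch of BERTScore's greedy matching step. It assumes the contextual token embeddings have already been produced by some BERT-style model. The toy vectors below are stand-ins, and the function name `bert_score` is hypothetical. IDF weighting and baseline rescaling, which the full metric can apply, are left out.

```python
import numpy as np


def bert_score(cand_emb: np.ndarray, ref_emb: np.ndarray) -> tuple[float, float, float]:
    """Greedy-matching BERTScore without IDF weighting or baseline rescaling.

    cand_emb: (num_candidate_tokens, dim) contextual embeddings (assumed precomputed)
    ref_emb:  (num_reference_tokens, dim) contextual embeddings (assumed precomputed)
    """
    # L2-normalize every token vector so that dot products become cosine similarities
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)

    # Pairwise cosine similarity matrix of shape (num_candidate, num_reference)
    sim = cand @ ref.T

    # Precision: each candidate token is matched to its most similar reference token
    precision = sim.max(axis=1).mean()
    # Recall: each reference token is matched to its most similar candidate token
    recall = sim.max(axis=0).mean()
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)


# Toy 2-D "embeddings": the candidate has one token matching the reference and one unrelated token
candidate = np.array([[1.0, 0.0], [0.0, 1.0]])
reference = np.array([[1.0, 0.0]])
p, r, f = bert_score(candidate, reference)
print(p, r, f)  # precision 0.5, recall 1.0, F1 about 0.667
```

Because matching is done on embedding similarity rather than exact token identity, a paraphrase whose tokens have nearby embeddings can still score well, which is the property that separates this approach from n-gram metrics like BLEU.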
BERT Model
Batch vs Mini-Batch vs Stochastic Gradient Descent
Normalization in Deep Neural Networks
Batch norm and layer norm are common normalization techniques. This brief video covers why normalization is needed and the types of normalization used in deep neural networks.
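The main difference between the two can be seen in which axis the statistics are computed over. Below is a minimal NumPy sketch, assuming a 2-D activation matrix of shape `(batch_size, num_features)`. It deliberately omits the learnable scale and shift parameters and the running statistics that batch norm keeps for inference. The function names are illustrative only.

```python
import numpy as np


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Batch norm: for each feature (column), normalize across the samples in the batch
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Layer norm: for each sample (row), normalize across its own features
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# A batch of 2 samples with 3 features each
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
bn = batch_norm(x)  # every column now has roughly zero mean and unit variance
ln = layer_norm(x)  # every row now has roughly zero mean and unit variance
```

Because layer norm does not depend on other samples in the batch, it behaves the same for any batch size, including a batch of one. This is one reason it is commonly preferred in sequence models such as Transformers.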
When are deep learning algorithms more appropriate than traditional machine learning algorithms?
Deep learning algorithms can learn arbitrarily complex non-linear functions, given a sufficiently deep and wide network with appropriate non-linear activation functions. Traditional ML algorithms often require feature engineering, that is, identifying the subset of meaningful features to use. Deep learning algorithms often avoid the need for this feature engineering step….
Why do you typically see overflow and underflow when implementing ML algorithms?
A common pre-processing step is to normalize/rescale inputs so that they are not too large or too small. However, even with normalized inputs, overflows and underflows can occur: Underflow: A joint probability often involves multiplying many small individual probabilities. Many probabilistic algorithms multiply the probabilities of individual data points, which leads to underflow. Example: Suppose you…
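A small sketch of the underflow problem described above, and the standard remedy of working in log space. The setup is hypothetical: 1000 independent data points, each assigned probability 0.05. The exact joint probability, 0.05^1000 (about 1e-1301), is far below the smallest positive float64 value (roughly 5e-324), so the direct product collapses to zero. The sum of log-probabilities stays comfortably representable.

```python
import numpy as np

# Hypothetical per-data-point probabilities: 1000 points, each with probability 0.05
probs = np.full(1000, 0.05)

# Naive joint probability: the running product drops below float64's range and becomes 0.0
naive_joint = np.prod(probs)

# Log-space joint probability: log(p1 * p2 * ...) = log(p1) + log(p2) + ...
log_joint = np.sum(np.log(probs))

print(naive_joint)  # 0.0
print(log_joint)    # about -2995.7, i.e. 1000 * log(0.05)
```

Since the logarithm is monotonic, comparing log-likelihoods ranks models or hypotheses in the same order as comparing the raw likelihoods. That is why many implementations maximize the log-likelihood rather than the likelihood itself.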