Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?

Recall that the bigram model can be expressed as P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}), and the probability of a sequence is the product of these conditional terms, so a single zero term makes the whole sequence probability zero.

Scenario 1 – Out-of-vocabulary (OOV) words: such words never appear during training, so any probability term involving an OOV word is 0.0, which drives the probability of the entire sequence to zero. This is handled by replacing rare or unseen words with an UNK token in both the training data and the test data, so that OOV words at test time map to a token the model has actually seen.

Scenario 2 – Unseen bigrams: even when both words are in the vocabulary, the particular bigram may never occur in the training corpus; this is addressed by smoothing (see the next question).
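As a minimal sketch of the UNK approach (not from the original answer), the example below maps words seen fewer than a threshold number of times to an "<UNK>" token before counting bigrams; the threshold, token names, and function names are assumptions made for illustration.

```python
from collections import Counter

def build_bigram_counts(sentences, min_count=2):
    """Count unigrams and bigrams after mapping rare words to <UNK>."""
    word_counts = Counter(w for sent in sentences for w in sent)
    # Words seen fewer than min_count times are treated as out-of-vocabulary.
    vocab = {w for w, c in word_counts.items() if c >= min_count}
    mapped = [[w if w in vocab else "<UNK>" for w in sent] for sent in sentences]

    unigrams, bigrams = Counter(), Counter()
    for sent in mapped:
        tokens = ["<START>"] + sent + ["<END>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return vocab, unigrams, bigrams

def bigram_prob(w_prev, w, vocab, unigrams, bigrams):
    """MLE estimate of P(w | w_prev); OOV words are mapped to <UNK> at query time too."""
    special = {"<START>", "<END>"}
    w_prev = w_prev if (w_prev in vocab or w_prev in special) else "<UNK>"
    w = w if (w in vocab or w in special) else "<UNK>"
    if unigrams[w_prev] == 0:
        return 0.0
    return bigrams[(w_prev, w)] / unigrams[w_prev]
```

Because unseen test words collapse onto "<UNK>", their probability terms are estimated from the counts of rare training words rather than being zero.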

Why is smoothing applied in language models?

Because some n-grams may appear in the test set but not in the training set. For example, suppose a test sequence begins with a bigram (<START>, w), where <START> is the token prepended at the beginning of the document, and that bigram never occurs in the training corpus. Then its maximum-likelihood probability is zero, which makes the probability of the entire sequence zero. Smoothing redistributes a small amount of probability mass to such unseen n-grams so the model never assigns exactly zero probability to a test sequence.
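To make this concrete, here is a hedged sketch of add-one (Laplace) smoothing on top of the bigram counts from the previous example; the formula P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + |V|) is the standard one, but the function and variable names are illustrative assumptions.

```python
def laplace_bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(w | w_prev).

    Every possible bigram receives a pseudo-count of 1, so bigrams that never
    occurred in training still get a small non-zero probability. vocab_size
    should be the size of the vocabulary used in the denominator, including
    the <UNK> and boundary tokens if they can appear as the next word.
    """
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

# Example: a bigram never seen in training no longer has probability 0.
# p = laplace_bigram_prob("<START>", "some_rare_word", unigrams, bigrams, len(vocab) + 2)
```

Add-one smoothing is the simplest choice; in practice add-k, Good-Turing, or Kneser-Ney smoothing redistribute the probability mass less aggressively.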