- While building a language model, we try to estimate the probability of a sentence or a document.
- Given a sequence (a sentence or a document) of words, W = w_1 w_2 ... w_N,
- a bigram language model assigns a probability to each such sequence by conditioning each word on the previous one: P(W) ≈ P(w_1) * P(w_2 | w_1) * ... * P(w_N | w_{N-1}).

- Once we apply Maximum Likelihood Estimation (MLE), each conditional term gets a value from corpus counts: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}).
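As a minimal sketch of the MLE step above (the toy corpus and function names are my own, not from the post): collect bigram and context counts, then divide.

```python
from collections import Counter

# Toy training corpus with sentence boundary markers (hypothetical data).
corpus = ["<s> i like ham </s>", "<s> i like spam </s>"]

bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    context_counts.update(tokens[:-1])            # counts of w_{i-1}
    bigram_counts.update(zip(tokens, tokens[1:])) # counts of (w_{i-1}, w_i)

def mle_prob(prev, word):
    # MLE estimate: count(w_{i-1} w_i) / count(w_{i-1})
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

print(mle_prob("i", "like"))   # 1.0: "i" is always followed by "like"
print(mle_prob("like", "ham")) # 0.5: "like" is followed by "ham" half the time
```

Counts-based MLE like this assigns zero probability to unseen bigrams, which is why smoothing is used in practice.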
- Perplexity is the inverse of the probability of the test sequence, normalized by its length: PP(W) = P(W)^(-1/N), i.e. the N-th root of the inverse likelihood for a sequence of N words, whether the model is a bigram or a general n-gram model. So the lower the perplexity, the better the model. For more explanation on perplexity, visit this question.
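The perplexity formula above can be sketched as follows (the per-word probability values here are made up for illustration); summing logs avoids underflow from multiplying many small probabilities.

```python
import math

def perplexity(probs):
    # PP = (p_1 * p_2 * ... * p_N)^(-1/N), computed in log space for stability
    n = len(probs)
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / n)

# Per-word model probabilities for a 4-word test sequence (hypothetical values).
probs = [0.5, 0.25, 0.5, 0.25]
print(perplexity(probs))  # ~2.83: the model is as uncertain as a uniform
                          # choice among roughly 2.83 words at each step
```

A uniform model over a vocabulary of size V gives perplexity exactly V, which is a useful sanity check.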

# How to measure the performance of the language model ?
