An n-gram is a sequence of n consecutive words (or tokens/grams).
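As a quick illustration, here is a minimal sketch (our own, in plain Python; the function and variable names are just illustrative) of extracting n-grams from a list of tokens:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```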
In general, n-grams either preserve the ordering of words or indicate what level of dependency on previous words is assumed in order to simplify the modeling task.
When using Bag of Words, n-grams come in handy to preserve the ordering between words, whereas in language modeling they capture the independence assumption being made. A language model can be written as
\[ p(w) = p(w_1, w_2, \ldots, w_k) \]

\[ p(w) = \prod_{i=1}^{k+1} p(w_i \mid w_1, w_2, \ldots, w_{i-1}) \]
If we build a language model without a Markov assumption of any order, the above model will have a very large number of parameters.
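To get a sense of the scale (a rough count we add here for illustration, assuming a fixed vocabulary of size \( V \)): a fully general model over sentences of length \( k \) is a joint distribution with

\[ V^{k} - 1 \]

free parameters, while an n-gram model only needs one conditional table of roughly

\[ V^{\,n-1}(V - 1) \approx V^{n} \]

parameters, which is what makes the Markov assumption practical.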
For an n-gram model, the probability of a word depends only on the previous n-1 words, i.e.
\[ p(w) = \prod_{i=1}^{k+1} p(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_{i-n+1}) \]
This dependency on only the previous n-1 words, rather than the entire sequence, is the (n-1)-th order Markov assumption. For a bi-gram model, it is the first order Markov assumption. One can add "start tokens" to deal with words like \( w_1 \), which do not have n-1 preceding words.
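As a rough sketch of how the bi-gram case with start tokens might look in practice (our own illustrative code using maximum-likelihood counts and hypothetical `<s>`/`</s>` tokens, not a reference implementation):

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Estimate MLE bi-gram probabilities p(w_i | w_{i-1}) from tokenized sentences.

    A start token "<s>" is prepended so the first real word also has a
    conditioning context; "</s>" marks the end of the sentence.
    """
    bigram_counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigram_counts[prev][cur] += 1
    # Normalize counts into conditional probabilities.
    return {
        prev: {cur: c / sum(counter.values()) for cur, c in counter.items()}
        for prev, counter in bigram_counts.items()
    }

def sentence_probability(probs, tokens):
    """p(w) = prod_i p(w_i | w_{i-1}) under the trained bi-gram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        # Unseen bigrams get probability 0 here; a real model would smooth.
        p *= probs.get(prev, {}).get(cur, 0.0)
    return p

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram_lm(corpus)
print(sentence_probability(model, ["the", "cat", "sat"]))  # 0.5 on this toy corpus
```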