What is the significance of n-grams in a language model?

An n-gram is a sequence of n consecutive words/tokens/grams.

In general, n-grams serve two purposes: they preserve local word ordering, and they make explicit how much of the preceding context a model depends on, which simplifies the modeling task.

In a bag-of-words representation, n-grams come in handy to preserve ordering between words, but in language modeling they signify the independence assumption being made. A language model assigns a probability to a sequence of k words and can be written, via the chain rule, as

    \[p(w)\,=\,p(w_{1},w_{2},...,w_{k})\]

    \[p(w)\,=\,p(w_{k}\,|\,w_{k-1},w_{k-2},...,w_{1})\cdot p(w_{k-1}\,|\,w_{k-2},...,w_{1})\cdot\,...\,\cdot p(w_{1})\]
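
For example, for a hypothetical three-word sentence "the cat sat", the chain rule expands as

    \[p(\text{the cat sat})\,=\,p(\text{the})\cdot p(\text{cat}\,|\,\text{the})\cdot p(\text{sat}\,|\,\text{the},\text{cat})\]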

If we build a language model without a Markov assumption of any order, the model above has a huge number of parameters: the last factor alone conditions on the full (k-1)-word history.
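
To get a feel for the scale (the numbers below are illustrative, not from the original post): with a vocabulary V, the final factor p(w_{k}\,|\,w_{k-1},...,w_{1}) is a table with one row per possible history, so it alone needs roughly

    \[\underbrace{|V|^{k-1}}_{\text{histories}}\times\underbrace{(|V|-1)}_{\text{free probabilities per history}}\,\approx\,|V|^{k}\quad\text{parameters, e.g. } |V|=10^{4},\,k=10\,\Rightarrow\,\sim 10^{40}\]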

For an n-gram model, the probability of a word depends only on the previous n-1 words, i.e.

    \[p(w)\,=\prod_{i=1}^{k} p(w_{i}\,|\,w_{i-1},w_{i-2},...,w_{i-n+1})\]

This dependency only on the previous n-1 words, rather than the entire sequence, is an (n-1)^{th} order Markov assumption. For a bi-gram model, it is a 1^{st} order Markov assumption. One can add “start tokens” to handle the undefined words w_{0},w_{-1},...,w_{2-n} at the beginning of the sequence, as the sketch below shows.
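
As a concrete illustration, here is a minimal count-based bi-gram (n = 2) model in Python. It is a sketch under simplifying assumptions: the toy corpus, the `<s>` start token, and the function names are hypothetical, whitespace tokenization is assumed, and unseen bigrams simply get probability zero (a real model would smooth).

    from collections import defaultdict

    def train_bigram(corpus):
        """Estimate p(w_i | w_{i-1}) by maximum likelihood from counts."""
        bigram_counts = defaultdict(lambda: defaultdict(int))
        history_counts = defaultdict(int)
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split()  # "<s>" plays the role of w_0
            for prev, curr in zip(tokens, tokens[1:]):
                bigram_counts[prev][curr] += 1
                history_counts[prev] += 1
        # p(curr | prev) = count(prev, curr) / count(prev)
        return {prev: {curr: c / history_counts[prev] for curr, c in nexts.items()}
                for prev, nexts in bigram_counts.items()}

    def sentence_probability(model, sentence):
        """p(w) = product over i of p(w_i | w_{i-1}); unseen bigrams give 0.0."""
        tokens = ["<s>"] + sentence.split()
        prob = 1.0
        for prev, curr in zip(tokens, tokens[1:]):
            prob *= model.get(prev, {}).get(curr, 0.0)
        return prob

    # Toy usage with a hypothetical corpus:
    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    model = train_bigram(corpus)
    print(sentence_probability(model, "the cat sat"))  # 1 * (2/3) * (1/2) = 1/3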
