What order of Markov assumption does an n-gram model make?

An n-gram model makes an order n-1 Markov assumption: given the previous n-1 words, the probability of the n^{th} word is independent of all words before those n-1 words.

Suppose a sentence has k words. Their joint probability can be factored using the chain rule:

    \[p(w_{1},w_{2},...,w_{k}) = p(w_{k}|w_{k-1},w_{k-2},...,w_{1}) \cdot p(w_{k-1}|w_{k-2},...,w_{1}) \cdots p(w_{1})\]
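The chain rule can be checked numerically. Below is a minimal sketch over a hypothetical toy corpus of three-word sentences (the corpus and all names are made up for illustration): multiplying the conditional probabilities estimated from counts recovers the empirical joint probability exactly.

```python
from collections import Counter

# Hypothetical toy corpus of 3-word sentences (illustration only).
corpus = [
    ("the", "cat", "sat"),
    ("the", "cat", "ran"),
    ("the", "dog", "sat"),
    ("a", "dog", "ran"),
]

full = Counter(corpus)                    # counts of whole sentences
pre2 = Counter(s[:2] for s in corpus)     # counts of 2-word prefixes
pre1 = Counter(s[:1] for s in corpus)     # counts of 1-word prefixes
N = len(corpus)

s = ("the", "cat", "sat")
# Chain rule: p(w1, w2, w3) = p(w1) * p(w2 | w1) * p(w3 | w1, w2)
p_chain = (pre1[s[:1]] / N) * (pre2[s[:2]] / pre1[s[:1]]) * (full[s] / pre2[s[:2]])
p_joint = full[s] / N                     # direct empirical joint probability

assert abs(p_chain - p_joint) < 1e-12     # the two factorizations agree
```

The intermediate counts cancel telescopically, which is exactly why the chain rule holds with no approximation.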

Now the Markov assumption can be used to simplify this factorization: in an n-gram model, each word in the sequence depends only on the previous n-1 words.

For a bigram model (n=2), a first-order Markov assumption is made and the above expression becomes

    \[p(w) = \prod_{i=1}^{k} p(w_{i} \mid w_{i-1})\]

For a trigram model (n=3), a second-order Markov assumption is made: the probability of a word depends on the previous 2 words, hence second order.

    \[p(w) = \prod_{i=1}^{k} p(w_{i} \mid w_{i-1}, w_{i-2})\]
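The trigram version is the same idea with a two-word context. A minimal sketch, again on a hypothetical corpus, padding with two `"<s>"` symbols so the first two words have a full-length history (a common convention, assumed here):

```python
from collections import Counter

# Hypothetical toy corpus (illustration only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

trigrams = Counter()
contexts = Counter()
for sent in corpus:
    padded = ["<s>", "<s>"] + sent        # two pads cover w_0 and w_{-1}
    for i in range(2, len(padded)):
        trigrams[(padded[i - 2], padded[i - 1], padded[i])] += 1
        contexts[(padded[i - 2], padded[i - 1])] += 1

def p_trigram(sentence):
    """p(w) = prod_i p(w_i | w_{i-1}, w_{i-2}), estimated by MLE counts."""
    padded = ["<s>", "<s>"] + sentence
    p = 1.0
    for i in range(2, len(padded)):
        p *= trigrams[(padded[i - 2], padded[i - 1], padded[i])] / \
             contexts[(padded[i - 2], padded[i - 1])]
    return p
```

Note the trade-off against the bigram model: the longer context captures more dependency, but trigram counts are sparser, so smoothing matters even more.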

Thinking exercise – how do you handle words like w_{0} (and, for the trigram model, w_{-1})?
