An n-gram is a sequence of n consecutive words (or tokens/grams).
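As a quick illustration, here is a minimal sketch (our own, in plain Python; the function and variable names are just illustrative) of extracting n-grams from a list of tokens:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```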
In general, n-grams either preserve the ordering of words or indicate what level of dependency on previous words is assumed in order to simplify the modeling task.
When using Bag of Words, n-grams come in handy to preserve the ordering between words, whereas in language modeling they capture the independence assumption being made. A language model can be written as
\[ p(w) = p(w_1, w_2, \ldots, w_k) \]

\[ p(w) = \prod_{i=1}^{k+1} p(w_i \mid w_1, w_2, \ldots, w_{i-1}) \]
If we build a language model without a Markov assumption of any order, the above model will have a very large number of parameters.
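To get a sense of the scale (a rough count we add here for illustration, assuming a fixed vocabulary of size \( V \)): a fully general model over sentences of length \( k \) is a joint distribution with

\[ V^{k} - 1 \]

free parameters, while an n-gram model only needs one conditional table of roughly

\[ V^{\,n-1}(V - 1) \approx V^{n} \]

parameters, which is what makes the Markov assumption practical.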
For an n-gram model, the probability of a word depends only on the previous n-1 words, i.e.
\[ p(w) = \prod_{i=1}^{k+1} p(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_{i-n+1}) \]
This dependency on only the previous n-1 words, rather than the entire sequence, is the (n-1)-th order Markov assumption. For a bi-gram model, it is the first order Markov assumption. One can add "start tokens" to deal with words like \( w_1 \), which do not have n-1 preceding words.
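As a rough sketch of how the bi-gram case with start tokens might look in practice (our own illustrative code using maximum-likelihood counts and hypothetical `<s>`/`</s>` tokens, not a reference implementation):

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Estimate MLE bi-gram probabilities p(w_i | w_{i-1}) from tokenized sentences.

    A start token "<s>" is prepended so the first real word also has a
    conditioning context; "</s>" marks the end of the sentence.
    """
    bigram_counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigram_counts[prev][cur] += 1
    # Normalize counts into conditional probabilities.
    return {
        prev: {cur: c / sum(counter.values()) for cur, c in counter.items()}
        for prev, counter in bigram_counts.items()
    }

def sentence_probability(probs, tokens):
    """p(w) = prod_i p(w_i | w_{i-1}) under the trained bi-gram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        # Unseen bigrams get probability 0 here; a real model would smooth.
        p *= probs.get(prev, {}).get(cur, 0.0)
    return p

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram_lm(corpus)
print(sentence_probability(model, ["the", "cat", "sat"]))  # 0.5 on this toy corpus
```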