- A 100-gram model would be far more complex and would have a very large number of parameters, since the number of possible n-grams grows exponentially with n.
- One approach is to start with n-gram models for different values of n, from 2 up to 10 in the worst case, and compare their accuracy.
- Beyond some value of n, say n = 7, the accuracy of the model tends to plateau.
- One reason for this could be that, as sentences get longer, the dependency of a word on another word far away from it generally weakens. Hence it makes sense to cap n at some value such as 7 to 10.
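
The sweep over n described above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the toy corpus, the helper names (`train_ngram`, `predict`, `accuracy`), the use of next-word prediction accuracy as the metric, and the simple back-off to shorter contexts are all assumptions made for the example. On a real corpus you would plot accuracy against n and stop where the curve flattens.

```python
from collections import Counter, defaultdict


def train_ngram(sentences, n):
    """Count next-word frequencies for every context of length 0..n-1.

    Shorter contexts are stored too, so prediction can back off to them.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            for k in range(n):  # k = context length
                context = tuple(padded[i - k:i])
                counts[context][padded[i]] += 1
    return counts


def predict(counts, history, n):
    """Return the most frequent next word, backing off to shorter contexts."""
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    while True:
        if context in counts:
            return counts[context].most_common(1)[0][0]
        if not context:
            return None  # model was trained on no data
        context = context[1:]  # drop the oldest word and retry


def accuracy(counts, sentences, n):
    """Fraction of positions where the predicted next word is correct."""
    correct = total = 0
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            if predict(counts, padded[:i], n) == padded[i]:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Toy data (hypothetical); replace with a real train/held-out split.
train = [s.split() for s in [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the mat",
    "a dog slept on the rug",
]]
heldout = [s.split() for s in [
    "the cat slept on the rug",
    "a dog sat on the mat",
]]

# Sweep n from 2 to 10 and report accuracy and the number of stored contexts.
for n in range(2, 11):
    model = train_ngram(train, n)
    print(f"n={n:2d}  accuracy={accuracy(model, heldout, n):.3f}  "
          f"contexts={len(model)}")
```

The `contexts` column gives a rough sense of how model size grows with n, while the accuracy column shows whether the extra context is still paying off.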
If the average sentence length across all documents is 100, should we build a 100-gram language model?