If the average length of a sentence is 100 words in all documents, should we build a 100-gram language model?

Posted on February 16, 2019 by MLNerds

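A 100-gram model would be far more complex and would have a huge number of parameters. The number of possible n-grams grows exponentially with n, so almost no 100-gram seen at test time would have appeared in the training data.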
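To get a feel for the scale, here is a rough back-of-the-envelope sketch. It assumes a vocabulary of 10,000 words (a number picked purely for illustration) and uses the upper bound of V^n possible n-grams; the helper name `max_ngram_parameters` is hypothetical.

```python
def max_ngram_parameters(vocab_size: int, n: int) -> int:
    """Upper bound on distinct n-grams: each of the n positions can hold any word."""
    return vocab_size ** n


vocab_size = 10_000  # assumed vocabulary size, for illustration only

for n in (2, 3, 7, 100):
    count = max_ngram_parameters(vocab_size, n)
    # len(str(count)) - 1 is the power of ten, because count is an exact power of 10 here
    print(f"n={n:>3}: about 10^{len(str(count)) - 1} possible n-grams")
```

A real model only stores the n-grams it actually observes, but the gap between what is possible and what any corpus can cover is the sparsity problem.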
A better approach is to train n-gram models for several values of n, from 2 up to 10 in the worst case, and compare them on held-out data.
Beyond some value of n, say n = 7, the accuracy of the model usually plateaus.
One reason for this could be that the dependency of a word on another word generally weakens as the distance between them grows, so very distant context adds little even in long sentences. Hence it makes sense to cap n at some value like 7 to 10.
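The sweep over n can be sketched roughly as follows. This is a minimal illustration, not a production language model: it predicts the next word from raw counts, backs off to shorter contexts when the longer one was never seen, and measures next-word accuracy on held-out sentences. The toy corpus and the function names (`train_ngram_counts`, `predict_next`, `accuracy`) are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_ngram_counts(sentences, n):
    """Count next-word frequencies for every context of length 0 .. n-1."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            for k in range(n):  # k = number of context words used
                counts[tuple(padded[i - k:i])][padded[i]] += 1
    return counts


def predict_next(counts, context, n):
    """Try the longest context (n-1 words) first, then back off to shorter ones."""
    for k in range(n - 1, -1, -1):
        key = tuple(context[len(context) - k:]) if k else ()
        if key in counts:
            return counts[key].most_common(1)[0][0]
    return None


def accuracy(counts, sentences, n):
    """Fraction of positions where the predicted next word matches the real one."""
    correct = total = 0
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            correct += predict_next(counts, padded[:i], n) == padded[i]
            total += 1
    return correct / total if total else 0.0


# Toy data, for illustration only; use a real train/held-out split in practice.
train = [s.split() for s in [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat lay on the rug",
]]
heldout = [s.split() for s in ["the dog lay on the mat"]]

for n in range(2, 11):
    counts = train_ngram_counts(train, n)
    print(f"n={n:>2}  held-out accuracy={accuracy(counts, heldout, n):.3f}")
```

On a real corpus you would plot accuracy (or perplexity) against n and pick the smallest n after which the curve stops improving.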