If the average sentence length across all documents is 100 words, should we build a 100-gram language model?

  1. A 100-gram model would be far more complex and would have a huge number of parameters, most of which could not be estimated reliably because almost no 100-word sequence occurs more than once in a corpus.
  2. A more practical approach is to train n-gram models for a range of values of n, from 2 up to about 10 in the worst case, and compare them.
  3. Beyond some value of n, say n=7, the accuracy of the model typically stops improving noticeably.
  4. One likely reason is that in a long sentence, a word usually depends only weakly on words that are far away from it, so extending the context further adds little information. Hence it makes sense to cap n at a value of around 7 to 10.
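
To make the first point concrete, here is a minimal sketch in Python that compares how many n-grams are possible with how many are actually observed as n grows. The toy corpus and the helper names `ngram_counts` and `sparsity_report` are made up for illustration; a real experiment would use a proper corpus and compare held-out accuracy or perplexity for each n.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sparsity_report(tokens, max_n):
    """For n = 1..max_n, return (n, possible, observed, seen_once) tuples.

    possible  = vocabulary_size ** n, the number of distinct n-grams that could exist
    observed  = number of distinct n-grams that actually occur in the tokens
    seen_once = number of observed n-grams that occur exactly one time
    """
    vocab_size = len(set(tokens))
    report = []
    for n in range(1, max_n + 1):
        counts = ngram_counts(tokens, n)
        seen_once = sum(1 for c in counts.values() if c == 1)
        report.append((n, vocab_size ** n, len(counts), seen_once))
    return report


# Hypothetical toy corpus, used only to illustrate the trend.
corpus = (
    "the cat sat on the mat . the dog sat on the log . "
    "the cat saw the dog . the dog saw the cat ."
).split()

for n, possible, observed, seen_once in sparsity_report(corpus, 5):
    print(f"n={n}: possible={possible}, observed={observed}, seen once={seen_once}")
```

The number of possible n-grams grows as the vocabulary size raised to the power n, while the number observed can never exceed the number of tokens in the corpus. With a 10,000-word vocabulary, a 100-gram model would have on the order of 10^400 possible contexts, which is why a small n is the practical choice.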
