A language model is a probability distribution over sequences of words, P(w_1, …, w_m). It lets us measure the relative likelihood of different phrases, which is useful in many NLP tasks such as speech recognition, machine translation, POS tagging, and parsing.
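By the chain rule, P(w_1, …, w_m) = P(w_1) P(w_2 | w_1) … P(w_m | w_1, …, w_{m-1}). A minimal Python sketch of scoring a sequence this way, assuming a hypothetical cond_log_prob(word, history) callback supplied by whatever model is in use:

    def sequence_log_prob(words, cond_log_prob):
        # P(w_1,...,w_m) = product over i of P(w_i | w_1,...,w_{i-1}),
        # accumulated in log space to avoid floating-point underflow.
        total = 0.0
        for i, w in enumerate(words):
            total += cond_log_prob(w, words[:i])
        return total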
Example: In any generative model where a target sequence is generated from a source sequence, as in machine translation:
target_seq* = argmax over target_seq of p(target_seq | source_seq)
            = argmax over target_seq of p(target_seq) p(source_seq | target_seq)

The second line follows from Bayes' rule: p(source_seq) does not depend on target_seq, so it can be dropped from the argmax.
Here p(target_seq) is the language model, while p(source_seq | target_seq) is the translation model, whose form depends on the specific statistical machine translation approach.
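A sketch of this argmax in Python, assuming an explicit candidate list (a stand-in for the search procedure a real decoder uses) and hypothetical lm_log_prob / tm_log_prob scoring functions:

    def decode(source_seq, candidates, lm_log_prob, tm_log_prob):
        # Noisy-channel decoding: pick the target sequence maximizing
        # p(target_seq) * p(source_seq | target_seq), done in log space.
        # candidates is a hypothetical finite set of target sequences;
        # real decoders search this space rather than enumerate it.
        return max(candidates,
                   key=lambda t: lm_log_prob(t) + tm_log_prob(source_seq, t))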
Another example: In speech recognition, the task is to convert a sequence of sounds into a sequence of words. The language model distinguishes between candidate transcriptions that sound similar based on the relative likelihood of each phrase occurring. For example, "I am eating an ice cream" is far more likely than "I am eating and I scream", even though the two sound nearly identical.
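A toy illustration of that disambiguation with a bi-gram model; the log-probabilities below are made-up values for the sketch, not estimates from a real corpus:

    # Made-up bi-gram log-probabilities for illustration only.
    bigram_lp = {
        ("eating", "an"): -2.0, ("an", "ice"): -2.5, ("ice", "cream"): -0.7,
        ("eating", "and"): -4.0, ("and", "i"): -3.0, ("i", "scream"): -8.0,
    }

    def phrase_log_prob(words):
        # Sum bi-gram log-probabilities; unseen pairs get a fixed penalty.
        return sum(bigram_lp.get(pair, -10.0)
                   for pair in zip(words, words[1:]))

    hyp1 = "i am eating an ice cream".split()
    hyp2 = "i am eating and i scream".split()
    # Under these toy numbers hyp1 scores higher, so the recognizer
    # would prefer "an ice cream" over "and I scream".
    assert phrase_log_prob(hyp1) > phrase_log_prob(hyp2)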
Common language models include n-gram models, in which each word depends on the previous n-1 words (unigram or bag of words, bi-gram, tri-gram), HMM-based models, and neural language models.
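For concreteness, a minimal sketch of estimating a bi-gram model by maximum likelihood from raw counts; real systems add smoothing (e.g. Kneser-Ney) to handle unseen bi-grams:

    from collections import Counter

    def train_bigram(corpus_sentences):
        # Maximum-likelihood bi-gram estimates:
        # P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}).
        unigrams, bigrams = Counter(), Counter()
        for sent in corpus_sentences:
            words = ["<s>"] + sent.split() + ["</s>"]
            unigrams.update(words[:-1])   # count each word as a context
            bigrams.update(zip(words, words[1:]))
        return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}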