What is a language model? How do you create one? Why do you need one?

A language model is a probability distribution over sequences of words, P(w_1, …, w_m). It lets us measure the relative likelihood of different phrases, which is useful in many NLP tasks such as speech recognition, machine translation, POS tagging, and parsing.
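To make the definition concrete, the joint probability can always be factored word by word with the chain rule of probability. This is a standard identity rather than anything specific to one model; practical language models differ in how they approximate each conditional term.

```latex
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
```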

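Example: In any generative model where a target sequence is generated from a source sequence, such as machine translation, the best target sequence is found by applying Bayes' rule:

        best target_seq = argmax over target_seq  p(target_seq | source_seq)

                        = argmax over target_seq  p(target_seq) p(source_seq | target_seq)

The denominator p(source_seq) is dropped because it does not depend on target_seq.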

Here p(target_seq) is typically the language model, while p(source_seq | target_seq) depends on the specific statistical model used for machine translation.
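A minimal sketch of this decoding rule, assuming a fixed list of candidate translations for a single source sentence. The candidate strings and all probability values below are invented for illustration; a real system would get them from a trained language model and translation model.

```python
import math

# Hypothetical language model scores p(target_seq), made up for this sketch.
lm_prob = {
    "the house is small": 0.020,
    "small the is house": 0.0001,
    "the home is little": 0.005,
}

# Hypothetical translation model scores p(source_seq | target_seq)
# for one fixed source sentence, also made up.
channel_prob = {
    "the house is small": 0.30,
    "small the is house": 0.30,
    "the home is little": 0.25,
}

def decode(candidates):
    """Return the candidate maximizing log p(target) + log p(source | target)."""
    return max(
        candidates,
        key=lambda t: math.log(lm_prob[t]) + math.log(channel_prob[t]),
    )

best = decode(list(lm_prob))
print(best)
```

Note that the scrambled candidate gets the same translation-model score as the fluent one in this toy table; it is the language model term that rules it out.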

Another example: In speech recognition, the task is to convert a sequence of sounds into a sequence of words. The language model helps distinguish between candidate phrases that sound similar, based on how likely each phrase is to occur. For example, "I am eating an ice cream" is more likely than "I am eating and I scream".

Common language models include n-gram models, where each word depends on the previous n-1 words (unigram or bag of words, bi-gram, tri-gram), HMM-based models, and neural language models.
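As a rough illustration of the n-gram idea, here is a small bi-gram model sketch in Python. The four-sentence corpus is made up, and add-one (Laplace) smoothing is an assumption chosen for simplicity, not something the discussion above prescribes. It scores the two similar-sounding phrases from the speech recognition example.

```python
from collections import Counter
import math

# Tiny made-up corpus, for illustration only.
corpus = [
    "i am eating an ice cream",
    "i am eating an apple",
    "she likes ice cream",
    "i scream when i am scared",
]

def tokenize(sentence):
    # Pad with start and end markers so the first and last words have context.
    return ["<s>"] + sentence.split() + ["</s>"]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = tokenize(sentence)
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

vocab_size = len(unigram_counts)

def bigram_prob(prev, word):
    # Add-one smoothing: unseen bigrams get a small non-zero probability.
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

def log_prob(sentence):
    # Sum of log P(word | previous word) over the padded sentence.
    tokens = tokenize(sentence)
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(tokens, tokens[1:]))

a = log_prob("i am eating an ice cream")
b = log_prob("i am eating and i scream")
print(a > b)
```

On this toy corpus the first phrase scores higher, mainly because "eating an", "an ice" and "ice cream" were all seen during counting, while "and" never appears in the corpus and only gets smoothed probability mass.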

 
