What order of Markov assumption does n-grams model make ?

An n-grams model makes order n-1 Markov assumption. This assumption implies: given the previous n-1 words, probability of  word is independent of words prior to words. Suppose we have k words in a sentence, their joint probability can be expressed as follows using chain rule:      Now, the Markov assumption can be used to make…

How is long term dependency maintained while building a language model?

Language models can be built using the following popular methods – Using n-gram language model n-gram language models make assumption for the value of n. Larger the value of n, longer the dependency. One can refer to what is the significance of n-grams in a language model for further reading. Using hidden Markov Model(HMM) HMM maintains long…

What is the significance of n-grams in a language model ?

n-grams is a term used for a sequence of n consecutive words/tokens/grams. In general, n-grams can either preserve the ordering or indicate what level of dependency is required in order to simplify the modeling task. While using bag of Words, n-grams come handy to preserve ordering between words but for language modeling, they signify the…

Given a bigram language model, in what scenarios do we encounter zero probabilities? How Should we handle these situations ?

Recall the Bi-gram model can be expressed as :     Scenario 1 – Out of vocabulary(OOV) words – such words may not be present during training and hence any probability term involving OOV words will be 0.0 leading entire term to be zero. This is solved by replacing OOV words by UNK tokens in both…

Why is smoothing applied in language model ?

Because there might be some n-grams in the test set but may not be present in the training set. For ex., If the training corpus is      and you need to find the probability of a sequence like         where <START> is the token applied at the beginning of the document. Then…

How to measure the performance of the language model ?

While building language model, we try to estimate the probability of the sentence or a document. Given sequences(sentences or documents) like     Language model(bigram language model) will be :     for each sequence given by above equation. Once we apply Maximum Likelihood Estimation(MLE), we should have a value for the term . Perplexity…

What is the difference between stemming and lemmatisation?

Stemming is about replacing each word with its origin stem word in order to remove the suffixes like “es”, “ies”, “s”. For ex., “cats” => “cat”, “computers” => “computer” etc. This is more of a heuristic approach and not using any grammar or dictionary. Lemmatisation has the same purpose as above but doing it properly…

You are given some documents and asked to find prevalent topics in the documents – how do you go about it ?

This is typically called topic modeling. Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. For instance, two statements –  about meals and about food can probably characterized by the same topic though they do not necessarily use the same vocabulary. Topic models typically…