What is PMI?

PMI (Pointwise Mutual Information) is a measure of correlation between two events x and y:

$$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$

As you can see from the above expression, PMI increases with the number of times both events occur together and decreases with the individual counts, which are in the denominator. This expression ensures high…
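To make the definition concrete, here is a minimal sketch that estimates PMI from a list of co-occurring pairs. The function name `pmi_from_pairs`, the toy word pairs, and the choice of log base 2 are assumptions made for illustration; probabilities are plain relative frequencies with no smoothing.

```python
import math
from collections import Counter


def pmi_from_pairs(pairs):
    """Estimate PMI(x, y) = log2(p(x, y) / (p(x) * p(y))) from observed (x, y) pairs.

    Hypothetical helper: probabilities are relative frequencies, no smoothing.
    """
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    n = len(pairs)

    scores = {}
    for (x, y), c_xy in pair_counts.items():
        p_xy = c_xy / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy data (made up): "new" is frequent, "old" only ever appears with "car".
pairs = [("new", "york"), ("new", "york"), ("new", "car"), ("old", "car")]
scores = pmi_from_pairs(pairs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A positive score means the pair co-occurs more often than it would if x and y were independent, and a negative score means it co-occurs less often. Pairs that never occur together are simply absent from the result here, since their PMI would be negative infinity.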

What is the complexity of the Viterbi algorithm?

The Viterbi algorithm is a dynamic programming approach to finding the most probable sequence of hidden states given the observed data, as modeled by an HMM. Without dynamic programming, the problem is exponential, since there are exponentially many possible state sequences for a given observation (how is explained in the answer below). Let the transition probabilities (state transition)…
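As a rough illustration of where the cost comes from, here is a small pure-Python sketch of Viterbi decoding in log space. The names `pi`, `A`, `B` and the two-state toy model are assumptions for this example, not notation taken from the original answer. The two nested loops over states inside the loop over time steps give the usual cost of about T * N^2 operations for N states and a sequence of length T.

```python
import math


def viterbi(pi, A, B, obs):
    """Most probable state path for an HMM (illustrative sketch).

    pi[j]   : initial probability of state j
    A[i][j] : probability of moving from state i to state j
    B[j][k] : probability of emitting symbol k from state j
    obs     : list of observed symbol indices
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(pi)
    # delta[j] = best log-probability of any path ending in state j so far
    delta = [log(pi[j]) + log(B[j][obs[0]]) for j in range(n_states)]
    backptrs = []

    for t in range(1, len(obs)):              # T - 1 time steps
        new_delta, ptr = [], []
        for j in range(n_states):             # N current states
            # N candidate predecessors for each current state
            best_i = max(range(n_states), key=lambda i: delta[i] + log(A[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log(A[best_i][j]) + log(B[j][obs[t]]))
        delta = new_delta
        backptrs.append(ptr)

    # Follow the back-pointers from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy model (made up): states 0 = Healthy, 1 = Fever; symbols 0 = normal, 1 = cold, 2 = dizzy.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, log_prob = viterbi(pi, A, B, [0, 1, 2])
print(path, math.exp(log_prob))
```

Working in log space avoids numerical underflow on long sequences; the decoded path is the same as with raw probabilities.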

Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states from a sequence of text using a brute-force algorithm?

Assume there are $N$ total states and let $T$ be the length of the largest sequence. Think about how we generate text using an HMM. We first have a state sequence, and from each state we emit an output. From each state, any word out of $V$ possible outcomes can be generated. Since there are $N$ states, at each possible…
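The brute-force approach can be sketched directly: enumerate every possible state sequence, score each one, and keep the best. The function name `brute_force_decode` and the toy two-state model below are assumptions for illustration only. With `n_states` states and a sequence of length `T`, the loop visits `n_states ** T` candidate sequences, and scoring each one takes on the order of `T` multiplications.

```python
import itertools


def brute_force_decode(pi, A, B, obs):
    """Score every possible state sequence and return the best (illustrative sketch).

    pi[j]   : initial probability of state j
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of emitting symbol k from state j
    obs     : list of observed symbol indices
    """
    n_states = len(pi)
    best_path, best_prob, n_checked = None, -1.0, 0

    # itertools.product enumerates all n_states ** len(obs) sequences.
    for path in itertools.product(range(n_states), repeat=len(obs)):
        prob = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            prob *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        n_checked += 1
        if prob > best_prob:
            best_path, best_prob = list(path), prob

    return best_path, best_prob, n_checked


# Toy model (made up): 2 states, 3 output symbols, sequence of length 3.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
best_path, best_prob, n_checked = brute_force_decode(pi, A, B, [0, 1, 2])
print(best_path, best_prob, n_checked)
```

Even for this tiny model the count of candidate sequences doubles with every extra observation, which is why enumeration is only practical for very short inputs.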

How do you train an HMM in practice?

The joint probability distribution for the HMM is given by the following equation, where $x_1, \dots, x_T$ are the observed data points and $z_1, \dots, z_T$ the corresponding latent states:

$$p(x_1, \dots, x_T, z_1, \dots, z_T) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$$

Before proceeding to answer the question on training an HMM, it makes sense to ask the following questions: What is the problem at hand for which we are training…
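One common case, assumed here purely for illustration, is that the training data is labeled, so the "latent" states are actually observed (for example, POS-tagged text). The three factors of the joint distribution can then be estimated by relative-frequency counting. The function name `train_hmm_supervised` and the toy tagged sentences are hypothetical. When the states are not observed, an EM procedure such as Baum-Welch is the usual alternative, which this sketch does not cover.

```python
from collections import Counter, defaultdict


def train_hmm_supervised(tagged_sequences):
    """Estimate initial, transition and emission probabilities by counting.

    tagged_sequences: list of sequences, each a list of (observation, state) pairs.
    Hypothetical helper: plain relative frequencies, no smoothing.
    """
    init = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)

    for seq in tagged_sequences:
        prev = None
        for obs, state in seq:
            if prev is None:
                init[state] += 1          # first state of the sequence
            else:
                trans[prev][state] += 1   # state-to-state transition
            emit[state][obs] += 1         # state-to-observation emission
            prev = state

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (
        normalize(init),
        {s: normalize(c) for s, c in trans.items()},
        {s: normalize(c) for s, c in emit.items()},
    )


# Toy tagged corpus (made up).
data = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]
pi, A, B = train_hmm_supervised(data)
print(pi)
print(A)
print(B)
```

In practice the counts would usually be smoothed so that transitions and emissions unseen in training do not get probability zero.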

What are the different independence assumptions in HMM and Naive Bayes?

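Both the HMM and Naive Bayes make conditional independence assumptions. The HMM can be expressed by the equations below:

$$p(z_t \mid z_1, \dots, z_{t-1}) = p(z_t \mid z_{t-1})$$

$$p(x_t \mid z_1, \dots, z_t, x_1, \dots, x_{t-1}) = p(x_t \mid z_t)$$

The second equation implies a conditional independence assumption: given the state $z_t$, the observed variable $x_t$ is conditionally independent of the previous observed variables, i.e. $p(x_t \mid z_t, x_1, \dots, x_{t-1}) = p(x_t \mid z_t)$. The Naive Bayes model is expressed as:

$$p(y, x_1, \dots, x_n) = p(y) \prod_{i=1}^{n} p(x_i \mid y)$$

$x_i$ is the feature…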
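To see what the Naive Bayes assumption buys in practice, here is a small sketch that computes the joint probability of a label and its features as a prior times a product of per-feature terms. The function name `naive_bayes_log_joint`, the table layout, and all the probabilities are made up for illustration.

```python
import math


def naive_bayes_log_joint(prior, likelihood, label, features):
    """log p(y, x_1..x_n) = log p(y) + sum_i log p(x_i | y).

    prior[label]                : p(y)
    likelihood[label][i][value] : p(x_i = value | y), one table per feature
    (hypothetical table layout chosen for this sketch)
    """
    total = math.log(prior[label])
    # Each feature contributes its own term: features are treated as
    # independent of one another once the label is known.
    for i, value in enumerate(features):
        total += math.log(likelihood[label][i][value])
    return total


# Made-up model with two binary features.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}],
    "ham": [{0: 0.9, 1: 0.1}, {0: 0.3, 1: 0.7}],
}
features = (1, 0)
log_joint = {y: naive_bayes_log_joint(prior, likelihood, y, features) for y in prior}
prediction = max(log_joint, key=log_joint.get)
print({y: math.exp(v) for y, v in log_joint.items()}, prediction)
```

Because of the factorization, the model needs only one small table per feature and label rather than a table over every combination of feature values.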

How to measure the performance of a language model?

While building a language model, we try to estimate the probability of a sentence or a document. Given sequences (sentences or documents) like

$$w_1, w_2, \dots, w_n$$

the language model (here a bigram language model) will be:

$$P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$

for each sequence. Once we apply Maximum Likelihood Estimation (MLE), we should have a value for the term $P(w_i \mid w_{i-1})$. Perplexity…
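As a minimal sketch of how this might look in code, the snippet below fits bigram probabilities by MLE (relative counts) and evaluates perplexity using the standard definition, 2 raised to the negative average log2-probability per predicted token. The function names, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions for illustration; no smoothing is applied, so any unseen bigram makes the perplexity infinite.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """MLE bigram model: P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]   # assumed boundary markers
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {bg: c / context_counts[bg[0]] for bg, c in bigram_counts.items()}


def perplexity(model, sentences):
    """2 ** (-(1/N) * sum of log2 P(w_i | w_{i-1})) over all predicted tokens."""
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for bigram in zip(tokens[:-1], tokens[1:]):
            p = model.get(bigram, 0.0)
            if p == 0.0:
                return float("inf")   # unseen bigram, no smoothing in this sketch
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


# Toy corpus (made up).
model = train_bigram([["a", "b"], ["a", "c"]])
print(perplexity(model, [["a", "b"]]))
print(perplexity(model, [["b", "a"]]))
```

Lower perplexity means the model assigns higher probability to the held-out text. In practice some form of smoothing is needed so that unseen bigrams do not drive the score to infinity.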