PMI, or Pointwise Mutual Information, is a measure of association between two events x and y: $\text{PMI}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)}$. As you can see from this expression, PMI is directly proportional to the number of times both events occur together and inversely proportional to the individual counts, which are in the denominator. This expression ensures high…
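The PMI formula lends itself to a direct implementation. Below is a minimal sketch that estimates PMI from co-occurrence counts; the function and the toy numbers are illustrative assumptions, not from the original answer:

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x; y) = log( p(x, y) / (p(x) * p(y)) ), estimated from counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

# Toy example: two words that co-occur far more often than chance
# would predict get a large positive PMI.
print(pmi(count_xy=50, count_x=100, count_y=60, total=10_000))  # ~4.42
```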
What is the complexity of the Viterbi algorithm?
The Viterbi algorithm is a dynamic programming approach to finding the most probable sequence of hidden states given the observed data, as modeled by an HMM. Without dynamic programming, it becomes an exponential problem, as there is an exponential number of possible state sequences for a given observation (how is explained in the answer below). Let the transition probabilities (state transition)…
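The quadratic-in-states, linear-in-length cost is easiest to see in code. Here is a minimal Viterbi sketch (the names pi, A, B for the initial, transition, and emission matrices are assumptions for illustration); the loop over N states nested inside the loop over T time steps, each looking back at N predecessors, gives O(N² · T):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable state path for an observation sequence.

    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = p(state j | state i)
    B:  (N, V) emissions,   B[i, o] = p(obs o | state i)
    Runs in O(N^2 * T): for each of T steps, each of N states
    considers all N predecessors.
    """
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))             # best score of a path ending in each state
    back = np.zeros((T, N), dtype=int)   # backpointers for path recovery
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            back[t, j] = np.argmax(scores)
            delta[t, j] = scores[back[t, j]] * B[j, obs[t]]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```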
Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states for a sequence of text using a brute-force algorithm?
Assume there are N total states and let T be the length of the longest sequence. Think about how we generate text using an HMM: we first have a state sequence, and from each state we emit an output. From each state, any word out of V possible outcomes can be generated. Since there are N states, at each possible…
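With N states and sequence length T as above, the brute-force count works out as follows (a worked sketch, not from the original answer):

```latex
\underbrace{N \times N \times \cdots \times N}_{T \text{ positions}} = N^T
\ \text{candidate state sequences}, \qquad
\text{total cost} = O(T \cdot N^T)
\ \text{(scoring each sequence costs } O(T)\text{)}
```

For example, with N = 10 states and T = 20 words there are 10^20 candidate sequences, whereas Viterbi's dynamic programming needs only O(N²T) = 2,000 steps.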
How do you train an HMM model in practice?
The joint probability distribution for the HMM is given by the following equation, where $x_1, \dots, x_T$ are the observed data points and $z_1, \dots, z_T$ the corresponding latent states: $p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$. Before proceeding to answer the question on training an HMM, it makes sense to ask the following questions: what is the problem at hand for which we are training…
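In the fully observed (supervised) case, one common way to train an HMM is maximum likelihood estimation by counting; when the states are unobserved, the Baum-Welch (EM) algorithm is used instead. Below is a minimal sketch of the supervised case; the tagged-sentence format and variable names are assumptions for illustration:

```python
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Supervised MLE for an HMM: normalized transition and emission counts.

    tagged_sentences: list of [(word, tag), ...] sequences.
    Returns (transition, emission) as nested dicts of probabilities.
    """
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "<s>"                       # start-of-sequence pseudo-state
        for word, tag in sent:
            trans_counts[prev][tag] += 1   # count state -> state transitions
            emit_counts[tag][word] += 1    # count state -> word emissions
            prev = tag
    normalize = lambda d: {k: v / sum(d.values()) for k, v in d.items()}
    transition = {s: normalize(nxt) for s, nxt in trans_counts.items()}
    emission = {s: normalize(words) for s, words in emit_counts.items()}
    return transition, emission

# Toy usage with two tiny tagged sentences.
trans, emit = train_hmm([[("the", "DET"), ("dog", "NOUN")],
                         [("a", "DET"), ("cat", "NOUN")]])
print(trans["<s>"]["DET"])  # 1.0: both sentences start with DET
```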
What are the different independence assumptions in HMM & Naive Bayes?
Both the HMM and Naive Bayes make conditional independence assumptions. The HMM can be expressed by the equations below: $p(z_t \mid z_{1:t-1}) = p(z_t \mid z_{t-1})$ and $p(x_t \mid x_{1:t-1}, z_{1:t}) = p(x_t \mid z_t)$. The second equation implies a conditional independence assumption: given the state $z_t$, the observed variable $x_t$ is conditionally independent of the previous observed variables. The Naive Bayes model is expressed as $p(y, x_1, \dots, x_n) = p(y) \prod_{i=1}^{n} p(x_i \mid y)$, where $x_i$ is the feature…
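Writing the two joint distributions side by side makes the comparison concrete (standard formulations, using x for observations, z for HMM states, and y for the Naive Bayes class label):

```latex
\text{HMM:}\quad p(x_{1:T}, z_{1:T}) = p(z_1)\, p(x_1 \mid z_1)
  \prod_{t=2}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)

\text{Naive Bayes:}\quad p(y, x_1, \dots, x_n) = p(y) \prod_{i=1}^{n} p(x_i \mid y)
```

Structurally, Naive Bayes looks like an HMM in which the latent variable is a single class label shared by all observations, rather than a state that changes over time.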
How many parameters are there for an HMM model?
Let us calculate the number of parameters for a bi-gram HMM. Let N be the total number of states, V be the vocabulary size, and T be the length of the sequence. Before directly estimating the number of parameters, let us first try to see what the different probabilities, or rather probability matrices,…
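Assuming N states and vocabulary size V as above, the count works out as follows (a worked sketch, ignoring the small reduction from normalization constraints):

```latex
\underbrace{N}_{\text{initial } p(z_1)} \;+\;
\underbrace{N \times N}_{\text{transition } p(z_t \mid z_{t-1})} \;+\;
\underbrace{N \times V}_{\text{emission } p(x_t \mid z_t)}
\;=\; N^2 + NV + N
```

For example, with N = 50 tags and V = 10,000 words this is 2,500 + 500,000 + 50 ≈ 5 × 10^5 parameters; note that the count does not depend on the sequence length T.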
How do you generate text using a Hidden Markov Model (HMM)?
The HMM is a latent variable model where the observed sequence of variables $x_1, \dots, x_T$ is assumed to be generated from a set of temporally connected latent variables $z_1, \dots, z_T$. The joint distribution of the observed variables, or data, and the latent variables can be written as $p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$. One possible interpretation of the latent variables in…
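Ancestral sampling from that joint distribution gives a direct generation procedure; below is a minimal sketch assuming pi, A, B matrices as before (illustrative names, not from the original answer):

```python
import numpy as np

def generate(pi, A, B, T, rng=np.random.default_rng(0)):
    """Sample a (states, observations) pair of length T from an HMM.

    Follows the joint factorization: draw z_1 ~ p(z_1), then repeatedly
    emit x_t ~ p(x_t | z_t) and draw z_{t+1} ~ p(z_{t+1} | z_t).
    """
    states, obs = [], []
    z = rng.choice(len(pi), p=pi)                   # z_1 ~ p(z_1)
    for _ in range(T):
        states.append(int(z))
        obs.append(int(rng.choice(B.shape[1], p=B[z])))  # x_t ~ p(x_t | z_t)
        z = rng.choice(A.shape[1], p=A[z])               # next state
    return states, obs

# Toy 2-state, 3-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(generate(pi, A, B, T=5))
```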
If the average length of a sentence is 100 in all documents, should we build a 100-gram language model?
A 100-gram model will be much more complex and will have a lot of parameters. One way is to start with an n-gram model with different values of n, from 2 up to 10 in the worst case. After some value of n, say n = 7, the accuracy of the model becomes almost stagnant. One reason for this could be that…
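The parameter blow-up can be made concrete: a raw n-gram model over a vocabulary of size V has on the order of V^n contexts to estimate (a back-of-the-envelope sketch, not from the original answer):

```latex
\#\text{parameters} \sim V^{n}, \qquad
V = 10^4,\; n = 100 \;\Rightarrow\; \sim 10^{400}
```

Even n = 7 already gives ~10^28 possible n-grams, vastly more than any corpus contains, so almost all counts are zero, which is consistent with the stagnation described above.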
How do you measure the performance of a language model?
While building a language model, we try to estimate the probability of a sentence or a document. Given sequences (sentences or documents) of words $w_1, w_2, \dots, w_T$, a bigram language model will be $p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_{t-1})$ for each sequence. Once we apply Maximum Likelihood Estimation (MLE), we should have a value for each term $p(w_t \mid w_{t-1})$. Perplexity…
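Perplexity is then computed from those estimated probabilities; here is a minimal sketch for a bigram model (the function and the toy probability table are illustrative assumptions):

```python
import math

def perplexity(sentence, bigram_prob, start="<s>"):
    """Perplexity of a sentence under a bigram model.

    PP(w_1..w_T) = p(w_1..w_T)^(-1/T), computed in log space
    to avoid underflow. bigram_prob[(prev, word)] = p(word | prev).
    """
    log_p, prev = 0.0, start
    for word in sentence:
        log_p += math.log(bigram_prob[(prev, word)])
        prev = word
    return math.exp(-log_p / len(sentence))

# Toy model: lower perplexity means the model finds the text less surprising.
probs = {("<s>", "the"): 0.5, ("the", "cat"): 0.25, ("cat", "sat"): 0.5}
print(perplexity(["the", "cat", "sat"], probs))  # ~2.52
```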
What would you care more about, precision or recall, for a spam filtering problem?
A false positive means it was not spam and we called it spam; a false negative means it was spam and we didn't label it spam. Precision = TP / (TP + FP) and Recall = TP / (TP + FN). Increasing precision involves decreasing FP, and increasing recall means decreasing FN. We don't want…
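The trade-off is easy to see numerically; here is a small sketch with made-up confusion-matrix counts (the numbers are hypothetical):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# For spam filtering, a false positive (real mail sent to the spam folder)
# is costly, so the high-precision operating point below is preferable.
print(precision_recall(tp=80, fp=2, fn=30))   # (~0.976, ~0.727): high precision
print(precision_recall(tp=105, fp=40, fn=5))  # (~0.724, ~0.955): high recall
```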