Natural Language Processing Archives - Page 2 of 6 - Ace the Data Science Interview!

How do you train an HMM in practice?

Posted on February 16, 2019 (updated March 7, 2019) by MLNerds

The joint probability distribution of an HMM is defined over the observed data points and the corresponding latent states. Before answering the question of how to train an HMM, it makes sense to ask the following questions: What is the problem at hand for which we are training…
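For reference, the standard first-order HMM factorization can be sketched as below. The notation is assumed here and is not necessarily the post's own: $x_{1:T}$ are the observed data points, $z_{1:T}$ the corresponding latent states, and $T$ the sequence length.

```latex
p(x_{1:T}, z_{1:T}) \;=\; p(z_1)\,\prod_{t=2}^{T} p(z_t \mid z_{t-1})\,\prod_{t=1}^{T} p(x_t \mid z_t)
```

The three factors are the initial state distribution, the state transitions, and the emissions; these are the quantities that training has to estimate.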

What are the different independence assumptions in HMM & Naive Bayes?

Posted on February 16, 2019 (updated March 7, 2019) by MLNerds

Both the HMM and Naive Bayes make conditional independence assumptions. In the HMM, the emission term implies a conditional independence assumption: given the current state, the observed variable is conditionally independent of the previous observed variables. The Naive Bayes model is expressed in terms of the features…
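The two assumptions can be written side by side as a sketch. All symbols are assumed notation: $x_t$ and $z_t$ are the HMM observation and state at position $t$, while $y$ is the Naive Bayes class and $f_1, \dots, f_d$ its features.

```latex
% HMM: the observation depends only on the current state
p(x_t \mid z_{1:t}, x_{1:t-1}) = p(x_t \mid z_t)

% Naive Bayes: features are independent of each other given the class
p(y, f_1, \dots, f_d) = p(y)\,\prod_{i=1}^{d} p(f_i \mid y)
```

In both models the observed quantities become independent once the hidden variable (state or class) is known; the HMM additionally links the hidden variables across positions.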

How many parameters are there in an HMM?

Posted on February 16, 2019 (updated February 20, 2019) by MLNerds

Let us calculate the number of parameters of a bigram HMM, given the total number of states, the vocabulary size, and the length of the sequence. Before directly counting the parameters, let us first see what the different probabilities, or rather probability matrix…
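A minimal sketch of the count, assuming a first-order (bigram) HMM with discrete emissions whose tables are shared across all positions, so the sequence length does not enter the result. The function name `hmm_parameter_count` and its argument names are hypothetical.

```python
def hmm_parameter_count(num_states: int, vocab_size: int) -> dict:
    """Count probability-table entries of a first-order HMM with discrete emissions."""
    initial = num_states                   # P(first state): one entry per state
    transition = num_states * num_states   # P(next state | current state)
    emission = num_states * vocab_size     # P(word | state)
    total = initial + transition + emission

    # Every probability row sums to 1, so one entry per row is redundant.
    free = (
        (num_states - 1)
        + num_states * (num_states - 1)
        + num_states * (vocab_size - 1)
    )
    return {
        "initial": initial,
        "transition": transition,
        "emission": emission,
        "total": total,
        "free": free,
    }


counts = hmm_parameter_count(num_states=3, vocab_size=5)
print(counts["total"], counts["free"])  # 27 table entries, 20 free parameters
```

With 3 states and 5 words this gives 3 + 9 + 15 = 27 table entries, of which 20 are free once the sum-to-one constraints are taken into account.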

How do you generate text using a Hidden Markov Model (HMM)?

Posted on February 16, 2019 (updated October 7, 2020) by MLNerds

The HMM is a latent variable model in which the observed sequence of variables is assumed to be generated from a set of temporally connected latent variables. The model defines a joint distribution over the observed variables (the data) and the latent variables. One possible interpretation of the latent variables in…
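Generation from such a model can be sketched as ancestral sampling: draw a state, emit a word from that state, then move to the next state. This is a minimal illustration assuming discrete states and word emissions; the toy tables `START`, `TRANS`, and `EMIT` and the function `sample_hmm` are hypothetical and not taken from the post.

```python
import random

# Toy probability tables (hypothetical values, for illustration only).
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {
    "NOUN": {"NOUN": 0.2, "VERB": 0.8},
    "VERB": {"NOUN": 0.9, "VERB": 0.1},
}
EMIT = {
    "NOUN": {"dogs": 0.5, "cats": 0.5},
    "VERB": {"chase": 0.6, "see": 0.4},
}


def sample_hmm(start_probs, transition_probs, emission_probs, length, seed=None):
    """Sample a (states, words) pair of the given length from a discrete HMM."""
    rng = random.Random(seed)

    def draw(dist):
        # Draw one key of `dist` with probability proportional to its value.
        outcomes = list(dist.keys())
        weights = list(dist.values())
        return rng.choices(outcomes, weights=weights, k=1)[0]

    states, words = [], []
    state = draw(start_probs)
    for _ in range(length):
        states.append(state)
        words.append(draw(emission_probs[state]))  # emit from the current state
        state = draw(transition_probs[state])      # move to the next state
    return states, words


states, words = sample_hmm(START, TRANS, EMIT, length=5, seed=0)
print(" ".join(words))
```

Passing a fixed `seed` makes the sample reproducible; omitting it gives a different sequence on each call.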

What order of Markov assumption does an n-gram model make?

Posted on February 16, 2019 (updated March 8, 2019) by MLNerds

An n-gram model makes a Markov assumption of order n-1. This assumption implies that, given the previous n-1 words, the probability of a word is independent of all earlier words. Suppose we have k words in a sentence; their joint probability can be expressed using the chain rule. The Markov assumption can then be used to make…
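The two steps can be sketched as follows, writing $w_i$ for the $i$-th word (assumed notation) and keeping $k$ and $n$ as in the text. The first equality is the exact chain rule; the approximation is the order n-1 Markov assumption.

```latex
P(w_1, \dots, w_k)
  \;=\; \prod_{i=1}^{k} P(w_i \mid w_1, \dots, w_{i-1})
  \;\approx\; \prod_{i=1}^{k} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
```

For a bigram model (n = 2) each factor reduces to $P(w_i \mid w_{i-1})$; indices below 1 are taken to refer to padding or start tokens.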

How is long-term dependency maintained while building a language model?

Posted on February 16, 2019 (updated March 8, 2019) by MLNerds

Language models can be built using the following popular methods. Using an n-gram language model: n-gram language models make an assumption about the value of n; the larger the value of n, the longer the dependency. See "What is the significance of n-grams in a language model?" for further reading. Using a hidden Markov model (HMM): the HMM maintains long…

What is the significance of n-grams in a language model?

Posted on February 16, 2019 by MLNerds

An n-gram is a sequence of n consecutive words, tokens, or grams. In general, n-grams can either preserve ordering or indicate what level of dependency is required in order to simplify the modeling task. With a bag-of-words representation, n-grams come in handy for preserving the ordering between words, but for language modeling, they signify the…
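As a small illustration of the definition, n-grams can be extracted with a sliding window over a token list. The helper name `ngrams` is hypothetical, and whitespace tokenization is assumed for the example.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, in order, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A window starting at i covers tokens[i], ..., tokens[i + n - 1].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
```

A sequence shorter than n yields an empty list, since no full window fits.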

Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?

Posted on February 16, 2019 (updated March 8, 2019) by MLNerds

Recall that the bigram model expresses the probability of a sequence as a product of conditional bigram probabilities. Scenario 1: out-of-vocabulary (OOV) words. Such words may not be present during training, and hence any probability term involving an OOV word will be 0.0, making the entire product zero. This is solved by replacing OOV words with UNK tokens in both…
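One common way to apply the UNK replacement is sketched below. It assumes that rare training words are also mapped to UNK so the model sees UNK during training; the `min_count` threshold, the `<UNK>` spelling, and the helper names are assumptions made for this example.

```python
from collections import Counter

UNK = "<UNK>"


def build_vocab(train_tokens, min_count=2):
    """Keep only words seen at least min_count times in the training data."""
    counts = Counter(train_tokens)
    return {word for word, count in counts.items() if count >= min_count}


def replace_oov(tokens, vocab):
    """Map every token outside the vocabulary to the UNK token."""
    return [token if token in vocab else UNK for token in tokens]


train = "a a b b c".split()
vocab = build_vocab(train)                 # "c" is too rare to keep
print(replace_oov(train, vocab))           # rare training word becomes <UNK>
print(replace_oov("a z".split(), vocab))   # unseen test word becomes <UNK>
```

Because the same mapping is applied to training and test text, every test token has a counterpart that was counted during training.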

Why is smoothing applied in a language model?

Posted on February 16, 2019 (updated March 8, 2019) by MLNerds

Because some n-grams in the test set may not be present in the training set. For example, given a training corpus, suppose you need to find the probability of a test sequence, where <START> is the token applied at the beginning of the document. Then…
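To make the effect concrete, here is a minimal sketch of add-one (Laplace) smoothing for a bigram model. Add-one is only one of several smoothing schemes and is chosen here for simplicity; the toy corpus and the function names are hypothetical.

```python
from collections import Counter

START_TOKEN = "<START>"


def train_bigram_counts(sentences):
    """Count bigrams and their left contexts; also collect the word vocabulary."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = [START_TOKEN] + list(sentence)
        vocab.update(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def laplace_prob(prev, word, context_counts, bigram_counts, vocab_size):
    """Add-one estimate of P(word | prev); never zero for words in the vocabulary."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)


corpus = [["i", "like", "tea"], ["i", "like", "coffee"]]
contexts, bigrams, vocab = train_bigram_counts(corpus)
print(laplace_prob("like", "tea", contexts, bigrams, len(vocab)))  # seen bigram
print(laplace_prob("like", "i", contexts, bigrams, len(vocab)))    # unseen bigram
```

The unseen bigram ("like", "i") would get probability zero under plain maximum likelihood estimation; with add-one smoothing it receives a small positive probability, and the estimates for each context still sum to one over the vocabulary.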

How to measure the performance of the language model?

Posted on February 16, 2019 (updated February 21, 2019) by MLNerds

While building a language model, we try to estimate the probability of a sentence or a document. Given a set of sequences (sentences or documents), the bigram language model assigns each sequence a probability as a product of bigram terms. Once we apply maximum likelihood estimation (MLE), we have a value for each such term. Perplexity…
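Perplexity is commonly defined as the exponential of the average negative log-probability per token. The sketch below assumes the per-token conditional probabilities have already been computed by the model; the function name `perplexity` and the example values are hypothetical.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the model probability of each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 for p in token_probs):
        # A single zero-probability token makes the sequence impossible.
        return math.inf
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


print(perplexity([0.25, 0.25, 0.25, 0.25]))  # about 4: like a uniform choice among 4 words
print(perplexity([0.5, 0.0]))                # inf: an unsmoothed zero dominates
```

Lower perplexity means the model finds the text less surprising, and the infinite value in the second call shows why zero probabilities have to be handled before evaluation.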
