How is long-term dependency maintained while building a language model?

Language models can be built using the following popular methods:

Using an n-gram language model: n-gram language models make an assumption about the value of n; the larger the value of n, the longer the dependency that can be captured (a minimal sketch follows below). One can refer to "What is the significance of n-grams in a language model?" for further reading.

Using a hidden Markov model (HMM): an HMM maintains long…
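To make the n-gram point concrete, here is a rough sketch of a maximum-likelihood n-gram model on a toy corpus; the helper names train_ngram_counts and ngram_prob and the example sentence are made up for illustration. A bigram model conditions on only one previous word, while a trigram model conditions on two, i.e. a longer dependency window.

```python
from collections import Counter, defaultdict

def train_ngram_counts(tokens, n):
    """Count n-grams: each (n-1)-token history maps to counts of the next token."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        history = tuple(tokens[i:i + n - 1])
        nxt = tokens[i + n - 1]
        counts[history][nxt] += 1
    return counts

def ngram_prob(counts, history, word):
    """Maximum-likelihood estimate of P(word | history)."""
    hist_counts = counts[tuple(history)]
    total = sum(hist_counts.values())
    return hist_counts[word] / total if total else 0.0

tokens = "the cat sat on the mat because the cat was tired".split()

# A bigram model (n=2) conditions on only the previous word ...
bigram = train_ngram_counts(tokens, n=2)
print(ngram_prob(bigram, ["the"], "cat"))          # P(cat | the)

# ... while a trigram model (n=3) conditions on the two previous words,
# i.e. it captures a longer dependency.
trigram = train_ngram_counts(tokens, n=3)
print(ngram_prob(trigram, ["the", "cat"], "sat"))  # P(sat | the, cat)
```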

Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?

Recall that the bigram model can be expressed as:

$P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$

where $w_0$ is the <START> token.

Scenario 1: Out-of-vocabulary (OOV) words. Such words may not be present during training, so any probability term involving an OOV word will be 0.0, making the entire product zero. This is solved by replacing OOV words with UNK tokens in both…
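The sketch below illustrates the UNK-replacement idea; the toy corpus, the min_count threshold, and the helper names are chosen only for this example. A real setup would pick the vocabulary cut-off (or an explicit vocabulary) more carefully.

```python
from collections import Counter

def build_vocab(train_tokens, min_count=2):
    """Keep only words seen at least min_count times; everything else becomes <UNK>."""
    freq = Counter(train_tokens)
    return {w for w, c in freq.items() if c >= min_count}

def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map out-of-vocabulary tokens to the <UNK> token."""
    return [t if t in vocab else unk for t in tokens]

train = "the cat sat on the mat the cat slept".split()
vocab = build_vocab(train, min_count=2)   # rare training words drop out of the vocabulary
train_unk = replace_oov(train, vocab)     # rare training words become <UNK>

test = "the dog sat on the mat".split()
test_unk = replace_oov(test, vocab)       # the unseen word 'dog' also maps to <UNK>
print(test_unk)                           # e.g. ['the', '<UNK>', '<UNK>', ...]
```

Because the model now sees <UNK> during training, a test-time OOV word is scored with the <UNK> statistics instead of producing a zero probability.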

Why is smoothing applied in a language model?

Because there might be some n-grams in the test set that are not present in the training set. For example, if the training corpus is … and you need to find the probability of a sequence like …, where <START> is the token applied at the beginning of the document. Then…
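One common fix is add-one (Laplace) smoothing, shown as a minimal sketch below on an assumed toy corpus; the point is only that an unseen bigram gets a small non-zero probability instead of 0.0.

```python
from collections import Counter

def laplace_bigram_prob(bigram_counts, unigram_counts, prev, word, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(word | prev):
    (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

train = "<START> the cat sat on the mat".split()
unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))
V = len(unigrams)   # vocabulary size

# The bigram ('<START>', 'dog') never occurs in training, yet its smoothed
# probability is small but non-zero, so the whole sequence probability
# no longer collapses to 0.0.
print(laplace_bigram_prob(bigrams, unigrams, "<START>", "dog", V))
```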

Why are bigrams, or n-grams in general, important in NLP tasks such as sentiment classification or spam detection, and why is it important to find them explicitly?

There are mainly two reasons. Some pairs of words occur together more often than they occur individually, so it is important to treat such co-occurring words as a single entity or a single token during training (see the sketch below). For the named entity recognition problem, tokens such as “United States”, “North America”, “Red Wine” would make sense when…
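One common way (not named in the answer above) to find such co-occurring pairs explicitly is pointwise mutual information (PMI) over adjacent word pairs. The sketch below uses a toy corpus and a minimum-count filter, both chosen only for illustration.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information:
    PMI(x, y) = log( P(x, y) / (P(x) * P(y)) ).
    High-PMI pairs co-occur far more often than chance would predict."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:          # skip rare pairs, which otherwise dominate PMI
            continue
        p_xy = c / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores

text = ("wine red wine is popular in the united states and "
        "the united states imports red wine").split()
for pair, score in sorted(pmi_scores(text).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
# Strongly associated pairs such as ('united', 'states') and ('red', 'wine')
# score highly, suggesting they behave like single units (collocations) and
# could be merged into one token before training a classifier.
```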