Say you’ve built a language model using Bag of Words (BoW) with 1-hot encoding, and your training set has a lot of sentences with the word “good” but none with the word “great”. Now suppose you see the sentence “Have a great day”: this language model gives p(great) = 0.0. How can you solve this problem by leveraging the fact that “good” and “great” are similar words?

Posted on February 17, 2019 / March 7, 2019 by MLNerds

BoW with 1-hot encoding doesn’t capture the meaning of sentences; it only captures co-occurrence statistics. Every word is an unrelated symbol, so counts for “good” say nothing about “great”. We need to build the language model using features that represent the meaning of the words.
A simple solution could be to cluster the word embeddings and group synonyms into a single token. Alternatively, when a word has zero probability, look up the probability of a synonym instead.
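The synonym fallback can be sketched in a few lines. This is a minimal illustration, not a production model: the corpus is a toy, the 2-d embeddings are hand-picked (hypothetical) so that “good” and “great” are close, and `backoff_prob` is a name introduced here for the example.

```python
import math
from collections import Counter

# Toy training corpus: contains "good" but never "great".
corpus = ["have a good day", "good morning", "a good book"]
counts = Counter(word for sentence in corpus for word in sentence.split())
total = sum(counts.values())

# Hypothetical 2-d embeddings, hand-picked so "good" and "great" are close.
# In practice these would come from a pretrained embedding model.
embeddings = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "day": (0.1, 0.9),
    "book": (0.2, 0.7),
}

def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def unigram_prob(word):
    """Plain relative-frequency estimate; 0.0 for unseen words."""
    return counts[word] / total

def backoff_prob(word):
    """Unigram estimate, falling back to the nearest seen word if unseen."""
    if counts[word] > 0 or word not in embeddings:
        return unigram_prob(word)
    seen = [w for w in embeddings if counts[w] > 0]
    nearest = max(seen, key=lambda w: cosine(embeddings[word], embeddings[w]))
    return unigram_prob(nearest)

print(unigram_prob("great"))   # zero under the plain model
print(backoff_prob("great"))   # borrows the estimate for "good"
```

Note that borrowing a neighbour’s probability this way no longer gives a distribution that sums to one; a real system would renormalize or merge the two words into one cluster token before counting.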
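A more principled approach is to build the language model on distributed representations, as in a neural probabilistic language model (https://papers.nips.cc/paper/1839-a-neural-probabilistic-language-model.pdf). The idea is that words used in similar ways get nearby feature vectors, so what the model learns about one word carries over to its neighbours.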
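To see why nearby vectors help, here is a minimal sketch of scoring words in a shared vector space. It is not the architecture from the linked paper: the word vectors and the context vector are hand-picked (hypothetical) stand-ins for what a trained network would produce, and `next_word_probs` is a name introduced for this example.

```python
import math

# Hypothetical hand-picked word vectors; a real model would learn these.
word_vectors = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "table": (-0.7, 0.6),
}

def next_word_probs(context_vector):
    """Softmax over dot-product scores between the context and each word vector."""
    scores = {
        word: sum(a * b for a, b in zip(context_vector, vec))
        for word, vec in word_vectors.items()
    }
    normalizer = sum(math.exp(s) for s in scores.values())
    return {word: math.exp(s) / normalizer for word, s in scores.items()}

# Hypothetical context vector, standing in for the network state after "have a".
probs = next_word_probs((1.0, 0.0))
for word, p in probs.items():
    print(word, round(p, 3))
```

Because “great” sits next to “good” in the vector space, any context that scores “good” highly also scores “great” highly, which is exactly the sharing that 1-hot features cannot provide.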
Other workarounds for the zero-probability problem involve various kinds of smoothing (add-one smoothing, for example), though they do not leverage the semantic closeness of similar words.
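For comparison, a minimal sketch of add-one (Laplace) smoothing on a toy corpus. The corpus and the two unseen words are assumptions made for the example, and `laplace_prob` is a name introduced here.

```python
from collections import Counter

# Toy training corpus: contains "good" but never "great" or "awful".
corpus = ["have a good day", "good morning", "a good book"]
counts = Counter(word for sentence in corpus for word in sentence.split())
total = sum(counts.values())

# Vocabulary: every training word plus two unseen words.
vocab = set(counts) | {"great", "awful"}

def laplace_prob(word, alpha=1.0):
    """Add-alpha estimate: every vocabulary word gets alpha pseudo-counts."""
    return (counts[word] + alpha) / (total + alpha * len(vocab))

print(laplace_prob("great"))
print(laplace_prob("awful"))
print(laplace_prob("good"))
```

Both unseen words receive the same non-zero probability, so “great” is treated no differently from “awful”: the zero is gone, but the similarity to “good” is not used at all.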