What is PMI?

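PMI (Pointwise Mutual Information) is a measure of association between two events x and y. It compares how often x and y actually occur together with how often they would be expected to occur together if they were independent.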

    \[PMI(x,y)\,=\,\log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)\]

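    \[\log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)\,=\,\log\left(\frac{N \cdot (\text{number of times } x \text{ and } y \text{ occur together})}{(\text{number of times } x \text{ occurs}) \cdot (\text{number of times } y \text{ occurs})}\right)\]

where N is the total number of observations (for example, the total number of word pairs counted). Each probability is estimated as a count divided by N, so one factor of N remains in the numerator after the division.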

As the expression above shows, PMI increases with the number of times both events occur together and decreases as their individual counts, which appear in the denominator, grow. Because of this, very frequent words such as stop-words are penalised: they co-occur with many words simply because they are common, and their large individual counts offset those co-occurrences.
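A minimal Python sketch of the count-based formula, assuming base-2 logarithms (the formula itself does not fix a base) and assuming the joint and individual counts are all normalised by the same total N. The function name `pmi_from_counts` and the word counts below are hypothetical, chosen only to illustrate the stop-word effect: "the" co-occurs with "york" more often than "new" does, yet it gets a much lower PMI because it is so frequent on its own.

```python
import math

def pmi_from_counts(count_xy: int, count_x: int, count_y: int, n: int) -> float:
    """Hypothetical helper: PMI (base 2) estimated from raw counts.

    count_xy: number of times x and y occur together
    count_x, count_y: number of times x and y occur individually
    n: total number of observations, used to turn every count into a probability
    """
    if count_xy == 0:
        return float("-inf")  # log of 0: the pair was never observed together
    # p(x,y) / (p(x) p(y)) = (count_xy / n) / ((count_x / n) * (count_y / n))
    return math.log2((count_xy * n) / (count_x * count_y))

# Hypothetical counts from a corpus of n = 10,000 observed word pairs.
N = 10_000
print(round(pmi_from_counts(40, 100, 50, N), 2))   # "new" + "york" -> 6.32
print(round(pmi_from_counts(45, 2000, 50, N), 2))  # "the" + "york" -> 2.17
```

The raw co-occurrence count alone would rank "the york" above "new york"; dividing by the individual counts reverses that ranking.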

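pPMI: pPMI is the positive PMI. PMI is often used as a feature value in word vectors built with distributional semantics. PMI can take any real value: it is negative when x and y co-occur less often than expected by chance, and it tends to negative infinity when they never co-occur, because the logarithm of 0 is undefined. Negative estimates are also unreliable unless the corpus is very large. For natural language tasks, it is therefore common to compute pPMI by replacing every negative value with 0: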

    \[pPMI(x,y)\,=\,\max(0,\,PMI(x,y))\]
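The clipping can be applied to a whole word-by-context co-occurrence table. The sketch below is a minimal pure-Python version, assuming base-2 logarithms and estimating the individual counts of x and y as the row and column totals of the table, with N as its grand total; `ppmi_matrix` is a hypothetical helper name, not a library function. Unseen pairs (count 0) would have a PMI of negative infinity, so they are mapped straight to 0 instead of calling the logarithm.

```python
import math

def ppmi_matrix(counts: list[list[float]]) -> list[list[float]]:
    """Hypothetical helper: pPMI (base 2) for a rectangular co-occurrence count table.

    Rows are words x, columns are contexts y. Assumes every row and every
    column contains at least one non-zero count.
    """
    total = sum(sum(row) for row in counts)          # N
    row_sums = [sum(row) for row in counts]          # count(x) for each word
    col_sums = [sum(col) for col in zip(*counts)]    # count(y) for each context
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, c_xy in enumerate(row):
            if c_xy == 0:
                out_row.append(0.0)  # unseen pair: PMI would be -inf, clipped to 0
                continue
            pmi = math.log2((c_xy * total) / (row_sums[i] * col_sums[j]))
            out_row.append(max(0.0, pmi))  # pPMI = max(0, PMI)
        result.append(out_row)
    return result

print(ppmi_matrix([[3, 1], [1, 3]]))
```

For the toy table [[3, 1], [1, 3]], the diagonal cells get PMI = log2(1.5) ≈ 0.585, while the off-diagonal cells have PMI = log2(0.5) = -1 and are clipped to 0.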
