Suppose you build word vectors (embeddings) in which each vector has as many dimensions as the vocabulary size (V), and each feature value is the PPMI (positive pointwise mutual information) between the corresponding pair of words. What are the problems with this approach, and how can you resolve them?


Problems:

  1. Because the vocabulary size (V) is large, each vector is very high-dimensional, and the full matrix has V x V entries.
  2. The vectors are sparse: most words never co-occur with most other words, so most PPMI entries are zero.
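
To make both problems concrete, here is a minimal sketch that builds a PPMI matrix from co-occurrence counts using NumPy. The `ppmi` helper and the toy counts are hypothetical, chosen only for illustration; a real vocabulary would have tens of thousands of rows and columns rather than four.

```python
import numpy as np

def ppmi(counts):
    """Positive PMI for a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()            # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)   # row (word) marginals
    p_c = p_wc.sum(axis=0, keepdims=True)   # column (context) marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0            # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)             # clip negative PMI to 0

# Hypothetical toy counts for a 4-word vocabulary (assumption, not real data)
counts = np.array([
    [0, 2, 0, 0],
    [2, 0, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 3, 0],
])
M = ppmi(counts)
sparsity = float((M == 0).mean())           # fraction of zero entries
print(M.shape, sparsity)
```

Each row of `M` is one word vector of length V, and every pair of words that never co-occurs contributes a zero, which is where the sparsity comes from.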


  1. Dimensionality reduction, using approaches such as:
    1. Singular Value Decomposition (SVD) of the V x V PPMI (word-context) matrix, keeping only the top K singular values to get a K-dimensional approximation, with K much smaller than V.
    2. Other matrix factorisation techniques can also be employed for dimensionality reduction.
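
The SVD option can be sketched as follows. This is one illustrative way to do it, not the only one: the `svd_embeddings` helper is hypothetical, the input matrix is a random non-negative stand-in for a real PPMI matrix, and scaling the left singular vectors by the singular values is an assumption (some setups use the singular vectors alone, or scale by the square root).

```python
import numpy as np

def svd_embeddings(M, k):
    """Return k-dimensional row vectors from matrix M via truncated SVD."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep the k directions with the largest singular values,
    # scaling each kept direction by its singular value.
    return U[:, :k] * S[:k]

# Random non-negative stand-in for a PPMI matrix (assumption, not real data)
rng = np.random.default_rng(0)
M = np.maximum(rng.normal(size=(6, 6)), 0.0)

E = svd_embeddings(M, k=2)
print(E.shape)  # one dense 2-dimensional vector per word
```

For a realistically large and sparse matrix, a dense full SVD like this would be too expensive; a sparse truncated solver such as `scipy.sparse.linalg.svds` is the more likely choice.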

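Possible follow-up question: What information is lost in approximating a V-dimensional word representation with a K-dimensional representation? Answer: Truncated SVD keeps the K directions with the largest singular values and discards the rest, so what is lost is the variation along the discarded directions. By the Eckart-Young theorem, this is the best possible rank-K approximation of the PPMI matrix in the least-squares (Frobenius norm) sense, and the squared reconstruction error equals the sum of the squares of the discarded singular values.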
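
The size of that loss can be checked numerically. The sketch below, which uses a random non-negative matrix as a stand-in for real data (an assumption), compares the Frobenius-norm error of a rank-K reconstruction with the value computed from the discarded singular values.

```python
import numpy as np

# Random non-negative stand-in for a PPMI matrix (assumption, not real data)
rng = np.random.default_rng(0)
M = np.maximum(rng.normal(size=(8, 8)), 0.0)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 3

# Rank-k reconstruction from the top-k singular triplets
M_k = (U[:, :k] * S[:k]) @ Vt[:k, :]

err = np.linalg.norm(M - M_k, "fro")              # actual reconstruction error
expected = np.sqrt(np.sum(S[k:] ** 2))            # error from discarded singular values
kept_energy = np.sum(S[:k] ** 2) / np.sum(S ** 2) # share of squared norm retained

print(np.isclose(err, expected), kept_energy)
```

The retained share `kept_energy` is one common way to pick K: increase K until the discarded singular values account for an acceptably small fraction of the total.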
