How do you design a system that reads a natural language question and retrieves the closest FAQ answer?

There are several approaches to FAQ-based question answering:

  1. Keyword-based search (information retrieval approach): Tag each FAQ question with keywords, extract keywords from the query, and retrieve every question-answer pair that shares them. This scales easily with an inverted index that maps keywords to questions.
  2. Lexical matching approach: Score the word-level overlap between the query and each FAQ question. Comparing the query against every question can be hard to do in real time when the question-answer dataset is large.
  3. Embedding-based matching: Embed the query and each FAQ question, then pick the FAQ question closest to the query by embedding distance.
    1. A common technique is to average word-level embeddings such as word2vec or GloVe to get a sentence embedding.
    2. Alternatively, use phrase-level or document-level embeddings directly.
  4. Intent-based retrieval: Identify the intent of the question and the attributes of that intent. This works well when there is a fixed set of intents and the problem reduces to classifying the query into one of them. Tag each FAQ question with its intent and attributes so that the matching answer can be retrieved.
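
As a rough illustration of approaches 1 and 2 working together, the sketch below uses an inverted index to narrow the candidates and then ranks them by Jaccard word overlap. The FAQ entries, the stopword list, and the function names (`tokenize`, `build_index`, `retrieve`) are all made up for this example, and Jaccard is only one of several possible overlap scores.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "the", "is", "do", "i", "my", "how", "to", "can", "what", "of"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def build_index(faqs):
    """Inverted index: map each keyword to the ids of FAQ questions containing it."""
    index = defaultdict(set)
    for faq_id, (question, _answer) in enumerate(faqs):
        for token in tokenize(question):
            index[token].add(faq_id)
    return index


def jaccard(a, b):
    """Word-level overlap between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, faqs, index):
    """Return the best (question, answer) pair, or None if no keyword matches."""
    query_tokens = tokenize(query)
    # Step 1 (keyword search): only FAQs sharing at least one keyword are candidates.
    candidates = set()
    for token in query_tokens:
        candidates |= index.get(token, set())
    if not candidates:
        return None
    # Step 2 (lexical matching): rank the candidates by word overlap with the query.
    best = max(candidates, key=lambda i: jaccard(query_tokens, tokenize(faqs[i][0])))
    return faqs[best]


# Made-up FAQ data for the example.
FAQS = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("How can I change my shipping address?", "Edit it under Account > Addresses."),
    ("What is the refund policy?", "Refunds are accepted within 30 days."),
]
INDEX = build_index(FAQS)
```

Calling `retrieve("I forgot my password, how to reset it?", FAQS, INDEX)` returns the password-reset entry. Because the overlap score is computed only for candidates pulled from the index, the per-query cost depends on how many questions share a keyword with the query rather than on the full dataset size.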

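Approach 3 can be sketched in a similar way with averaged word vectors and cosine similarity. The three-dimensional vectors in `TOY_VECTORS` are invented purely for illustration; a real system would load pretrained word2vec or GloVe vectors instead. The FAQ data and the helper names (`embed`, `cosine`, `closest_faq`) are likewise assumptions of this sketch.

```python
import math
import re

# Invented 3-d word vectors for illustration only; real vectors would come
# from a pretrained word2vec or GloVe model and have hundreds of dimensions.
TOY_VECTORS = {
    "reset": [0.9, 0.1, 0.0],
    "forgot": [0.8, 0.2, 0.0],
    "password": [1.0, 0.0, 0.1],
    "login": [0.9, 0.0, 0.2],
    "refund": [0.0, 1.0, 0.1],
    "money": [0.1, 0.9, 0.0],
    "back": [0.0, 0.8, 0.2],
    "shipping": [0.0, 0.1, 1.0],
    "address": [0.1, 0.0, 0.9],
}


def embed(text):
    """Sentence embedding = mean of the known word vectors; None if no word is known."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_faq(query, faqs):
    """Return the (question, answer) pair whose question embedding is nearest the query."""
    query_vec = embed(query)
    if query_vec is None:
        return None
    scored = []
    for faq in faqs:
        faq_vec = embed(faq[0])  # in practice, precompute and cache these
        if faq_vec is not None:
            scored.append((cosine(query_vec, faq_vec), faq))
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


# Made-up FAQ data for the example.
FAQS = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("How can I change my shipping address?", "Edit it under Account > Addresses."),
    ("What is the refund policy?", "Refunds are accepted within 30 days."),
]
```

With these toy vectors, `closest_faq("I want my money back", FAQS)` returns the refund entry even though the query and the FAQ question share no keywords, which is the main advantage of embedding distance over keyword or lexical matching.
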