- An auto suggestion feature recommends the next word in a sentence or phrase. To do this, we need to build a language model on a large enough corpus of "relevant" data.
- There are two requirements here:
- Large corpus: the data must cover almost every case, so that the model has a suggestion for most of the contexts users type. This is important for recall.
- Relevant data: the data must match the target application, which gives higher precision. A language model learnt on movie reviews may not be useful for an application like business mail, because business mail tends to discuss business topics in formal language, while movie reviews are mostly written in casual, informal language.
- The data could come from Google search queries (for search) or a user's own chat history (for a messaging app). The language model could be a probabilistic model (n-grams, HMMs) or a neural language model.
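
As a rough illustration of the n-gram option mentioned above, here is a minimal sketch of a bigram next-word suggester. The function names (`train_bigram_model`, `suggest_next`), the `<s>` start token, and the tiny business-mail corpus are assumptions made up for this example; a real system would train on a much larger, domain-relevant corpus and would likely add smoothing or back-off for unseen contexts.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is a hypothetical start-of-sentence marker.
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts


def suggest_next(model, text, k=3):
    """Return up to k of the most frequent words seen after the last word of text."""
    tokens = ["<s>"] + text.lower().split()
    # .get() avoids creating an empty entry for an unseen context.
    candidates = model.get(tokens[-1])
    if not candidates:
        return []  # unseen context: no suggestion (this is where recall suffers)
    return [word for word, _ in candidates.most_common(k)]


# Toy corpus standing in for "relevant" business-mail data.
corpus = [
    "please find the report attached",
    "please find the invoice attached",
    "please find the report below",
    "thank you for the update",
]

model = train_bigram_model(corpus)
suggestions = suggest_next(model, "please find the")
print(suggestions)
```

With this toy corpus, "report" should rank first after "the" because it follows "the" more often than any other word. A context the corpus never saw returns an empty list, which is the recall problem a larger corpus is meant to reduce.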
How will you build an auto suggestion feature for a messaging app or Google search?