You are building a natural language search box for a website. How do you accommodate spelling errors?

If you have a dictionary of words, edit distance is the simplest way to handle this: for each typed word that is not in the dictionary, suggest the dictionary words closest to it. However, sometimes corrections based on context make sense. For instance, suppose I type “bed color shoes”. These are all perfectly valid dictionary words, but a sensible model would come up with “red color shoes”. Using a language model to find a higher-probability sentence helps here. One simple way of implementing this is to collect common query phrases, find the ones with the closest edit distance to the typed query, and suggest any that the language model scores as more probable than the original.
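Here is a minimal sketch of that idea in Python. The query log, the add-one smoothed bigram model standing in for "the language model", and the names `QUERY_LOG`, `suggest` and `max_distance` are all assumptions made for illustration; a real system would use a much larger log and a stronger model.

```python
import math
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Hypothetical log of common queries (made up for this example).
QUERY_LOG = [
    "red color shoes", "red color shoes", "red color shoes",
    "blue color shoes", "bed sheets", "bed frame",
]

def train_bigram_counts(queries):
    """Count unigrams and bigrams, with a start-of-query marker."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        tokens = ["<s>"] + q.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(query, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a query."""
    tokens = ["<s>"] + query.split()
    vocab = len(unigrams)
    return sum(math.log((bigrams[(p, w)] + 1) / (unigrams[p] + vocab))
               for p, w in zip(tokens, tokens[1:]))

def suggest(query, queries=QUERY_LOG, max_distance=2):
    """Return the nearby logged query with the best score, else the query itself."""
    unigrams, bigrams = train_bigram_counts(queries)
    best, best_score = query, log_prob(query, unigrams, bigrams)
    for cand in set(queries):
        if edit_distance(query, cand) <= max_distance:
            score = log_prob(cand, unigrams, bigrams)
            if score > best_score:
                best, best_score = cand, score
    return best

print(suggest("bed color shoes"))
```

With this toy log, “bed color shoes” is one edit away from “red color shoes”, and the bigram model scores the latter higher, so it is the one suggested. A query that is already the most probable candidate is returned unchanged.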

Better still, for a more generic solution, you could train a deep learning character-level RNN on phrases with one letter flipped, each paired with the original phrase, so that the model learns to correct small spelling errors in a phrase. (Does this make sense?)
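The training data for such a model could be generated roughly as follows. This is only a sketch of the data preparation step, not the RNN itself; the function names and the `copies_per_phrase` parameter are hypothetical, and it assumes lowercase English letters as the replacement alphabet.

```python
import random
import string

def flip_one_letter(phrase, rng):
    """Replace one randomly chosen letter of the phrase with a different letter."""
    positions = [i for i, ch in enumerate(phrase) if ch.isalpha()]
    if not positions:
        return phrase  # nothing to corrupt
    i = rng.choice(positions)
    choices = [c for c in string.ascii_lowercase if c != phrase[i].lower()]
    return phrase[:i] + rng.choice(choices) + phrase[i + 1:]

def make_training_pairs(phrases, copies_per_phrase=3, seed=0):
    """Build (noisy input, clean target) pairs for a character-level model."""
    rng = random.Random(seed)
    return [(flip_one_letter(p, rng), p)
            for p in phrases
            for _ in range(copies_per_phrase)]

for noisy, clean in make_training_pairs(["red color shoes"]):
    print(noisy, "->", clean)
```

Each pair gives the model a corrupted phrase as input and the original phrase as the target. Real typos also include insertions, deletions and transpositions, so a fuller version would probably add those kinds of noise as well.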

 
